Transactions & Isolation Levels

TL;DR

Transactions group operations into all-or-nothing units with ACID guarantees. Isolation levels control how concurrent transactions see each other's changes, preventing anomalies (dirty reads, non-repeatable reads, phantom reads, write skew). Choose the weakest isolation that preserves your invariants. Read Committed is the sweet spot for most OLTP; test under realistic load to verify correctness.

Learning Objectives

After reading this article, you will be able to:

  • Explain ACID properties and isolation anomalies in plain English.
  • Choose an isolation level that prevents unacceptable anomalies while maximizing performance.
  • Implement locking and MVCC strategies for efficient isolation.
  • Test transactions under contention to verify invariants hold.
  • Recognize when optimistic concurrency is better than pessimistic locking.

Motivating Scenario

Your payment system transfers money between accounts. Concurrent transfers risk overdrawing an account: if two transfers debit the same account simultaneously, one might not see the other's debit, leaving a negative balance. With Read Uncommitted, this invariant breaks silently. With Serializable, every transfer waits for locks, adding latency. With Read Committed plus explicit locks on hot rows, transfers are fast and safe. The key: understand your invariants, choose the right isolation level, and test under load.

Core Concepts

ACID Properties

| Property | Definition | Impact |
| --- | --- | --- |
| Atomicity | All-or-nothing: if a transaction fails, all changes roll back | Prevents partial updates |
| Consistency | Database moves from one valid state to another | Application ensures invariants (DB enforces constraints) |
| Isolation | Concurrent transactions don't interfere with each other | Controlled by isolation level |
| Durability | Committed data survives crashes | Write-ahead logging (WAL) |
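Atomicity is easy to observe directly. A minimal sketch using Python's built-in sqlite3 module (an illustrative toy, not the payment system above): a failed check inside the transaction rolls back every write made within it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 1000), (2, 1000)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst atomically: both writes commit, or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

transfer(conn, 1, 2, 250)      # succeeds: both rows updated
transfer(conn, 1, 2, 10_000)   # fails: the debit is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
# balances == {1: 750.0, 2: 1250.0}
```

The second transfer's debit never becomes visible: the rollback undoes the partial update, which is exactly the "prevents partial updates" guarantee in the table above.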

Anomalies (Problems Isolation Prevents)

| Anomaly | Definition | Example |
| --- | --- | --- |
| Dirty Read | Reading another transaction's uncommitted data, which may later roll back | Txn A writes X, Txn B reads X before A commits, A rolls back |
| Non-repeatable Read | The same read returns different values within one transaction | Txn reads X=1, another txn changes X to 2 and commits, first txn re-reads X=2 |
| Phantom Read | A range query returns different rows on repeated execution | Txn queries users with balance < 100, another txn inserts such a user, the repeated query sees it |
| Write Skew | Two txns update different rows, each passing its own check, yet jointly violate an invariant | Two on-call doctors both book off the same shift without seeing each other |
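Write skew is the subtlest of these. A toy simulation in plain Python (no database; two "transactions" each deciding from a private snapshot, as snapshot isolation would) shows how both updates pass their local check yet jointly break the invariant:

```python
# Invariant: at least one doctor must remain on call.
on_call = {"alice": True, "bob": True}

def may_take_shift_off(snapshot, doctor):
    """Check the invariant against a private snapshot, as snapshot isolation would."""
    others_on_call = sum(v for d, v in snapshot.items() if d != doctor)
    return others_on_call >= 1  # passes: someone else is still on call

# Both transactions snapshot the same initial state...
snap_a = dict(on_call)
snap_b = dict(on_call)

# ...each check passes in isolation, so both writes are applied.
if may_take_shift_off(snap_a, "alice"):
    on_call["alice"] = False
if may_take_shift_off(snap_b, "bob"):
    on_call["bob"] = False

nobody_on_call = not any(on_call.values())
# nobody_on_call == True: the invariant is broken (write skew)
```

Neither transaction wrote a row the other read a stale value of after commit; the conflict is only visible at the level of the invariant, which is why weaker isolation levels miss it.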

Isolation Levels

| Level | Dirty Read | Non-repeatable Read | Phantom Read | Write Skew | Cost |
| --- | --- | --- | --- | --- | --- |
| Read Uncommitted | Yes | Yes | Yes | Yes | Low |
| Read Committed | No | Yes | Yes | Yes | Medium |
| Repeatable Read | No | No | Yes | Yes | Medium-High |
| Serializable | No | No | No | No | High |
| Snapshot Isolation | No | No | No | Yes | Medium-High |

Decision flow: pick the weakest isolation that prevents unacceptable anomalies.

Implementation: Locking vs. MVCC

Pessimistic Locking:

  • Acquire locks before reading/writing.
  • Lock types: shared (read), exclusive (write).
  • Deadlock risk: two txns waiting for each other's locks.
  • High latency if contention is high.

Optimistic Concurrency (MVCC):

  • Each txn gets a snapshot of data at transaction start.
  • Readers never block writers (and vice versa).
  • On commit, check for conflicts; if found, abort and retry.
  • Better latency with low contention; higher abort rate under high contention.

PostgreSQL uses MVCC; MySQL InnoDB uses both (row-level locking + MVCC).
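The optimistic commit-time check can be sketched in a few lines of plain Python (a toy single-value store, not a real MVCC engine): each transaction records the version it read, and commit succeeds only if no other transaction committed in between, i.e. first committer wins.

```python
class VersionedStore:
    """Toy optimistic-concurrency store: one value, one version counter."""

    def __init__(self, value):
        self.value, self.version = value, 0

    def begin(self):
        # Snapshot: remember the value and the version it was read at.
        return {"value": self.value, "read_version": self.version}

    def commit(self, txn, new_value):
        # Conflict check: abort if anyone committed since our snapshot.
        if txn["read_version"] != self.version:
            return False  # caller should retry on a fresh snapshot
        self.value, self.version = new_value, self.version + 1
        return True

store = VersionedStore(100)
t1 = store.begin()
t2 = store.begin()
assert store.commit(t1, t1["value"] - 30)      # first committer wins
assert not store.commit(t2, t2["value"] - 50)  # t2 conflicts and must abort
t2 = store.begin()
assert store.commit(t2, t2["value"] - 50)      # retry on a fresh snapshot succeeds
# store.value == 20
```

Note how the aborted transaction loses no data: the retry re-reads the committed state, so the two debits compose correctly instead of one silently overwriting the other.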

Practical Example: Testing Isolation Levels

-- Test table: accounts with balance
CREATE TABLE accounts (
  id INT PRIMARY KEY,
  name VARCHAR(100),
  balance DECIMAL(10, 2),
  version INT DEFAULT 0 -- For optimistic locking
);

INSERT INTO accounts (id, name, balance) VALUES (1, 'Alice', 1000);
INSERT INTO accounts (id, name, balance) VALUES (2, 'Bob', 1000);

-- Test 1: Dirty Read with Read Uncommitted (rare in practice; some engines, e.g. PostgreSQL, never allow dirty reads)
-- Session A:
BEGIN TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM accounts WHERE id = 1; -- Should see Alice's balance
-- Session B (concurrent):
BEGIN TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
UPDATE accounts SET balance = 500 WHERE id = 1;
ROLLBACK; -- Simulate error
-- Session A:
SELECT balance FROM accounts WHERE id = 1; -- May have seen 500 (dirty read)
COMMIT;

-- Test 2: Non-repeatable Read with Read Committed
-- Session A:
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT balance FROM accounts WHERE id = 1; -- Sees 1000
-- Session B (concurrent):
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
UPDATE accounts SET balance = 800 WHERE id = 1;
COMMIT;
-- Session A:
SELECT balance FROM accounts WHERE id = 1; -- Sees 800 (non-repeatable read)
COMMIT;

-- Test 3: Phantom Read with Repeatable Read
-- Session A:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM accounts WHERE balance < 500; -- Sees 0
-- Session B (concurrent):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
INSERT INTO accounts (id, name, balance) VALUES (3, 'Charlie', 100);
COMMIT;
-- Session A:
SELECT COUNT(*) FROM accounts WHERE balance < 500; -- May see 1 (phantom read; ANSI permits it here, though e.g. PostgreSQL's Repeatable Read does not)
COMMIT;

-- Test 4: Write Skew (Serializable prevents this)
-- Invariant: At least one on-call doctor per shift
-- Session A:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM doctors WHERE shift = 'A' AND on_call = true; -- Sees 1
UPDATE doctors SET on_call = false WHERE id = 1 AND shift = 'A';
COMMIT;
-- Session B (concurrent, started before A):
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM doctors WHERE shift = 'A' AND on_call = true; -- Sees 1
UPDATE doctors SET on_call = false WHERE id = 2 AND shift = 'A';
-- COMMIT fails with a serialization error (Serializable detects the write skew); retry the txn
COMMIT;

-- Test 5: Optimistic Locking (application-level)
-- SELECT version before update
BEGIN TRANSACTION;
SELECT balance, version FROM accounts WHERE id = 1; -- balance=1000, version=0
UPDATE accounts SET balance = 950, version = version + 1 WHERE id = 1 AND version = 0;
-- If concurrent update happened, version != 0 and UPDATE affects 0 rows → abort
COMMIT;
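The Test 5 pattern is usually wrapped in a retry loop at the application level. A hedged sketch with Python's sqlite3 (illustrative; any driver works): check the statement's `rowcount` to detect that the version moved underneath us, and re-read before retrying.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, "
    "version INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 1000)")
conn.commit()

def debit_optimistic(conn, account_id, amount, max_attempts=3):
    """Optimistic update: abort and re-read if the row's version changed."""
    for _ in range(max_attempts):
        balance, version = conn.execute(
            "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        cur = conn.execute(
            "UPDATE accounts SET balance = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (balance - amount, account_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:  # our version check held; the update took effect
            return True
        # rowcount == 0: a concurrent writer bumped the version; loop and retry
    return False

debit_optimistic(conn, 1, 50)
row = conn.execute("SELECT balance, version FROM accounts WHERE id = 1").fetchone()
# row == (950.0, 1)
```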

When to Use / When NOT to Use

Stronger Isolation (Serializable)
  1. Critical financial transactions (fund transfers, account updates)
  2. Inventory updates with low concurrency (can afford lock wait time)
  3. Unique constraint enforcement (email address, username) with high write rate
  4. Invariants that cannot be expressed as row-level constraints
  5. Low-concurrency systems where lock contention is rare
Weaker Isolation (Read Committed)
  1. High-concurrency OLTP systems where lock contention is costly
  2. Analytic queries that can tolerate stale reads
  3. Invariants that can be enforced at application level (idempotent ops)
  4. Systems using MVCC where write skew is acceptable
  5. Read-heavy workloads (locking has minimal benefit)

Patterns & Pitfalls

Optimistic locking (version numbers) detects conflicting writes at commit time. Pessimistic locking (exclusive locks) prevents conflicts upfront. Hybrid: lock hot rows pessimistically, use optimistic for others.
When two txns wait for each other's locks, a deadlock occurs. Most databases detect this and abort one txn. Retry with exponential backoff and random jitter to avoid thundering herd.
MVCC (multi-version concurrency control) lets readers see a consistent snapshot without blocking writers. Write conflicts are detected on commit. Great for high-concurrency reads with occasional conflicts.
Long txns hold locks and bloat MVCC storage. Keep transactions short: batch operations, minimize round-trips. Use connection pools to prevent txn hoarding.
Txns that modify too many rows risk lock contention and timeouts. Break large updates into batches, commit frequently, and monitor lock wait times.
Test critical txns with realistic concurrency: many clients, concurrent updates to same rows, network delays. Jepsen or similar tools help find isolation bugs under failure scenarios.
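The deadlock-retry advice above can be sketched as a generic wrapper (Python; `is_retryable` stands in for whatever deadlock or serialization error your driver actually raises):

```python
import random
import time

def run_with_retries(txn_fn, max_attempts=5, base_delay=0.05,
                     is_retryable=lambda exc: True):
    """Run txn_fn, retrying aborted transactions with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return txn_fn()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount up to base * 2^attempt, so
            # retrying clients don't stampede the same hot rows in lockstep.
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))

# Example: a transaction aborted twice (e.g. as a deadlock victim), then succeeds.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("deadlock detected")  # simulated abort
    return "committed"

result = run_with_retries(flaky_txn)
# result == "committed" on the third attempt
```

In production, `is_retryable` should match only transient abort errors (deadlock, serialization failure); retrying a constraint violation or application bug just repeats the failure.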

Design Review Checklist

  • Have you documented all invariants that transactions must preserve?
  • Have you identified which anomalies (dirty read, phantom, write skew) would break invariants?
  • Have you chosen an isolation level and justified the choice?
  • Are critical hot rows locked pessimistically to avoid contention?
  • Are long-running transactions broken into smaller, faster txns?
  • Have you tested under realistic concurrency (load test)?
  • Is deadlock detection enabled and retries implemented?
  • Are transaction logs and monitoring configured to detect anomalies?
  • Have you tested recovery after a crash (durability)?
  • Is the isolation level clearly documented in code and runbooks?

Self-Check

Before choosing an isolation level:

  1. Invariants: What invariants must hold? Can weak isolation preserve them with application-level checks?

  2. Contention: How many concurrent transactions update the same rows? High contention favors weaker isolation (less lock wait).

  3. Latency SLO: Can txns tolerate 50-500ms lock waits? If not, use Read Committed with optimistic locking.

  4. Write rate: High write rate with weak isolation risks conflicts and retries. Test to find the sweet spot.

Next Steps

  • Document invariants: List all critical business rules that transactions must preserve.
  • Profile access patterns: Identify hot rows and high-concurrency areas.
  • Choose isolation level: Start with Read Committed; upgrade to Serializable if anomalies occur.
  • Implement retries: Handle transaction aborts (deadlocks, conflicts) with exponential backoff.
  • Load test: Verify behavior under realistic concurrency before deploying.
  • Monitor: Track lock waits, transaction aborts, and anomaly detection metrics.

References

  1. PostgreSQL: Transaction Isolation Levels
  2. MySQL InnoDB: Isolation Levels
  3. Jepsen: Database Consistency Analyses
  4. Serializable Isolation for Snapshot Databases