Skip to main content

MVCC: How Snapshots Avoid Reader-Writer Contention

What This Concept Is

Multi-Version Concurrency Control (MVCC) replaces the locking rule "readers block writers / writers block readers" with a different rule: when a row is modified, the engine does not overwrite it -- it writes a new version and leaves the old version in place until no live transaction still needs it.

Each transaction gets a snapshot: a logical timestamp T_snap picked at start time (or at first statement, depending on isolation level). When the transaction reads a row, it walks the chain of versions for that row and returns the latest one whose creation timestamp is <= T_snap and whose deletion timestamp (if any) is > T_snap.

Key implementation structures:

  • Row versions: each row has xmin (creator txid) and xmax (deleter/updater txid). An UPDATE writes a new tuple with a new xmin and stamps the old tuple's xmax.
  • Snapshot: a tuple of (xmin_horizon, xmax, list_of_active_xids) -- everything needed to answer "was this version visible to me?"
  • Visibility rules: a tuple is visible to snapshot S iff xmin < S.xmax and xmin not in S.active and (xmax is NULL or xmax >= S.xmax or xmax in S.active).
  • Vacuum / garbage collection: once no live snapshot can see version v, v is eligible for reclamation. Postgres calls this VACUUM; Oracle uses rollback segments; MySQL InnoDB uses the undo log.

Snapshot isolation (SI) is the isolation level MVCC most naturally produces: a transaction reads a consistent snapshot and commits if its writes do not conflict with another committed writer in the overlap window.

Why It Matters Here

MVCC is the default concurrency model of PostgreSQL, Oracle, SQL Server (RCSI), MySQL InnoDB, and essentially every modern NewSQL engine. It has two effects that transformed the industry:

  • Readers do not block writers, writers do not block readers. This is the headline property and the main reason MVCC is preferred for OLTP under mixed workloads.
  • Long-running read queries do not hold locks. An analytical scan that takes 30 min does not block any writer, because it reads an old snapshot.

The cost is bookkeeping: version chains grow, garbage collection must keep up, and certain write-write conflicts that 2PL would block, MVCC instead aborts at commit time (first-committer-wins or first-updater-wins).

Concrete Example -- Two Transactions, No Blocking

Initial row: (id=1, balance=100, xmin=50, xmax=NULL).

T1 starts at xid=100, snapshot sees xmin < 100
T2 starts at xid=101, snapshot sees xmin < 101

T1: UPDATE accounts SET balance = 120 WHERE id = 1
- engine writes new row: (id=1, balance=120, xmin=100, xmax=NULL)
- engine sets old row's xmax = 100

T2: SELECT balance FROM accounts WHERE id = 1
- walks the chain:
v2 = (balance=120, xmin=100, xmax=NULL) -- xmin=100 not < T2.snap.xmax=101? depends on commit status. Uncommitted from T2's view.
v1 = (balance=100, xmin=50, xmax=100) -- xmax=100 not committed yet => version still visible to T2
- returns 100

T2 read without waiting. T1 wrote without waiting. That is the MVCC win. If T1 now commits, further new transactions see 120; T2 continues seeing 100 until it ends.

Concrete Example -- Write-Write Conflict

Both T1 and T2 try to update the same row under snapshot isolation.

T1 starts, reads row (version v1)
T2 starts, reads row (version v1)
T1: UPDATE row SET x = x+1 -- writes v2 with xmin=T1
T1 COMMIT
T2: UPDATE row SET x = x+1 -- tries to update v1, but a newer committed version v2 exists
-> MVCC aborts T2 with a serialization failure ("first committer wins")

Under 2PL, T2's read would have acquired a shared lock, blocking T1's update; T1 would have blocked behind T2. Under MVCC, neither blocked at read time, but T2 must retry. This is the classic trade: less blocking, more retry.

Common Confusion / Misconception

"MVCC means no locks at all." Writes still acquire row-level locks for the duration of their own transaction (so two writers cannot simultaneously create overlapping versions). DDL still takes heavyweight locks. MVCC only eliminates reader-writer contention.

"Snapshot isolation is serializable." It is not. Classic SI admits write skew: two transactions read overlapping data, update disjoint rows based on what they read, and the combined outcome is not equivalent to any serial order. PostgreSQL solves this with SERIALIZABLE isolation (SSI -- Serializable Snapshot Isolation), which adds conflict tracking; simple SI does not.

"Vacuum is optional." It is not; it reclaims dead tuples. Skipped or slow VACUUM on a write-heavy Postgres table causes bloat -- the table grows even though row count does not -- and eventually transaction ID wraparound, a legitimate outage condition.

How To Use It

When reasoning about a transactional workload:

  1. Ask which isolation level each session uses (READ COMMITTED reads the latest snapshot per statement; REPEATABLE READ/SI holds one snapshot per transaction).
  2. Expect writers to serialize at row level; expect readers to proceed.
  3. Budget for version-chain growth and verify vacuum/GC throughput.
  4. Watch for write skew if two transactions decide based on the absence of rows.
  5. Be aware that long reporting transactions prevent vacuum from cleaning up versions until they finish.

Check Yourself

  1. Why does MVCC not require readers to acquire locks?
  2. Why does write skew occur under snapshot isolation but not under serializable (SSI)?
  3. Why does skipping vacuum / garbage collection degrade performance even though no data is lost?

Mini Drill or Application

Given two concurrent transactions in a single-row bank account:

  • T1: balance = balance - 100 from current value 500
  • T2: balance = balance - 80 from current value 500
  1. Under strict 2PL, what sequence of events occurs? What is the final balance?
  2. Under snapshot isolation, what sequence of events occurs, and which transaction (if any) aborts?
  3. Under READ COMMITTED, what can happen and why is this dangerous?
  4. Design an application-level retry loop for the SI case.

Read This Only If Stuck