Snapshot Isolation and MVCC
What This Concept Is
Snapshot Isolation (SI) is an isolation level where each transaction reads from a consistent snapshot of the database as of the moment it started. Writes do not conflict on reads. The usual write rule is first-committer-wins: if two transactions modified the same row, the first to commit succeeds and the second aborts.
The implementation technique is Multi-Version Concurrency Control (MVCC):
- Every row version is tagged with the transaction ID that created it and, when overwritten, a deletion marker with the transaction ID that deleted it.
- When a transaction starts, it records a snapshot: the set of committed transactions it can see.
- Reads filter versions by the snapshot: "show me the latest version whose creator is in my snapshot and whose deleter is not."
- Writes append new versions; they do not overwrite. A background vacuum (PostgreSQL) or rollback segment (Oracle) reclaims obsolete versions.
SI is attractive because readers do not block writers and writers do not block readers. Lock contention drops dramatically compared to 2PL on read-heavy workloads.
Why It Matters Here
PostgreSQL's REPEATABLE READ, Oracle's default "SERIALIZABLE" (which is actually SI), MySQL InnoDB's non-locking reads, and SQL Server's SNAPSHOT mode all use MVCC/SI. It is probably the isolation level your application is running under right now, and it looks serializable on most workloads. The places it fails (write skew, phantoms under predicate invariants) are concept 5's content, but now you know why: SI is about what the reader saw at snapshot time, and write-write conflicts are detected only on the same row.
Concrete Example
Row x with two versions. T1 starts at snapshot S1; T2 starts at S2 > S1 and writes x.
Initial: x has one version: {val=10, created_by=T0, deleted_by=null}
time T1 (start snap S1) T2 (start snap S2)
1 read x -> 10 (uses version created_by=T0)
2 BEGIN; read x -> 10
3 write x: new version {val=20, created_by=T2, deleted_by=null}
old version marked deleted_by=T2 (tentative)
4 COMMIT
5 read x -> still 10 (T2 not in T1's snapshot)
6 COMMIT
T1 sees x = 10 throughout. T2's new value becomes visible only to transactions that start after T2's commit. No read-write conflict. Two concurrent writers to x, however, would produce a write-write conflict:
time T1 T2
1 write x: v1
2 write x: v2 (blocks or adds tentative version)
3 COMMIT (v1 wins)
4 COMMIT (first-committer-wins -> T2 aborted or T2's write is ignored)
The engine aborts one transaction (in Postgres, T2 gets could not serialize access due to concurrent update). Lost update is prevented this way.
Common Confusion / Misconception
"MVCC is free." Not quite. Storage grows with versions, vacuum must reclaim them, and a long-running transaction pins old versions (the "long-running transaction" problem). Long read transactions in Postgres are a known source of bloat.
"SI prevents all lost updates." SI with first-committer-wins prevents the classic lost-update pattern when both transactions write the same row. It does not prevent write skew (two transactions write different rows based on the same predicate read).
"SI is serializable." No. This is the headline. SI permits write skew and some phantom cases that a true serializable schedule would forbid. Oracle's naming ("SERIALIZABLE") is misleading.
"Reads in MVCC are always fast because no locks." True in the common case, but predicate reads may still traverse many versions (slow on bloated tables) and repeatable reads across very long transactions require pinning snapshots.
How To Use It
SI is almost always the right default for general-purpose OLTP. When using it, remember:
- Use atomic single-statement updates (
SET n = n + 1) or compare-and-set; do not read-modify-write at the application level on hot rows. - For invariants that span multiple rows (the write-skew shape), use
SELECT ... FOR UPDATEor escalate to SSI/SERIALIZABLE on that transaction. - Keep transactions short. Long transactions bloat MVCC storage and hold snapshots open.
- Do not conflate "I use REPEATABLE READ in Postgres" with "I am serializable." You are not.
Check Yourself
- What is the key property of SI that makes readers not block writers?
- How does first-committer-wins prevent the lost-update anomaly for writes to the same row?
- Why does SI not prevent write skew?
- What is the cost of a long-running read transaction in a PostgreSQL-style MVCC engine?
- Two transactions read value 5, each increments to 6 and writes. Under SI with first-committer-wins, what happens to the second?
Mini Drill or Application
For each, predict SI's behavior on PostgreSQL:
- Two transactions both run
UPDATE c SET n = n + 1 WHERE id = 1. - Two transactions both read
SELECT n FROM c WHERE id = 1, computen + 1, andUPDATE c SET n = <computed> WHERE id = 1. - Two transactions both check
SELECT COUNT(*) WHERE on_calland update different rows to seton_call = false. - A long-running
pg_dumpholds a snapshot open while OLTP traffic updates a hot table. - A transaction reads three rows into memory, waits 30 seconds, then updates all three.
Read This Only If Stuck
- DDIA: Snapshot Isolation and Repeatable Read (part 1)
- DDIA: Snapshot Isolation and Repeatable Read (part 2)
- Database Internals: Optimistic concurrency control
- Database System Concepts: Multiversion schemes
- Database System Concepts: Snapshot isolation (part 1)
- Database System Concepts: Snapshot isolation (part 2)