Snapshot Isolation and MVCC

What This Concept Is

Snapshot Isolation (SI) is an isolation level where each transaction reads from a consistent snapshot of the database as of the moment it started. Writes do not conflict on reads. The usual write rule is first-committer-wins: if two transactions modified the same row, the first to commit succeeds and the second aborts.

The implementation technique is Multi-Version Concurrency Control (MVCC):

Every row version is tagged with the transaction ID that created it and, when overwritten, a deletion marker with the transaction ID that deleted it.
When a transaction starts, it records a snapshot: the set of committed transactions it can see.
Reads filter versions by the snapshot: "show me the latest version whose creator is in my snapshot and whose deleter is not."
Writes append new versions; they do not overwrite. A background vacuum (PostgreSQL) or rollback segment (Oracle) reclaims obsolete versions.

SI is attractive because readers do not block writers and writers do not block readers. Lock contention drops dramatically compared to 2PL on read-heavy workloads.

Why It Matters Here

PostgreSQL's REPEATABLE READ, Oracle's default "SERIALIZABLE" (which is actually SI), MySQL InnoDB's non-locking reads, and SQL Server's SNAPSHOT mode all use MVCC/SI. It is probably the isolation level your application is running under right now, and it looks serializable on most workloads. The places it fails (write skew, phantoms under predicate invariants) are concept 5's content, but now you know why: SI is about what the reader saw at snapshot time, and write-write conflicts are detected only on the same row.

Concrete Example

Row x with two versions. T1 starts at snapshot S1; T2 starts at S2 > S1 and writes x.

Initial: x has one version: {val=10, created_by=T0, deleted_by=null}

time  T1 (start snap S1)                T2 (start snap S2)
 1    read x -> 10 (uses version created_by=T0)
 2                                      BEGIN; read x -> 10
 3                                      write x: new version {val=20, created_by=T2, deleted_by=null}
                                         old version marked deleted_by=T2 (tentative)
 4                                      COMMIT
 5    read x -> still 10 (T2 not in T1's snapshot)
 6    COMMIT

T1 sees x = 10 throughout. T2's new value becomes visible only to transactions that start after T2's commit. No read-write conflict. Two concurrent writers to x, however, would produce a write-write conflict:

time  T1                                T2
  write x: v1
                                    write x: v2 (blocks or adds tentative version)
  COMMIT (v1 wins)
                                    COMMIT (first-committer-wins -> T2 aborted or T2's write is ignored)

The engine aborts one transaction (in Postgres, T2 gets could not serialize access due to concurrent update). Lost update is prevented this way.

Common Confusion / Misconception

"MVCC is free." Not quite. Storage grows with versions, vacuum must reclaim them, and a long-running transaction pins old versions (the "long-running transaction" problem). Long read transactions in Postgres are a known source of bloat.

"SI prevents all lost updates." SI with first-committer-wins prevents the classic lost-update pattern when both transactions write the same row. It does not prevent write skew (two transactions write different rows based on the same predicate read).

"SI is serializable." No. This is the headline. SI permits write skew and some phantom cases that a true serializable schedule would forbid. Oracle's naming ("SERIALIZABLE") is misleading.

"Reads in MVCC are always fast because no locks." True in the common case, but predicate reads may still traverse many versions (slow on bloated tables) and repeatable reads across very long transactions require pinning snapshots.

How To Use It

SI is almost always the right default for general-purpose OLTP. When using it, remember:

Use atomic single-statement updates (SET n = n + 1) or compare-and-set; do not read-modify-write at the application level on hot rows.
For invariants that span multiple rows (the write-skew shape), use SELECT ... FOR UPDATE or escalate to SSI/SERIALIZABLE on that transaction.
Keep transactions short. Long transactions bloat MVCC storage and hold snapshots open.
Do not conflate "I use REPEATABLE READ in Postgres" with "I am serializable." You are not.

Check Yourself

What is the key property of SI that makes readers not block writers?
How does first-committer-wins prevent the lost-update anomaly for writes to the same row?
Why does SI not prevent write skew?
What is the cost of a long-running read transaction in a PostgreSQL-style MVCC engine?
Two transactions read value 5, each increments to 6 and writes. Under SI with first-committer-wins, what happens to the second?

Mini Drill or Application

For each, predict SI's behavior on PostgreSQL:

Two transactions both run UPDATE c SET n = n + 1 WHERE id = 1.
Two transactions both read SELECT n FROM c WHERE id = 1, compute n + 1, and UPDATE c SET n = <computed> WHERE id = 1.
Two transactions both check SELECT COUNT(*) WHERE on_call and update different rows to set on_call = false.
A long-running pg_dump holds a snapshot open while OLTP traffic updates a hot table.
A transaction reads three rows into memory, waits 30 seconds, then updates all three.

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconception​

How To Use It​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​