Skip to main content

Read Skew, Write Skew, Phantom Reads

What This Concept Is

The three "less obvious" anomalies. These are the ones practitioners misread, because the cases where Snapshot Isolation (SI) and naive Repeatable Read "feel safe" are exactly where these bugs hide.

  • Read skew (non-repeatable read): a single transaction reads the same row twice and gets different values because another transaction committed in between. Under SI, this does not happen for rows you read (you see a consistent snapshot), but under Read Committed it can.
  • Write skew: two concurrent transactions read an overlapping set of rows, make disjoint writes, and each write would be individually valid but together they violate an invariant that spans the rows. SI does not prevent this.
  • Phantom read: a transaction runs a predicate query (e.g., WHERE on_call = true) and later reruns it. A concurrent transaction has inserted or updated a row that changes the result set. The new row is a "phantom" from the first transaction's perspective.

Write skew is the single most important anomaly to understand in this module. It is the reason SI is not serializable.

Why It Matters Here

Most production databases default to SI or Read Committed, not true serializability. Teams assume SI is "good enough" because it prevents the dirty and lost-update anomalies. But write skew breaks multi-row invariants (at least one on-call doctor, uniqueness across a predicate, double-booking across independent inventory rows, etc.) in ways that look exactly like bugs in application logic. Recognizing write skew is the difference between "why does this invariant keep getting violated" and "oh, we need Serializable or explicit predicate locking."

Concrete Example

Read Skew

Account A = 500, B = 500, total 1000. Under Read Committed:

time  T1 (report total)             T2 (transfer 100 A -> B)
1 SELECT bal FROM A -> 500
2 BEGIN
3 UPDATE A SET bal = 400
4 UPDATE B SET bal = 600
5 COMMIT
6 SELECT bal FROM B -> 600

T1 reports total = 1100, a state that never existed. Fixed by SI / Repeatable Read where T1 sees a consistent snapshot.

Write Skew

On-call doctor invariant: at least one doctor must be on call. Alice and Bob are both on call. Each wants to take the evening off. Under SI:

time  T1 (Alice)                              T2 (Bob)
1 SELECT COUNT(*) WHERE on_call -> 2
2 SELECT COUNT(*) WHERE on_call -> 2
3 UPDATE doctors SET on_call=false
WHERE name='Alice'
4 UPDATE doctors SET on_call=false
WHERE name='Bob'
5 COMMIT
6 COMMIT

Both transactions read the snapshot (count=2), decided they could safely decrement, and wrote disjoint rows. No write-write conflict, no lost update, no dirty read. SI is perfectly happy. Result: zero on-call doctors. Invariant violated. This is classic write skew.

Phantom Read

Reserve a meeting room: a transaction checks "no bookings exist in [t1, t2]" and inserts a new booking. Two users book the same slot:

time  T1                                       T2
1 SELECT * WHERE room=7 AND t in [3,4] -> []
2 SELECT * WHERE room=7 AND t in [3,4] -> []
3 INSERT (room=7, t=[3,4], user=Alice)
4 INSERT (room=7, t=[3,4], user=Bob)
5 COMMIT
6 COMMIT

Neither transaction saw the other's insert because the row did not yet exist when each did its predicate read. Under SI this still happens; under MySQL InnoDB's "Repeatable Read" this is prevented by gap locks; under true Serializable it is prevented.

Common Confusion / Misconception

"Snapshot Isolation prevents all anomalies except dirty reads." Wrong. SI prevents dirty read, dirty write, read skew, lost update (in engines with first-committer-wins). SI does not prevent write skew or phantoms in general.

"Phantoms are a special case of write skew." They are related but distinct. Phantoms are about what rows a predicate sees; write skew is about invariants across rows read and written. A phantom often causes a write skew when the predicate query's result set differs between transactions.

"Repeatable Read in the SQL standard means no phantoms." The standard's text is ambiguous; Repeatable Read historically allowed phantoms in many engines. MySQL's InnoDB prevents them at RR via gap locks; Postgres's REPEATABLE READ is actually Snapshot Isolation and does not prevent write skew. Always check your engine.

How To Use It

Whenever a transaction makes a decision based on a set of rows, suspect write skew / phantoms. Questions:

  1. Does the decision read a predicate (count, existence, range) and then write based on that count/existence?
  2. Could a concurrent transaction read the same predicate, reach the same decision, and write disjoint rows?
  3. Do the two disjoint writes together violate an invariant that was true before?

If yes to all three, SI will not save you. Use explicit range locks, SELECT ... FOR UPDATE with a covering lock, SSI (PostgreSQL SERIALIZABLE), or real serializable isolation (MSSQL SERIALIZABLE, 2PL engines).

Check Yourself

  1. Why does SI prevent read skew but not write skew?
  2. Write the shape of a write skew: two transactions, a predicate read, two disjoint writes, a broken invariant. Give a fresh example not from doctors or room booking.
  3. How does MySQL InnoDB's gap lock differ from PostgreSQL SI's treatment of phantoms?
  4. Can lost update be framed as write skew? Why or why not?

Mini Drill or Application

For each, name the anomaly and state whether SI would prevent it:

  1. A report sums balances during an ongoing transfer and comes out wrong.
  2. Two admins concurrently toggle off the last two active feature flags that were required to have at least one active.
  3. A unique-username constraint is enforced at the app layer by checking, then inserting; two signups with the same name succeed.
  4. A balance-check transaction sees the balance drop between two reads in the same statement.
  5. A scheduler reads "no meetings between 3-4pm" in two concurrent transactions, each inserts a meeting, both land.

Read This Only If Stuck