Module 4: Transactions & Consistency

Primary text: Designing Data-Intensive Applications (Kleppmann), Chapters 7 and 9

Selective support: Database System Concepts (Silberschatz) for classical concurrency control; Database Internals (Petrov), Part II, for implementation-level detail; Distributed Systems: Concepts and Design (Coulouris) for the canonical consistency treatment

This module is where single-node transaction guarantees meet distributed reality. You already know replication and partitioning from Module 3. Here you learn what transactions actually guarantee, how isolation is implemented, and which consistency model a distributed system can honestly offer.

Scope of This Module

This module is not "everything about databases." It is where correctness reasoning under concurrency becomes something you can defend.

What it covers in depth:

ACID as four distinct guarantees and which ones require which machinery
atomicity and durability via write-ahead logging (WAL) and ARIES-style recovery
BASE as a deliberately weaker vocabulary and when it fits
concurrency anomalies: dirty read, dirty write, lost update, read skew, write skew, phantom
the ANSI SQL isolation levels, what each actually prevents, and where the spec is underspecified
two-phase locking (2PL) and strict 2PL
Snapshot Isolation, MVCC, and why SI is not serializable
Serializable Snapshot Isolation (SSI) and how PostgreSQL implements it
two-phase commit (2PC), coordinator failure modes, and heuristic decisions
three-phase commit and Paxos Commit as repairs of 2PC's blocking problem
sagas and compensating actions for long-running distributed workflows
linearizability and the single-copy illusion
causal consistency, eventual consistency, and session guarantees
the CAP theorem, PACELC, and how to read them without oversimplifying

What it deliberately does not try to finish here:

consensus protocols in depth (Module 5)
full distributed-systems fault models and failure detectors (Module 5)
stream processing and event sourcing (later semester)
blockchain-specific consistency (out of scope)

Before You Start

Answer these closed-book before starting the main path:

Given a crash mid-transaction, what guarantees the database recovers to a consistent state?
Two clients run UPDATE counter SET n = n + 1. Under what isolation levels can the final value be less than it should be?
What is the difference between repeatable read and snapshot isolation?
Why can a client read its own write but later see an older value somewhere else in a replicated system?
What does CAP actually say, and what does it not say?

Diagnostic Interpretation

4-5 solid answers

You are ready for the full path and can spend less time on Cluster 1.

2-3 solid answers

Continue, but expect extra time in Cluster 2 (anomalies) and Cluster 5 (consistency models).

0-1 solid answers

Revisit Module 3 (replication) briefly. The anomalies and consistency models in this module only make sense on top of a concrete mental model of replicated state.

What This Module Is For

Transactions and consistency are where engineering bugs become correctness bugs. Throughout the program you will repeatedly be asked:

does this code correctly update the balance when two users transfer at once?
which isolation level do I set, and what does that choice cost me?
is the write I just committed visible to the next read from the same session? from a different replica?
can this workflow be a single transaction, or must it be a saga with compensations?
when my vendor says "strongly consistent," what do they actually mean?

This module builds the reasoning needed for:

consensus and replication protocols (Module 5)
service and storage architecture in later semesters
every production system where money, identity, or inventory is involved

You are learning to stop waving your hands about "the database will handle it."

Concept Map

How To Use This Module

Work in order. Later clusters only make sense if the earlier vocabulary is stable.

Cluster 1: ACID and the Single-Node Transaction

Order	Concept	Type	Focus
1	ACID Properties: What Each Actually Guarantees	PRIMARY	Pulling A, C, I, D apart and naming which property needs which machinery
2	Atomicity and Durability via WAL and Recovery	PRIMARY	Write-ahead log, redo/undo, checkpointing, ARIES-style recovery
3	BASE: The Alternative Vocabulary and Where It Fits	SUPPORTING	Basically Available, Soft state, Eventual consistency as a contrast

Cluster mastery check: Given a database feature (journaling, constraint checking, isolation level, fsync), can you name which ACID letter it actually supports?
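
To make the redo/undo idea from concept 2 concrete, here is a minimal Python sketch of redo-then-undo recovery over a toy write-ahead log. The `recover` function and the record layout are assumptions made for illustration. Real ARIES also relies on LSNs, checkpoints, and compensation log records, all of which this sketch leaves out.

```python
def recover(log, disk):
    """Toy redo-then-undo recovery over a write-ahead log.

    Log records are ("update", txn, key, old, new) or ("commit", txn).
    LSNs, checkpoints, and compensation log records are deliberately omitted.
    """
    state = dict(disk)
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo phase: repeat history for every logged update, committed or not.
    for rec in log:
        if rec[0] == "update":
            _, _, key, _, new = rec
            state[key] = new

    # Undo phase: roll back uncommitted transactions, newest record first.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, _ = rec
            state[key] = old

    return state


log = [
    ("update", "T1", "a", 10, 20),
    ("update", "T2", "b", 5, 7),
    ("commit", "T1"),
    ("update", "T2", "a", 20, 30),
    # Crash happens here: T2 never committed.
]

# Assumed on-disk state at the crash: T2's write to "b" was flushed,
# T1's committed write to "a" was not.
disk_at_crash = {"a": 10, "b": 7}

recovered = recover(log, disk_at_crash)
```

In this example T2's undo record for `a` restores the before-image 20, which is only correct if T1's committed update has already been reapplied. That is one way to see why redo runs before undo.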

Cluster 2: Concurrency Anomalies

Order	Concept	Type	Focus
4	Dirty Reads, Dirty Writes, Lost Updates	PRIMARY	The three "obvious" anomalies with interleaved timelines
5	Read Skew, Write Skew, Phantom Reads	PRIMARY	The anomalies most people get wrong about SI and RR
6	Isolation Levels: RU, RC, RR, Serializable	PRIMARY	ANSI SQL levels, what each prevents, where the spec is vague

Cluster mastery check: Given a schedule of two transactions, can you name the anomaly it exhibits and the weakest isolation level that prevents it?
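
As one instance of the check above, the following Python sketch replays two schedules of the same read-then-increment transactions on a shared counter. The `run_schedule` helper and the schedule encoding are hypothetical and model no isolation at all, so the interleaving alone decides the result.

```python
def run_schedule(schedule, initial=0):
    """Replay an interleaved schedule of read/write steps on one counter.

    Each step is (txn_name, op), where op is "read" or "write".
    A "write" stores the value that transaction last read, plus one.
    """
    counter = initial
    local = {}  # per-transaction private copy of the value it read
    for txn, op in schedule:
        if op == "read":
            local[txn] = counter
        elif op == "write":
            counter = local[txn] + 1
        else:
            raise ValueError(f"unknown op: {op}")
    return counter


serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

serial_result = run_schedule(serial)            # both increments survive
interleaved_result = run_schedule(interleaved)  # T2 overwrites T1's increment
```

The second schedule is a lost update: both transactions read the same starting value, so the later write silently discards the earlier one.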

Cluster 3: Implementing Isolation

Order	Concept	Type	Focus
7	Two-Phase Locking (2PL) and Serializable Schedules	PRIMARY	Growing and shrinking phases, strict 2PL, deadlock, predicate locks
8	Snapshot Isolation and MVCC	PRIMARY	Versioned reads, first-committer-wins, why SI is not serializable
9	Serializable Snapshot Isolation (SSI)	SUPPORTING	Detecting dangerous rw-antidependencies on top of SI

Cluster mastery check: For a given workload, can you pick between 2PL, SI, and SSI and defend the choice by anomaly tolerance and contention profile?
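
To see why snapshot isolation with first-committer-wins still admits write skew, consider this toy Python model. `SnapshotStore`, `Transaction`, and `go_off_call` are hypothetical names, and the on-call-doctors invariant (at least one doctor stays on call) follows the well-known example from Kleppmann. It is a sketch of the idea, not how any particular database implements MVCC.

```python
class SnapshotStore:
    """Toy store: whole-state snapshots plus first-committer-wins."""

    def __init__(self, data):
        self.data = dict(data)
        self.commit_ts = 0
        self.last_write = {key: 0 for key in data}  # commit timestamp per key

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.commit_ts
        self.snapshot = dict(store.data)  # reads never see later commits
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Abort only on a write-write conflict with a transaction that
        # committed after this one took its snapshot.
        for key in self.writes:
            if self.store.last_write.get(key, 0) > self.start_ts:
                return False
        self.store.commit_ts += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.last_write[key] = self.store.commit_ts
        return True


def go_off_call(txn, doctor):
    """Leave the rota only if the snapshot shows at least two doctors on call."""
    on_call = [d for d in ("alice", "bob") if txn.read(d)]
    if len(on_call) >= 2:
        txn.write(doctor, False)


store = SnapshotStore({"alice": True, "bob": True})
t1, t2 = store.begin(), store.begin()
go_off_call(t1, "alice")
go_off_call(t2, "bob")
t1_ok = t1.commit()
t2_ok = t2.commit()
doctors_on_call = sum(store.data.values())
```

Both transactions commit because their write sets are disjoint, yet nobody is left on call. Each decision was based on a premise that the other transaction's write invalidated, which is the read-write dependency that SSI is designed to detect.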

Cluster 4: Distributed Transactions

Order	Concept	Type	Focus
10	Two-Phase Commit (2PC): Coordinator, Participants, Failure Modes	PRIMARY	Prepare/commit message flow, in-doubt participants, heuristic decisions
11	Three-Phase Commit and Paxos Commit	SUPPORTING	Non-blocking commit under restricted failure models
12	Sagas: Long-Running Transactions with Compensations	PRIMARY	Orchestration vs choreography, compensating actions, semantic rollback

Cluster mastery check: For a cross-service workflow, can you decide between 2PC and a saga, and can you list the compensating action for every forward step?
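
One way to practice listing a compensation for every forward step is a small orchestration sketch like the one below. The `run_saga` helper, the step names, and the in-memory `bookings` set are all hypothetical. A production saga would also persist its progress and retry compensations, which this sketch does not attempt.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    On the first failing action, run the compensations of the steps that
    already completed, in reverse order, then report failure.
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log


bookings = set()


def book(item):
    return lambda: bookings.add(item)


def cancel(item):
    # set.discard is idempotent, so a retried compensation is harmless.
    return lambda: bookings.discard(item)


def charge_card():
    raise RuntimeError("card declined")


steps = [
    ("flight", book("flight"), cancel("flight")),
    ("hotel", book("hotel"), cancel("hotel")),
    ("payment", charge_card, lambda: None),
]

ok, log = run_saga(steps)
```

Here the payment step fails, so the hotel and then the flight are cancelled. Note that the compensations are semantic undo operations, not a rollback: other services could have observed the bookings while they existed.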

Cluster 5: Consistency Models

Order	Concept	Type	Focus
13	Linearizability and the Single-Copy Illusion	PRIMARY	Real-time order, the register model, cost and implementation
14	Causal Consistency, Eventual Consistency, Session Guarantees	PRIMARY	Happens-before ordering, read-your-writes, monotonic reads
15	CAP and PACELC Frameworks and When to Use Them	PRIMARY	What CAP actually says, and PACELC's extension to latency

Cluster mastery check: Given a vendor's consistency claim ("strong," "causal," "eventual with bounded staleness"), can you translate it into a client-visible behavior contract?
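
For the linearizability part of this cluster, a brute-force checker over a tiny single-register history can help you test your hand reasoning. The `Op` record and `is_linearizable` function below are hypothetical, and the search tries every permutation, so it is only usable on histories of a handful of operations.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    start: float  # invocation time
    end: float    # response time
    kind: str     # "write" or "read"
    value: int    # value written, or value the read returned


def is_linearizable(history, initial=0):
    """Brute-force linearizability check for a single register."""
    for order in permutations(history):
        # Real-time order: an operation that finished before another one
        # started must come first in the candidate total order.
        respects_real_time = all(
            not (later.end < earlier.start)
            for i, earlier in enumerate(order)
            for later in order[i + 1:]
        )
        if not respects_real_time:
            continue

        # Register semantics: every read returns the most recent write.
        value = initial
        legal = True
        for op in order:
            if op.kind == "write":
                value = op.value
            elif op.value != value:
                legal = False
                break
        if legal:
            return True
    return False


# A write of 1 overlaps two reads that do not overlap each other.
fresh_then_stale = [Op(0, 10, "write", 1), Op(1, 3, "read", 1), Op(4, 6, "read", 0)]
stale_then_fresh = [Op(0, 10, "write", 1), Op(1, 3, "read", 0), Op(4, 6, "read", 1)]

violation = not is_linearizable(fresh_then_stale)
allowed = is_linearizable(stale_then_fresh)
```

The first history is not linearizable: once a read has returned 1, a later read that starts after it finished cannot return the older 0. The second history is fine, because the write can take effect between the two reads.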

Then work these practice pages:

Order	Practice path	Focus
1	Isolation and Anomalies Lab	Reproduce lost-update and write-skew anomalies on PostgreSQL under different isolation settings
2	Saga Design Workshop	Design a saga for a real workflow with explicit compensations
3	Consistency Model Translation Drill	Translate vendor claims into client-visible behavior; analyze a Jepsen-style history
4	Transactions Code Katas	Timed drills on 2PC, 2PL, SI, and linearizability tests

Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.

Learning Objectives

By the end of this module you should be able to:

State what each of A, C, I, and D actually guarantees, and name the concrete mechanism (WAL, constraints, isolation level, fsync) that supports each.
Walk through an ARIES-style recovery on a small example log and explain why redo precedes undo.
Given a schedule of two transactions, identify the anomaly (dirty read, dirty write, lost update, read skew, write skew, phantom) and state the weakest isolation level that prevents it.
Describe how 2PL, Snapshot Isolation, and SSI each implement a chosen isolation level and where each breaks down.
Explain why Snapshot Isolation prevents phantom reads within a transaction's snapshot but still allows write skew (including write skew caused by phantoms), and reproduce a write-skew violation on PostgreSQL.
Draw the 2PC message flow, name every failure mode (coordinator dies, participant dies before prepare, after prepare), and describe the recovery.
Design a saga for a concrete workflow (e.g., travel booking) with forward actions and semantically correct compensations.
Define linearizability in terms of real-time order and apply it to a short history to decide whether the history is linearizable.
Distinguish causal consistency, eventual consistency, and the four standard session guarantees, and pick which guarantees an application actually needs.
State CAP and PACELC precisely and apply them to categorize a given storage system's tradeoffs without overclaiming.

Outputs

a concurrency anomaly catalog: for each of the six core anomalies, a two-row interleaved timeline, the isolation level that permits it, and the one that prevents it
a PostgreSQL reproduction log: lost update and write skew reproduced under READ COMMITTED and REPEATABLE READ wherever each level permits them, and then each eliminated with the right setting
a 2PC state-machine diagram covering coordinator and participant, labeled with every crash-recovery transition
a saga design for at least one real workflow, with a table mapping each step to its compensating action and the idempotency requirement
a consistency model crib sheet covering linearizability, causal, session guarantees, and eventual, with one example application per row
a Jepsen-style analysis of a short history: at least one declared violation with a written argument
a short memo "when CAP is the wrong frame" that uses PACELC on the same example
a mistake log (at least 8 entries) on misread isolation levels and conflated consistency guarantees

Completion Standard

You have completed Module 4 when all of these are true:

you can name the anomaly given a two-transaction schedule and propose the weakest isolation level that prevents it
you have reproduced lost update and write skew on a real database and then eliminated each
you can defend a choice between 2PC and a saga for a specific workflow
you can write a plausible 2PC recovery path, including the "in-doubt" participant case
you can argue whether a given history is linearizable by appeal to real-time order
you have translated at least two vendor consistency claims into client-visible contracts
you no longer conflate "strong consistency" with "serializability" and no longer quote CAP as "pick two"

If you are still saying "it's ACID, so it's fine" without knowing which isolation level is in effect, the module is not complete.

Reading Policy

Concept pages are the main path.
Local book chunks are selective reinforcement, not a second syllabus.
"Read only if stuck" means: try the concept page, self-check, and drill first.
External Jepsen and Kleppmann blog posts are validated and targeted; read them when the concept page points to them.
Because this module underpins Module 5 (distributed systems) and every architecture module after, hand-written anomaly timelines and written 2PC/saga designs are required, not optional.

Suggested Weekly Flow

Day	Work
1	Concepts 1-3; write ACID crib sheet by hand
2	Concepts 4-5; draw at least 4 anomaly timelines
3	Concept 6; reproduce lost update on PostgreSQL (dirty reads cannot be reproduced there, because PostgreSQL treats READ UNCOMMITTED as READ COMMITTED)
4	Concepts 7-8; walk through 2PL and SI on the same workload
5	Concept 9 and Practice 1 (anomalies lab)
6	Concepts 10-11; draw 2PC state machine, list failure modes
7	Concept 12 and Practice 2 (saga design)
8	Concept 13; work one linearizability history by hand
9	Concepts 14-15 and Practice 3
10	Practice 4 (katas), quiz, mistake-log cleanup

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.

Rich Learning Pages
