Atomicity and Durability via WAL and Recovery

What This Concept Is

Atomicity and durability are implemented by the same underlying mechanism: the write-ahead log (WAL). The rule is simple and unforgiving: a modification to a data page is never considered durable until its log record has been forced to stable storage. This single rule makes A and D tractable in the presence of crashes.

Key pieces:

Log records: every change (insert, update, delete) writes an entry describing old value (for undo) and new value (for redo), plus a log sequence number (LSN) that totally orders log records.
WAL rule: flush the log record before the corresponding data page reaches disk.
Commit marker: a COMMIT record in the log. When this record is on stable storage, the transaction is durable; until then, it can be rolled back.
Checkpoints: periodic records that summarize "everything up to LSN X is on disk in the data files," bounding recovery work.
Recovery: on startup after a crash, scan the log from the last checkpoint. Redo committed work. Undo in-progress work. The canonical algorithm is ARIES: analysis, redo, undo.

Why It Matters Here

If you do not understand WAL, statements like "group commit," "synchronous replication," and "fsync matters" sound interchangeable when they are not. You also cannot debug why a COMMIT sometimes takes 20ms (fsync latency) or why a database can lose work on power loss despite returning success (fsync lied because of volatile caches). Every later durability claim, including synchronous replication, is WAL-shaped.

Concrete Example

Schedule of two transactions T1 (credits account A) and T2 (credits account B), both active when the system crashes after T1 commits but before T2 commits.

LSN 10  T1 BEGIN
LSN 11  T1 UPDATE A: old=500 new=600
LSN 12  T2 BEGIN
LSN 13  T2 UPDATE B: old=200 new=300
LSN 14  T1 COMMIT     <- log forced to disk
        -- CRASH before data pages written --

Recovery:

Analysis: reads from last checkpoint. Determines T1 committed (has COMMIT record), T2 did not.
Redo: replays LSN 11 and LSN 13 against the data pages (redo is applied for all transactions, even losers, to reach a known state).
Undo: rolls back T2's effect at LSN 13 by writing old=200 back. Each undo action itself logs a compensation log record (CLR) so re-crashes during recovery still recover.

Final durable state: A = 600 (T1 committed), B = 200 (T2 rolled back). Atomicity preserved for T2, durability preserved for T1.

Common Confusion / Misconception

"Logging is about audit." No. Logging here is the primary persistence mechanism. The log is the source of truth at commit time; data files catch up lazily.

"fsync makes writes durable." fsync asks the OS to force buffers to the disk; the disk can still have a volatile cache. Databases like PostgreSQL expose synchronous_commit and require battery-backed or power-loss-protected disks for true durability.

"Redo is for committed, undo is for uncommitted." ARIES redoes all logged changes (committed or not) to reach the state at crash time, then undoes only uncommitted transactions. The nuance matters because it lets the recovery algorithm use the same replay machinery for both.

How To Use It

When reasoning about a DB's durability story, ask these in order:

Does it write a WAL record before the data page?
Is fsync issued on COMMIT, and is the disk honest about it?
On crash, can the recovery algorithm rebuild all committed state from WAL alone?
Is the recovery bounded by checkpoints or must it scan the entire log?

If any answer is "no" or "depends," that is where your durability story has a seam.

Check Yourself

Why must a log record be flushed before the corresponding data page, and not the other way around?
What is the role of checkpoints in bounding recovery time?
During recovery, why is redo applied even to transactions that will be rolled back in the undo phase?
What is a compensation log record (CLR) and why does it matter for crashes during recovery?
If your disk has a volatile write cache that lies about fsync, which ACID property is at risk and how?

Mini Drill or Application

Given this log (LSN: action):

T1 BEGIN
T1 UPDATE x: 5 -> 7
CHECKPOINT (dirty: T1; data file x=5)
T2 BEGIN
T2 UPDATE y: 2 -> 4
T1 COMMIT
T3 BEGIN
T3 UPDATE z: 8 -> 9
        -- CRASH --

Which log records are redone during recovery?
Which transactions are undone, and what do their undo records write back?
What is the final state of x, y, z after recovery?
Why is the checkpoint at LSN 102 useful even though T1 had not yet committed?

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconception​

How To Use It​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​