Crash Consistency Clinic
Retrieval Prompts
- State from memory the three on-disk writes required to append a block under ext-style update-in-place.
- State the invariant a journaling FS relies on: "commit block last."
- Explain why `ordered` mode is preferred over `writeback` mode in ext4.
- State the COW invariant in one sentence: "old or new, never partial."
- Why is a directory `fsync` required after creating a new file?
Compare and Distinguish
Separate these pairs:
- journaling vs copy-on-write
- `data=journal` vs `data=ordered` vs `data=writeback`
- `fsck` pass vs journal replay
- `fsync(file)` vs `fsync(dir)` vs `sync()`
- atomic `rename` vs direct overwrite
Common Mistake Check
Identify the error in each statement:
- "Journaling doubles all writes."
- "A COW FS cannot be corrupted."
- "`fsync` guarantees the drive has the data."
- "`close` implies `fsync`."
- "If the kernel issues writes in order, the disk commits them in order."
- "`sync_file_range` is a faster `fsync`."
- "Metadata journaling protects user data."
Crash Scenarios
For each scenario, draw the initial on-disk state, list the writes in order, and for every crash point (between consecutive writes) describe the post-crash state under:
- naive update-in-place (no journal)
- ext4 `data=ordered` journaling
- COW (Btrfs-style)
Scenarios:
- Append: extend a 4 KiB file to 8 KiB.
- Overwrite: replace bytes 0-99 of a 1 MiB file.
- Rename: `rename("a.tmp", "a")` within a directory.
- Unlink: `rm a` on a file with `nlink = 1`, open by no one.
- Unlink-open: `rm a` on a file with `nlink = 1`, open by one process.
- mkdir: create a new subdirectory in a non-full parent directory.
For each scenario you should be able to name the worst crash point and what is lost or corrupted.
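As a starting point for the append scenario under naive update-in-place, the crash points can be enumerated mechanically. The sketch below is a toy model, not a file system: the write names (`bitmap`, `inode`, `data`), the `classify` labels, and the `crash_points` helper are all illustrative assumptions, and the model assumes writes reach the disk in the order listed, which a real drive with a volatile cache does not promise.

```python
from itertools import permutations

# The three on-disk writes of an append under update-in-place (toy names).
WRITES = ("bitmap", "inode", "data")


def classify(done):
    """Label the post-crash state given the set of writes that reached disk."""
    if not done:
        return "old file, consistent"
    if done == frozenset(WRITES):
        return "new file, consistent"
    if "inode" in done and "bitmap" not in done:
        return "inconsistent: inode points to a block the bitmap says is free"
    if "inode" in done and "data" not in done:
        return "garbage: inode points to a block that was never written"
    if "bitmap" in done and "inode" not in done:
        return "space leak: block marked used but owned by no inode"
    # Only the data block landed; nothing references it yet.
    return "old file, consistent (data written but unreferenced)"


def crash_points(order):
    """Yield (completed_writes, label) for a crash after each prefix of order."""
    for k in range(len(order) + 1):
        yield order[:k], classify(frozenset(order[:k]))


if __name__ == "__main__":
    # Print every crash point for every issue order of the three writes.
    for order in permutations(WRITES):
        print(" -> ".join(order))
        for completed, label in crash_points(order):
            print(f"  after {len(completed)} write(s): {label}")
```

Running it for each ordering shows which orderings expose garbage or an inode/bitmap mismatch and which only risk a space leak; the same skeleton can be extended by hand to the other scenarios.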
Safe Write Patterns
Implement and verify each pattern. For each, identify the exact durability guarantee:
- Durable atomic file replacement:
  `write(tmpfd, data); fsync(tmpfd); close(tmpfd); rename(tmp, target); fsync(dirfd)`
- Safe append of a record to a log:
  `write(fd, record); fsync(fd)`
- Group commit:
  `write(fd, records); fsync(fd)` once per batch, N records per batch.
- Bad pattern to avoid:
  `rename(new, target)` without `fsync` on either the file or the directory.
For each, explain: what does a crash after step k recover to?
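The replacement and group-commit patterns above could be written roughly as follows. This is a minimal sketch assuming a POSIX system (opening a directory with `os.open` does not work on Windows); the function names `durable_replace` and `append_batch` and the `.tmp` suffix are hypothetical choices for illustration, and the temporary file must live on the same file system as the target for `rename` to be atomic.

```python
import os


def _write_all(fd, data):
    """Write every byte; os.write may write fewer bytes than requested."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def durable_replace(target, data):
    """Replace target so that a crash leaves the old or the new contents."""
    tmp = target + ".tmp"  # hypothetical temp name, same directory as target
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
        os.fsync(fd)  # flush the new contents before they become visible
    finally:
        os.close(fd)
    os.rename(tmp, target)  # atomically switch the directory entry
    dirfd = os.open(os.path.dirname(os.path.abspath(target)), os.O_RDONLY)
    try:
        os.fsync(dirfd)  # make the rename itself durable
    finally:
        os.close(dirfd)


def append_batch(path, records):
    """Group commit: append all records, then issue one fsync for the batch."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        _write_all(fd, b"".join(records))
        os.fsync(fd)  # one flush amortised over len(records) records
    finally:
        os.close(fd)
```

Calling `append_batch(path, [record])` with a single record gives the plain safe-append pattern. If the log file is newly created, its parent directory also needs an `fsync`, as in `durable_replace`. Whether `fsync` reaches stable media still depends on the drive honouring cache flushes.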
Mini Application: Recovery Log
Build a table:
| Scenario | FS | Post-crash state | Recoverable? | Notes |
|---|---|---|---|---|
| Append crash between data write and commit | ext4 ordered | Data block written but not pointed to | Yes (journal ignores uncommitted) | Safe |
| Append crash between data and metadata under `data=writeback` | ext4 writeback | Metadata may point at garbage | Only partially | Stale data visible |
| Overwrite crash mid-sector | any | Sector may be torn unless the drive writes sectors atomically | Depends on hardware | Drive sector atomicity |
Fill in at least 10 rows across the scenarios from the previous section.
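When filling in the journaling rows, it can help to have a concrete picture of why "journal ignores uncommitted" makes the first row safe. The sketch below is a toy replay loop over a made-up record format (`begin`, `block`, `commit` tuples); it is an assumption for illustration and not the ext4/jbd2 on-disk layout.

```python
def replay(journal):
    """Return {block_number: payload} for committed transactions only.

    journal is a list of records in on-disk order (hypothetical format):
      ("begin", txid), ("block", txid, block_number, payload), ("commit", txid)
    """
    # A transaction counts only if its commit record made it to disk.
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    applied = {}
    for rec in journal:
        if rec[0] == "block" and rec[1] in committed:
            applied[rec[2]] = rec[3]  # later transactions overwrite earlier ones
    return applied


if __name__ == "__main__":
    crashed = [
        ("begin", 1), ("block", 1, 7, b"A"), ("commit", 1),
        ("begin", 2), ("block", 2, 7, b"B"), ("block", 2, 9, b"C"),
        # crash here: transaction 2 has no commit record
    ]
    print(replay(crashed))
```

In this model transaction 2 is discarded entirely because its commit record never reached the journal, which is the "commit block last" invariant seen from the recovery side.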
Evidence Check
This page is complete only if you can:
- trace any multi-block operation and identify every unsafe crash ordering
- explain the role of commit blocks, barriers, and drive cache flushes
- write safe code for durable rename and durable appends without reference