Module 4: File Systems & I/O: Case Studies
These case studies make persistence honest: file descriptors, buffering, fsync, journaling, page cache, readiness, and event loops.
Case Study 1: write Returned But Data Was Lost
Scenario: An app writes a critical config file and crashes after write returns. After reboot, the file is empty or old.
Source anchor: fsync(2) documents flushing modified in-core file data to stable storage.
Module concepts: write buffer, page cache, durability, fsync, rename.
Wrong Approach
"write returning means data is on disk."
Better Approach
Use an atomic file update pattern:
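- write the new contents to a temp file in the same directory as the target
- fsync the temp file
- rename temp -> target (atomic only within one filesystem)
- fsync the containing directory so the rename itself is durable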
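The steps above can be sketched in Python roughly as follows. This is a minimal sketch for POSIX systems, not a definitive implementation: the function name `atomic_write` and the `.tmp-` prefix are illustrative choices, and it assumes the target's directory can be opened read-only for the final directory fsync.

```python
import os
import tempfile


def atomic_write(target_path: str, data: bytes) -> None:
    """Replace target_path with data via temp file + fsync + rename (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Same directory as the target, so the rename never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # user-space buffer -> kernel page cache
            os.fsync(tmp.fileno())  # page cache -> stable storage
        os.replace(tmp_path, target_path)  # atomic rename over the target
    except BaseException:
        # Best-effort cleanup so failed attempts do not leave temp files behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry so the rename survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A reader sees either the old file or the complete new one, never a partial write. Note that `mkstemp` creates the temp file with owner-only permissions, so a real config writer may also need to restore the target's original mode.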
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| write in place | simple code | corruption risk on crash |
| temp file + fsync + rename | atomic replacement that survives a crash | extra fsync latency and more I/O |
| defer sync entirely | low latency | data loss window |
Failure Mode
The application updates page-cache state but crashes before data and metadata reach durable storage, leaving an empty, partial, or stale file after reboot.
Project / Capstone Connection
Use this for config writers, grade exports, or any capstone workflow that rewrites important files and must survive power loss cleanly.
Required Artifact
Draw the crash points and state what exists after reboot.
Case Study 2: select Falls Over With Many Sockets
Scenario: A server watches 50,000 connections with select. CPU rises even when little traffic arrives.
Source anchor: epoll(7) explains scalable readiness notification through interest and ready lists.
Module concepts: select, poll, epoll, readiness, event loop.
Wrong Approach
"All readiness APIs scale the same."
Better Approach
Use an event-loop model:
- interest list: file descriptors to watch
- ready list: descriptors ready for I/O
- loop: wait, drain, update interest
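The wait, drain, update-interest cycle can be sketched with Python's standard `selectors` module, whose `DefaultSelector` picks the most efficient mechanism available on the platform (epoll on Linux). The names `run_once` and `demo` and the socket-pair setup are illustrative assumptions, used only to make the example self-contained.

```python
import selectors
import socket


def run_once(sel, timeout=1.0):
    """One loop iteration: wait for readiness, drain ready sockets, update interest."""
    received = []
    for key, _events in sel.select(timeout):  # wait: only ready descriptors come back
        sock = key.fileobj
        while True:                           # drain until the socket would block
            try:
                chunk = sock.recv(4096)
            except BlockingIOError:
                break
            if not chunk:                     # peer closed: stop watching this fd
                sel.unregister(sock)
                sock.close()
                break
            received.append(chunk)
    return received


def demo(idle_count=50):
    """Register many idle sockets, make one ready, and run a single iteration."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(idle_count)]
    for local, _remote in pairs:
        local.setblocking(False)
        sel.register(local, selectors.EVENT_READ)  # add to the interest list
    pairs[7][1].sendall(b"hello")                  # exactly one connection has data
    try:
        return run_once(sel)
    finally:
        sel.close()
        for local, remote in pairs:
            local.close()
            remote.close()
```

The loop body only touches the descriptors the selector reports as ready, so the per-iteration work follows activity rather than the total number of registered connections.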
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| select | widely known | poor scaling at high fd counts; fd values capped by FD_SETSIZE (commonly 1024) |
| poll | simpler fd-set handling | still scans all fds |
| epoll | efficient for many idle sockets | Linux-specific complexity |
Failure Mode
The server repeatedly scans large descriptor sets even when little is ready, so CPU climbs with connection count rather than useful work.
Project / Capstone Connection
This fits chat servers, websocket backends, or proxy capstones that need to hold many mostly-idle connections efficiently.
Required Artifact
Compare select, poll, and epoll for 50,000 mostly-idle sockets.
Case Study 3: Page Cache Makes Benchmark Lie
Scenario: A file-read benchmark is extremely fast on the second run. The learner concludes the disk is fast.
Source anchor: The Linux page cache keeps recently read file data in memory, so repeat reads can be served without storage I/O. See the Linux page cache documentation where available, plus the module readings.
Module concepts: page cache, cold cache, warm cache, benchmarking.
Wrong Approach
Benchmark only warm-cache reads.
Better Approach
State cache condition:
- cold run: includes storage I/O
- warm run: measures memory/page-cache path
- production: estimate cache hit ratio
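A benchmark that states its cache condition might look like the sketch below. It assumes a POSIX system where `os.posix_fadvise` is available, and treats `POSIX_FADV_DONTNEED` as a best-effort hint only: the kernel may keep pages anyway, and dirty pages are not dropped. The function names are hypothetical.

```python
import os
import time


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file unbuffered; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start


def evict_from_page_cache(path):
    """Hint that cached pages for this file can be dropped. Returns False if unsupported."""
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    except OSError:
        return False
    finally:
        os.close(fd)
    return True


def benchmark(path):
    """Report a first run after an eviction attempt and an immediate second run."""
    eviction_attempted = evict_from_page_cache(path)
    size, first = timed_read(path)    # closer to cold if the eviction took effect
    _, second = timed_read(path)      # warm: likely served from the page cache
    return {
        "bytes": size,
        "first_run_s": first,
        "second_run_s": second,
        "eviction_attempted": eviction_attempted,
    }
```

The report keeps the two timings separate and records whether eviction was even attempted, so the reader can tell a storage number from a memory number instead of averaging them together.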
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| warm-cache-only benchmark | easy repeatability | misleading storage claims |
| cold and warm runs | fuller picture | harder setup |
| production trace correlation | realistic interpretation | more measurement work |
Failure Mode
The second run measures memory-resident page-cache behavior, but the learner reports it as disk throughput and reaches the wrong system conclusion.
Project / Capstone Connection
Use this when presenting benchmark results for backup tools, media pipelines, or data-ingest capstones that depend on storage behavior.
Required Artifact
Write a benchmark report with cold/warm runs, cache condition, and interpretation.
Case Study 4: File Descriptor Leak
Scenario: A server opens files and sockets but fails to close them on some error paths. Eventually new opens fail with EMFILE.
Source anchor: The Linux open(2) and close(2) man pages describe how file descriptors are created and released.
Module concepts: file descriptor, open-file table, resource leak, limits.
Wrong Approach
"Memory is the only leak that matters."
Better Approach
Track fd ownership:
- open point: who owns fd?
- transfer: does ownership move?
- close: all success/error paths
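One way to make those three questions explicit in code is a small owner object. The sketch below is illustrative rather than a standard API: `OwnedFd` and `read_header` are hypothetical names, and the wrapper simply guarantees that a raw descriptor is closed once unless ownership is deliberately released.

```python
import os


class OwnedFd:
    """Owns exactly one file descriptor and closes it at most once (hypothetical helper)."""

    def __init__(self, fd):
        self._fd = fd

    def fileno(self):
        if self._fd < 0:
            raise ValueError("fd already closed or released")
        return self._fd

    def release(self):
        """Transfer ownership to the caller; this object will no longer close the fd."""
        fd = self.fileno()
        self._fd = -1
        return fd

    def close(self):
        if self._fd >= 0:
            fd, self._fd = self._fd, -1
            os.close(fd)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()  # runs on both success and error paths


def read_header(path, size=16):
    """Open point and close point live in one scope, so no path can skip the close."""
    with OwnedFd(os.open(path, os.O_RDONLY)) as owned:
        return os.read(owned.fileno(), size)
```

Python's built-in file and socket objects already behave this way under `with`; the wrapper matters mainly when code handles raw descriptors from `os.open`, `os.pipe`, or similar calls.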
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| implicit fd ownership | quick coding | leak-prone error paths |
| explicit owner per fd | clearer cleanup | more discipline |
| RAII/helper wrapper | safer lifecycle | abstraction overhead |
Failure Mode
Open descriptors survive exceptional paths and retries until the process hits the per-process fd limit and new opens fail with EMFILE.
Project / Capstone Connection
This belongs in servers, crawlers, or pipeline capstones that open many files and sockets under mixed success and failure paths.
Required Artifact
Write an fd ownership checklist and leak reproduction.
Case Study 5: Synchronous Logging Blocks Request Path
Scenario: Every request writes and flushes a log line synchronously. Tail latency follows disk latency.
Source anchor: fsync(2) and I/O readiness docs show why persistence and request latency are coupled when logs are flushed inline.
Module concepts: synchronous I/O, buffering, durability, latency.
Wrong Approach
Flush every log line on the request thread.
Better Approach
Separate durability class:
- audit/security event: durable path required
- debug/request log: buffered async path acceptable
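The two durability classes can be sketched as a logger with two paths. Everything here is an illustrative assumption rather than a prescribed design: the class name `SplitLogger`, the bounded queue size, and the drop-on-full backpressure policy. The `dropped` counter is also not thread-safe across multiple request threads.

```python
import os
import queue
import threading


class SplitLogger:
    """Audit lines are fsynced inline; debug lines go through a bounded async queue."""

    def __init__(self, audit_path, debug_path, max_pending=1024):
        self._audit_fd = os.open(
            audit_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600
        )
        self._debug = open(debug_path, "a", encoding="utf-8")
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def audit(self, line):
        # Durable path: the caller blocks until the kernel reports the flush is done.
        os.write(self._audit_fd, (line + "\n").encode("utf-8"))
        os.fsync(self._audit_fd)

    def debug(self, line):
        # Fast path: never block the request thread; drop when the queue is full.
        try:
            self._pending.put_nowait(line)
        except queue.Full:
            self.dropped += 1

    def _drain(self):
        while True:
            line = self._pending.get()
            if line is None:  # sentinel from close()
                break
            self._debug.write(line + "\n")

    def close(self):
        self._pending.put(None)
        self._worker.join()
        self._debug.close()
        os.close(self._audit_fd)
```

The loss window for debug lines is bounded by the queue size plus whatever sits in the file buffer, which is the kind of number the durability matrix should state explicitly.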
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| synchronous flush per request | strongest per-line durability | high latency |
| buffered async logging | fast request path | bounded log loss |
| split audit vs debug channels | aligned durability | added routing complexity |
Failure Mode
Request latency inherits storage latency because the request thread blocks on every flush instead of handing off noncritical logs.
Project / Capstone Connection
Apply this when deciding how application, audit, and debug logs should flow through capstone services with different loss-tolerance requirements.
Required Artifact
Create a logging durability matrix: event type, loss tolerance, flush policy, backpressure behavior.
Source Map
| Source | Use it for |
|---|---|
| fsync(2) | durability and flushing |
| epoll(7) | scalable readiness notification |
| open(2) and close(2) | file descriptor lifecycle |
Completion Standard
- At least three artifacts are completed.
- At least one artifact includes crash points.
- At least one artifact compares readiness APIs.