
Module 4: File Systems & I/O: Case Studies

These case studies make persistence honest: file descriptors, buffering, fsync, journaling, page cache, readiness, and event loops.


Case Study 1: write Returned But Data Was Lost

Scenario: An app writes a critical config file, and the machine crashes or loses power just after write returns. After reboot, the file is empty or old.

Source anchor: fsync(2) documents flushing modified in-core file data to stable storage.

Module concepts: write buffer, page cache, durability, fsync, rename.

Wrong Approach

"write returning means data is on disk."

Better Approach

Use an atomic file update pattern:

write temp file
fsync temp file
rename temp -> target
fsync containing directory
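
The four steps above can be sketched in Python as follows. This is a minimal sketch, assuming a POSIX system where a directory can be opened and fsynced; the helper name atomic_write is hypothetical and not part of the module.

```python
import os
import tempfile


def atomic_write(target_path: str, data: bytes) -> None:
    """Replace target_path with data using temp file + fsync + rename."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Step 1: write a temp file in the same directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move Python's userspace buffer into the kernel
            os.fsync(tmp.fileno())  # Step 2: flush file data to stable storage
        os.replace(tmp_path, target_path)  # Step 3: rename temp -> target
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Step 4: fsync the containing directory so the new name is durable too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A reader sees either the old file or the new one, never a partial write. Note that mkstemp creates the temp file with owner-only permissions, so a real config writer may need to restore the target's mode before the rename.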

Tradeoff Table

Choice | Gain | Cost
write in place | simple code | corruption risk on crash
temp file + fsync + rename | strong durability | more I/O
defer sync entirely | low latency | data loss window

Failure Mode

The application's write reaches only the page cache, and the system crashes or loses power before data and metadata reach durable storage, leaving an empty, partial, or stale file after reboot.

Project / Capstone Connection

Use this for config writers, grade exports, or any capstone workflow that rewrites important files and must survive power loss cleanly.

Required Artifact

Draw the crash points and state what exists after reboot.


Case Study 2: select Falls Over With Many Sockets

Scenario: A server watches 50,000 connections with select. CPU rises even when little traffic arrives.

Source anchor: epoll(7) explains scalable readiness notification through interest and ready lists.

Module concepts: select, poll, epoll, readiness, event loop.

Wrong Approach

"All readiness APIs scale the same."

Better Approach

Use an event-loop model:

interest list:
file descriptors to watch

ready list:
descriptors ready for I/O

loop:
wait, drain, update interest
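
One iteration of the wait, drain, update cycle might look like the sketch below. It assumes Python's standard selectors module, whose DefaultSelector uses epoll on Linux and falls back to other mechanisms elsewhere; the function name run_once is hypothetical.

```python
import selectors
import socket


def run_once(selector: selectors.BaseSelector, timeout: float = 1.0) -> list:
    """Wait for readiness, drain ready sockets, and update the interest list."""
    received = []
    # wait: the selector reports only descriptors that are ready
    for key, _events in selector.select(timeout):
        sock = key.fileobj
        # drain: read until the non-blocking socket has nothing left
        while True:
            try:
                chunk = sock.recv(4096)
            except BlockingIOError:
                break
            if not chunk:
                # update interest: peer closed, so stop watching and release the fd
                selector.unregister(sock)
                sock.close()
                break
            received.append(chunk)
    return received
```

To try it, register one end of a socket.socketpair() (set non-blocking) for selectors.EVENT_READ, send bytes on the other end, and call run_once in a loop. The loop does work proportional to the ready descriptors it receives, not to the size of the interest list.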

Tradeoff Table

Choice | Gain | Cost
select | widely known | poor scaling at high fd counts
poll | simpler fd-set handling | still scans all fds
epoll | efficient for many idle sockets | Linux-specific complexity

Failure Mode

The server repeatedly scans large descriptor sets even when little is ready, so CPU climbs with connection count rather than useful work.

Project / Capstone Connection

This fits chat servers, websocket backends, or proxy capstones that need to hold many mostly-idle connections efficiently.

Required Artifact

Compare select, poll, and epoll for 50,000 mostly-idle sockets.


Case Study 3: Page Cache Makes Benchmark Lie

Scenario: A file-read benchmark is extremely fast on the second run. The learner concludes the disk is fast.

Source anchor: The Linux page cache keeps recently read file data in memory, so a repeated read can be served without storage I/O. See Linux page cache documentation where available, plus module readings.

Module concepts: page cache, cold cache, warm cache, benchmarking.

Wrong Approach

Benchmark only warm-cache reads.

Better Approach

State cache condition:

cold run:
includes storage I/O

warm run:
measures memory/page-cache path

production:
estimate cache hit ratio
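
A rough sketch of recording both conditions in one run follows. It assumes a Unix system where os.posix_fadvise is available; POSIX_FADV_DONTNEED is only a hint, so the kernel may keep the pages and the "cold" run is best effort. The function names are hypothetical.

```python
import os
import time


def timed_read(path: str, block_size: int = 1 << 20) -> float:
    """Return the seconds taken to read the whole file sequentially."""
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while f.read(block_size):
            pass
    return time.perf_counter() - start


def drop_file_cache(path: str) -> bool:
    """Best-effort hint asking the kernel to evict this file's cached pages."""
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # dirty pages cannot be dropped, so flush them first
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def benchmark(path: str) -> dict:
    """Time a (hopefully) cold read followed by a warm read, and say which is which."""
    attempted = drop_file_cache(path)
    first = timed_read(path)   # cold run if eviction worked: includes storage I/O
    second = timed_read(path)  # warm run: measures the memory/page-cache path
    return {
        "cache_drop_attempted": attempted,
        "first_run_s": first,
        "second_run_s": second,
    }
```

Reporting cache_drop_attempted next to the timings keeps the cache condition explicit in the results instead of leaving it to the reader's guess.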

Tradeoff Table

Choice | Gain | Cost
warm-cache-only benchmark | easy repeatability | misleading storage claims
cold and warm runs | fuller picture | harder setup
production trace correlation | realistic interpretation | more measurement work

Failure Mode

The second run measures memory-resident page-cache behavior, but the learner reports it as disk throughput and reaches the wrong system conclusion.

Project / Capstone Connection

Use this when presenting benchmark results for backup tools, media pipelines, or data-ingest capstones that depend on storage behavior.

Required Artifact

Write a benchmark report with cold/warm runs, cache condition, and interpretation.


Case Study 4: File Descriptor Leak

Scenario: A server opens files and sockets but forgets to close them on some error paths. Eventually, new opens fail with EMFILE.

Source anchor: The Linux open(2) and close(2) man pages describe file descriptors and their lifecycle.

Module concepts: file descriptor, open-file table, resource leak, limits.

Wrong Approach

"Memory is the only leak that matters."

Better Approach

Track fd ownership:

open point:
who owns fd?

transfer:
does ownership move?

close:
all success/error paths
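
The ownership questions above can be made explicit in code. The sketch below is one possible shape, assuming raw descriptors from os.open; the helper names owned_fd, read_header, and open_as_file are hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def owned_fd(path: str, flags: int = os.O_RDONLY):
    """Open path and close the descriptor on both success and error paths."""
    fd = os.open(path, flags)
    try:
        yield fd
    finally:
        os.close(fd)


def read_header(path: str, size: int = 16) -> bytes:
    with owned_fd(path) as fd:  # open point: this block owns fd
        header = os.read(fd, size)
        if len(header) < size:
            raise ValueError("file too short")  # error path: fd is still closed
        return header  # success path: fd is closed on exit


def open_as_file(path: str):
    """Transfer: hand ownership of a raw fd to a file object the caller must close."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.fdopen(fd, "rb")  # the file object now owns fd
    except BaseException:
        os.close(fd)  # transfer failed, so this function still owns fd
        raise
```

Each descriptor has exactly one owner at any time, which makes a missing close visible during review.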

Tradeoff Table

Choice | Gain | Cost
implicit fd ownership | quick coding | leak-prone error paths
explicit owner per fd | clearer cleanup | more discipline
RAII/helper wrapper | safer lifecycle | abstraction overhead

Failure Mode

Open descriptors survive exceptional paths and retries until the process hits the per-process fd limit and new opens fail with EMFILE.
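
This failure can be reproduced safely by lowering the soft limit first. The sketch below assumes a Unix system with Python's resource module; the function name and the limit of 64 are arbitrary illustrative choices.

```python
import errno
import os
import resource


def reproduce_emfile(soft_limit: int = 64) -> int:
    """Leak descriptors under a lowered soft limit and return the errno of the failed open."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    leaked = []
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
        try:
            while True:
                # Deliberate leak: each descriptor is kept open and never closed.
                leaked.append(os.open(os.devnull, os.O_RDONLY))
        except OSError as exc:
            return exc.errno
    finally:
        # Clean up so the demonstration does not affect the rest of the process.
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
```

Comparing the return value with errno.EMFILE confirms that the failure comes from the per-process descriptor limit and not from the file being opened.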

Project / Capstone Connection

This belongs in servers, crawlers, or pipeline capstones that open many files and sockets under mixed success and failure paths.

Required Artifact

Write an fd ownership checklist and leak reproduction.


Case Study 5: Synchronous Logging Blocks Request Path

Scenario: Every request writes and flushes a log line synchronously. Tail latency follows disk latency.

Source anchor: fsync(2) and I/O readiness docs show why persistence and request latency are coupled when flushed inline.

Module concepts: synchronous I/O, buffering, durability, latency.

Wrong Approach

Flush every log line on the request thread.

Better Approach

Separate durability class:

audit/security event:
durable path required

debug/request log:
buffered async path acceptable
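
One way to express the two durability classes is sketched below. The class name SplitLogger, the queue size, and the choice to drop debug lines when the queue is full are all assumptions made for illustration.

```python
import os
import queue
import threading


class SplitLogger:
    """Route audit events through a durable path and debug lines through an async path."""

    def __init__(self, audit_path: str, debug_path: str, max_pending: int = 1024):
        self._audit = open(audit_path, "ab")
        self._debug = open(debug_path, "ab")
        self._audit_lock = threading.Lock()
        # Bounded queue: caps memory use and the number of lines that can be lost.
        self._pending = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def audit(self, line: str) -> None:
        # Durable path: the caller blocks until fsync returns.
        with self._audit_lock:
            self._audit.write(line.encode() + b"\n")
            self._audit.flush()
            os.fsync(self._audit.fileno())

    def debug(self, line: str) -> bool:
        # Async path: never block the request thread; drop the line if the queue is full.
        try:
            self._pending.put_nowait(line)
            return True
        except queue.Full:
            return False

    def _drain(self) -> None:
        # Background thread: writes debug lines without forcing a sync.
        while True:
            line = self._pending.get()
            if line is None:  # sentinel sent by close()
                break
            self._debug.write(line.encode() + b"\n")

    def close(self) -> None:
        self._pending.put(None)
        self._worker.join()
        self._debug.close()  # flushes any buffered debug lines
        self._audit.close()
```

The request thread pays storage latency only for audit events. Debug lines still in the queue or in the file buffer can be lost on a crash, which is the bounded loss this design accepts.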

Tradeoff Table

Choice | Gain | Cost
synchronous flush per request | strongest per-line durability | high latency
buffered async logging | fast request path | bounded log loss
split audit vs debug channels | aligned durability | added routing complexity

Failure Mode

Request latency inherits storage latency because the request thread blocks on every flush instead of handing off noncritical logs.

Project / Capstone Connection

Apply this when deciding how application, audit, and debug logs should flow through capstone services with different loss-tolerance requirements.

Required Artifact

Create a logging durability matrix: event type, loss tolerance, flush policy, backpressure behavior.


Source Map

Source | Use it for
fsync(2) | durability and flushing
epoll(7) | scalable readiness notification
open(2) and close(2) | file descriptor lifecycle

Completion Standard

  • At least three artifacts are completed.
  • At least one artifact includes crash points.
  • At least one artifact compares readiness APIs.