I/O Performance Lab
Retrieval Prompts
- State the sequential vs random read IOPS ratio you expect on: HDD, SATA SSD, NVMe Gen4.
- Describe what the page cache is and how it relates to "free" memory.
- State exactly what fsync guarantees and what close does not.
- Describe read-ahead and name one workload where it hurts.
- Describe write-back and name one failure it enables.
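To make the page cache vs "free" memory prompt concrete, here is a minimal sketch that parses /proc/meminfo-style text and reports free, cached, and available memory side by side. The function names and the SAMPLE figures are hypothetical illustrations, not measurements; on a Linux system you would pass in the contents of /proc/meminfo instead.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Return the three numbers worth comparing, in MiB."""
    def mib(key):
        return fields.get(key, 0) / 1024

    return {
        "free_mib": mib("MemFree"),            # pages holding nothing at all
        "page_cache_mib": mib("Cached"),       # file pages kept in the page cache
        "available_mib": mib("MemAvailable"),  # kernel's estimate of reclaimable + free
    }


# Hypothetical sample values, chosen only to illustrate the gap.
SAMPLE = """MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:   12288000 kB
Cached:         11264000 kB
"""

summary = summarize(parse_meminfo(SAMPLE))
for name, value in summary.items():
    print(f"{name}: {value:.0f}")
```

In the sample, MemFree is small while MemAvailable is large: most of the difference is page cache that the kernel can reclaim on demand.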
Compare and Distinguish
Separate these pairs:
- throughput vs IOPS vs latency
- sequential vs random, small-block vs large-block
- page cache vs drive cache
- fsync vs fdatasync vs sync_file_range
- O_DIRECT vs buffered I/O
Common Mistake Check
Identify the error:
- "My workload is fast because I fit in the page cache. So it will be fast in production."
- "O_DIRECT is always faster for databases."
- "Since NVMe has no seek penalty, random and sequential are equivalent."
- "free shows little free memory, so I need to add RAM."
- "I benchmarked with a warm cache, and the numbers match disk spec-sheet IOPS."
- "I use sync_file_range, so my data is durable."
- "Write-back means my writes are async and free."
Measurement Drills
Run each drill and record results. Explain each observation in one paragraph.
Drill 1: Cache effects
dd if=/dev/zero of=/tmp/big bs=1M count=1024
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
time cat /tmp/big > /dev/null # cold
time cat /tmp/big > /dev/null # warm
Report cold and warm times. Compute effective MB/s for each.
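If you prefer to compute the effective rate in code rather than by hand, the following is a minimal sketch of a timed sequential read. The helper name read_throughput is hypothetical. It reports MiB/s (2^20 bytes), which matches dd's bs=1M; multiply by 1.048576 if you want decimal MB/s.

```python
import time


def read_throughput(path, block_size=1 << 20):
    """Read `path` sequentially; return (bytes_read, seconds, MiB_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own userspace buffer; the kernel page
    # cache is still in play, which is exactly what this drill measures.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed > 0:
        rate = (total / (1 << 20)) / elapsed
    else:
        rate = float("inf")
    return total, elapsed, rate
```

Call it once right after dropping caches and once again immediately afterwards, for example read_throughput("/tmp/big"), and compare the two rates.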
Drill 2: fsync cost
Write a small program:
- Open /tmp/log for appending, O_CREAT|O_WRONLY|O_APPEND.
- Loop 10,000 times: write(fd, buf, 4096). No fsync. Record time.
- Repeat with fsync after each write. Record time.
- Repeat with fsync after every 100 writes. Record time.
- Compare with O_SYNC (equivalent to fsync per op in terms of durability).
Report throughput (ops/sec) and amortized cost per op. Explain the gap.
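One possible shape for the small program, sketched in Python under a few assumptions: the helper name bench_writes and its parameters are hypothetical, each os.write is assumed to write the full 4096 bytes (typical for regular files, but not guaranteed), and the timings include interpreter overhead that a C version would not have.

```python
import os
import time


def bench_writes(path, count=10_000, block=4096, fsync_every=0, extra_flags=0):
    """Append `count` blocks to `path`.

    fsync_every=0 never calls fsync; fsync_every=N calls it after every
    N-th write. extra_flags lets the caller add os.O_SYNC for the last step.
    Returns (elapsed_seconds, ops_per_second).
    """
    buf = b"\0" * block
    flags = os.O_CREAT | os.O_WRONLY | os.O_APPEND | extra_flags
    fd = os.open(path, flags, 0o644)
    try:
        start = time.perf_counter()
        for i in range(1, count + 1):
            os.write(fd, buf)
            if fsync_every and i % fsync_every == 0:
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    ops_per_s = count / elapsed if elapsed > 0 else float("inf")
    return elapsed, ops_per_s


def run_all(path="/tmp/log", count=10_000):
    """Run the four variants from the drill and print ops/sec for each."""
    variants = [
        ("no fsync", dict(fsync_every=0)),
        ("fsync every write", dict(fsync_every=1)),
        ("fsync every 100", dict(fsync_every=100)),
        ("O_SYNC", dict(extra_flags=getattr(os, "O_SYNC", 0))),
    ]
    for label, kwargs in variants:
        if os.path.exists(path):
            os.remove(path)  # start each variant from an empty file
        elapsed, ops = bench_writes(path, count=count, **kwargs)
        print(f"{label:>18}: {elapsed:.3f} s, {ops:,.0f} ops/s")
```

Calling run_all() prints one line per variant. The amortized cost per op is elapsed / count.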
Drill 3: Sequential vs random
Using fio, run the following on a ~1 GiB test file (fio lays the file out before the run if it does not exist):
fio --name=seq-r --rw=read --bs=1M --size=1G --direct=0
fio --name=seq-w --rw=write --bs=1M --size=1G --direct=0
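fio --name=rand-r --rw=randread --bs=4k --size=1G --direct=1 --ioengine=libaio --iodepth=32
fio --name=rand-w --rw=randwrite --bs=4k --size=1G --direct=1 --ioengine=libaio --iodepth=32
Tabulate IOPS, bandwidth, and average latency for each. The random jobs set --ioengine=libaio because fio's synchronous engines ignore --iodepth values above 1.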
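Two relations are handy for sanity-checking the table: bandwidth is IOPS times block size, and by Little's law the mean latency is roughly queue depth divided by IOPS when the queue stays full. The sketch below encodes both. The function names are hypothetical, and the numbers in the usage lines are made-up inputs, not expected results for any device.

```python
def bandwidth_mib_s(iops, block_bytes):
    """Bandwidth implied by an IOPS figure at a given block size."""
    return iops * block_bytes / (1 << 20)


def expected_latency_us(iops, iodepth):
    """Mean latency implied by Little's law: in_flight = throughput * latency.

    Only a rough check: it assumes the device queue is kept full at
    `iodepth` for the whole run.
    """
    return iodepth / iops * 1e6


# Made-up example inputs: 25,600 IOPS at 4 KiB, queue depth 32.
print(bandwidth_mib_s(25_600, 4096))      # MiB/s implied by the IOPS figure
print(expected_latency_us(25_600, 32))    # microseconds per I/O, on average
```

If fio's reported bandwidth or average latency differs widely from these derived values, check the block size and queue depth that were actually in effect.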
Drill 4: Read-ahead heuristic
- Use dd to create a 4 GiB file of zeros.
- Read it sequentially with O_DIRECT (bypasses the page cache, so no read-ahead).
- Read it sequentially without O_DIRECT (read-ahead enabled).
- Use posix_fadvise(POSIX_FADV_RANDOM) and read sequentially; compare.
Explain what read-ahead gives you and when it misfires.
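The buffered and POSIX_FADV_RANDOM variants can be sketched as below. This assumes Linux, where Python exposes os.posix_fadvise; the helper name timed_sequential_read is hypothetical. The O_DIRECT variant is left out because it needs buffers aligned to the device's logical block size, which takes extra setup in Python.

```python
import os
import time


def timed_sequential_read(path, advice=None, block=1 << 20):
    """Read `path` front to back; optionally apply a posix_fadvise hint first.

    Returns (bytes_read, elapsed_seconds).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if advice is not None:
            # offset=0, len=0 applies the advice to the whole file.
            os.posix_fadvise(fd, 0, 0, advice)
        total = 0
        start = time.perf_counter()
        while True:
            chunk = os.read(fd, block)
            if not chunk:
                break
            total += len(chunk)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total, elapsed
```

Drop caches before each run, then compare timed_sequential_read(path) with timed_sequential_read(path, os.POSIX_FADV_RANDOM). The second call tells the kernel to expect random access, which turns off read-ahead for that file descriptor.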
Mini Application: Budget
You have an HDD with 100 random IOPS and 150 MiB/s sequential. You need to:
- ingest 10,000 small records/sec, each ~512 B.
- serve random read requests at 50 req/sec.
Design an on-disk layout that can handle the workload. Estimate disk utilization. Which data structures would you use (write-back batching, group-commit WAL, LSM tree, B-tree)? Justify your choice with a numeric I/O budget.
Repeat the exercise for NVMe at 500 kIOPS random + 3 GiB/s sequential.
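A starting point for the arithmetic, as a rough sketch rather than a full design: it compares writing every record as its own random I/O against batching records into sequential appends, and it assumes each read request costs exactly one random I/O (for example, because any index fits in memory). The function name budget and the dictionary keys are hypothetical.

```python
def budget(records_per_s, record_bytes, reads_per_s, rand_iops, seq_mib_s):
    """Return fractions of device capacity consumed; >1.0 means over budget."""
    ingest_mib_s = records_per_s * record_bytes / (1 << 20)
    return {
        # One random write per record (e.g. in-place update, no batching).
        "naive_write_iops_fraction": records_per_s / rand_iops,
        # All records batched into sequential appends (e.g. a log).
        "batched_seq_fraction": ingest_mib_s / seq_mib_s,
        # Assumes one random I/O per read request.
        "read_iops_fraction": reads_per_s / rand_iops,
    }


hdd = budget(10_000, 512, 50, rand_iops=100, seq_mib_s=150)
nvme = budget(10_000, 512, 50, rand_iops=500_000, seq_mib_s=3 * 1024)
for name, result in (("HDD", hdd), ("NVMe", nvme)):
    print(name, {k: round(v, 4) for k, v in result.items()})
```

On the HDD figures, the naive write path needs 100 times the available random IOPS, while batched appends use about 3% of sequential bandwidth and reads use half of the random IOPS. Treat the fractions as additive only to a first approximation, since sequential writes and random reads compete for the same head.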
Evidence Check
This page is complete only if you can:
- run the measurements above on your own system
- explain surprises and non-surprises with reference to page cache, drive cache, and device characteristics
- defend a storage design choice with an I/O budget