I/O Performance Lab

Retrieval Prompts

  1. State the sequential vs random read IOPS ratio you expect on: HDD, SATA SSD, NVMe Gen4.
  2. Describe what the page cache is and how it relates to "free" memory.
  3. State exactly what fsync guarantees and what close does not.
  4. Describe read-ahead and name one workload where it hurts.
  5. Describe write-back and name one failure it enables.

Compare and Distinguish

Separate these pairs:

  • throughput vs IOPS vs latency
  • sequential vs random, small-block vs large-block
  • page cache vs drive cache
  • fsync vs fdatasync vs sync_file_range
  • O_DIRECT vs buffered I/O

Common Mistake Check

Identify the error:

  1. "My workload is fast because I fit in the page cache. So it will be fast in production."
  2. "O_DIRECT is always faster for databases."
  3. "Since NVMe has no seek penalty, random and sequential are equivalent."
  4. "free shows little free memory, so I need to add RAM."
  5. "I benchmarked with a warm cache, and the numbers match disk spec-sheet IOPS."
  6. "I use sync_file_range, so my data is durable."
  7. "Write-back means my writes are async and free."
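
When checking item 4 on a Linux machine, it can help to look at the kernel's own accounting rather than the single "free" column. The following is a minimal sketch, assuming /proc/meminfo-style input; the function name `parse_meminfo` and the sample numbers are illustrative and are not measurements.

```python
def parse_meminfo(text):
    """Map each field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


# Made-up sample in the /proc/meminfo layout, for illustration only.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:          200000 kB
MemAvailable:    9000000 kB
Cached:          8500000 kB
"""

info = parse_meminfo(SAMPLE)
for key in ("MemFree", "Cached", "MemAvailable"):
    print(f"{key:>13}: {info[key] / 1024:.0f} MiB")
```

On a real system, pass `open("/proc/meminfo").read()` instead of `SAMPLE` and compare MemFree against MemAvailable.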

Measurement Drills

Run each drill and record results. Explain each observation in one paragraph.

Drill 1: Cache effects

dd if=/dev/zero of=/tmp/big bs=1M count=1024
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
time cat /tmp/big > /dev/null # cold
time cat /tmp/big > /dev/null # warm

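Report cold and warm times. Compute effective MB/s for each. If /tmp is a tmpfs mount on your system, write the file to a disk-backed path instead: tmpfs data lives in memory, so dropping caches will not make the first read cold.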
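
The throughput calculation is just bytes divided by elapsed time. A minimal sketch follows, assuming the 1 GiB file from the dd command above; the helper name and the two timings are placeholders, not measurements, and the result is in MiB/s (multiply by 1.048576 for MB/s).

```python
def effective_mib_per_s(num_bytes, seconds):
    """Return throughput in MiB/s for num_bytes transferred in seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / (1024 * 1024) / seconds


FILE_BYTES = 1024 * 1024 * 1024  # dd wrote 1024 blocks of 1 MiB

# Placeholder timings: substitute the `real` values reported by `time`.
cold_s = 4.0
warm_s = 0.25

print(f"cold: {effective_mib_per_s(FILE_BYTES, cold_s):.0f} MiB/s")
print(f"warm: {effective_mib_per_s(FILE_BYTES, warm_s):.0f} MiB/s")
```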

Drill 2: fsync cost

Write a small program:

  • Open /tmp/log for appending, O_CREAT|O_WRONLY|O_APPEND.
  • Loop 10,000 times: write(fd, buf, 4096). No fsync. Record time.
  • Repeat with fsync after each write. Record time.
  • Repeat with fsync after every 100 writes. Record time.
  • Repeat with the file opened with O_SYNC and no explicit fsync (each write then carries the same durability guarantee as a write followed by fsync). Record time.

Report throughput (ops/sec) and amortized cost per op. Explain the gap.
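
One possible shape for the program, sketched in Python because its os module wraps the same system calls. The function name `timed_appends` and its parameters are illustrative choices, not part of the drill.

```python
import os
import time


def timed_appends(path, count, fsync_every=0, extra_flags=0, block=4096):
    """Append `count` blocks of `block` bytes to `path`; return seconds.

    fsync_every=0 never calls fsync; N > 0 calls fsync after every Nth write.
    extra_flags can be os.O_SYNC to test synchronous writes instead.
    """
    buf = b"x" * block
    flags = os.O_CREAT | os.O_WRONLY | os.O_APPEND | extra_flags
    fd = os.open(path, flags, 0o644)
    try:
        start = time.perf_counter()
        for i in range(1, count + 1):
            # Assumes a full write, which is the normal case for a
            # 4 KiB write to a regular file.
            os.write(fd, buf)
            if fsync_every and i % fsync_every == 0:
                os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def summarize(label, count, elapsed):
    """Format ops/sec and amortized microseconds per op."""
    return (f"{label}: {count / elapsed:,.0f} ops/s, "
            f"{elapsed / count * 1e6:.1f} us/op")
```

A run might call `timed_appends("/tmp/log", 10_000)`, then repeat with `fsync_every=1`, `fsync_every=100`, and `extra_flags=os.O_SYNC`, deleting the file between runs and passing each result to `summarize`. If the path is on tmpfs, fsync has nothing to flush and the gap will not appear.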

Drill 3: Sequential vs random

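Using fio, run the following jobs, each against a ~1 GiB test file. The random jobs set --ioengine=libaio because fio's default synchronous engine does not queue more than one I/O at a time, so --iodepth=32 would otherwise have no effect.

fio --name=seq-r --rw=read --bs=1M --size=1G --direct=0
fio --name=seq-w --rw=write --bs=1M --size=1G --direct=0
fio --name=rand-r --rw=randread --bs=4k --size=1G --direct=1 --ioengine=libaio --iodepth=32
fio --name=rand-w --rw=randwrite --bs=4k --size=1G --direct=1 --ioengine=libaio --iodepth=32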

Tabulate IOPS, bandwidth, and average latency for each.
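
The three columns are not independent, which gives a quick sanity check on the table. A minimal sketch, assuming the queue stays full for the whole run (so Little's law applies); the helper names and the example figures are placeholders, not measured results.

```python
def bandwidth_mib_s(iops, block_bytes):
    """Bandwidth implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes / (1024 * 1024)


def mean_latency_ms(iops, iodepth):
    """Little's law: in-flight I/Os = IOPS x latency, so latency = iodepth / IOPS."""
    return iodepth / iops * 1000.0


# Placeholder figures for a 4 KiB random job at queue depth 32.
iops = 10_000
print(f"bandwidth: {bandwidth_mib_s(iops, 4096):.2f} MiB/s")
print(f"latency:   {mean_latency_ms(iops, 32):.2f} ms")
```

If the bandwidth or latency fio reports differs widely from these derived values, the job probably did not run with the block size or queue depth you intended.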

Drill 4: Read-ahead heuristic

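  • Create a 4 GiB file of zeros with dd.
  • Drop the page cache before each of the reads below, as in Drill 1.
  • Read it sequentially with O_DIRECT (this bypasses the page cache, so kernel read-ahead does not apply).
  • Read it sequentially without O_DIRECT (read-ahead enabled).
  • Call posix_fadvise with POSIX_FADV_RANDOM on a buffered descriptor, then read sequentially; compare.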

Explain what read-ahead gives you and when it misfires.
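
The two buffered variants can be timed with one helper. This is a sketch under stated assumptions: it uses Python's os.posix_fadvise (available on Unix), the function name and the chunk size are illustrative, and the O_DIRECT variant is left out because it needs aligned buffers.

```python
import os
import time


def timed_sequential_read(path, advice=None, chunk=64 * 1024):
    """Read `path` from start to end; return (bytes_read, seconds).

    `advice` is an optional posix_fadvise hint such as
    os.POSIX_FADV_RANDOM, applied to the whole file before reading.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if advice is not None:
            os.posix_fadvise(fd, 0, 0, advice)  # length 0 means "to end of file"
        total = 0
        start = time.perf_counter()
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            total += len(data)
        return total, time.perf_counter() - start
    finally:
        os.close(fd)
```

Call it once with `advice=None` and once with `advice=os.POSIX_FADV_RANDOM`, dropping the page cache before each call. Smaller chunk sizes tend to make the difference more visible, since each read then depends more heavily on data the kernel fetched ahead of time.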

Mini Application: Budget

You have an HDD rated at 100 random IOPS and 150 MiB/s sequential throughput. You need to:

  • ingest 10,000 small records/sec, each ~512 B.
  • serve random read requests at 50 req/sec.

Design an on-disk layout that can handle the workload. Estimate disk utilization. Which techniques and data structures would you use (write-back batching, group-commit WAL, LSM tree, B-tree)? Justify your choice with a numeric I/O budget.

Repeat the exercise for NVMe at 500 kIOPS random + 3 GiB/s sequential.
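
A first-pass budget can be written down as plain arithmetic. The sketch below is one simplified model, not a complete answer: it assumes each read request costs exactly one random I/O (any index fits in memory), treats utilization as a simple fraction of the rated figure, and ignores compaction, write amplification, and head movement between the log and the read regions. The function and key names are illustrative.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def io_budget(ingest_per_s, record_bytes, reads_per_s,
              random_iops, seq_bytes_per_s):
    """Return rough utilization fractions for two write strategies."""
    ingest_bytes = ingest_per_s * record_bytes
    return {
        "ingest_mib_s": ingest_bytes / MIB,
        # Records batched into a sequential log:
        "log_write_util": ingest_bytes / seq_bytes_per_s,
        # One random I/O per record, updated in place:
        "in_place_write_util": ingest_per_s / random_iops,
        "random_read_util": reads_per_s / random_iops,
    }


hdd = io_budget(10_000, 512, 50, random_iops=100, seq_bytes_per_s=150 * MIB)
nvme = io_budget(10_000, 512, 50, random_iops=500_000, seq_bytes_per_s=3 * GIB)

for name, b in (("HDD", hdd), ("NVMe", nvme)):
    print(name, {k: round(v, 4) for k, v in b.items()})
```

A value above 1.0 means the strategy needs more than the device can deliver. Under this model the HDD's in-place write figure is 100, against roughly 0.03 for the batched log, which is the kind of gap the justification should address.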

Evidence Check

This page is complete only if you can:

  • run the measurements above on your own system
  • explain surprises and non-surprises with reference to page cache, drive cache, and device characteristics
  • defend a storage design choice with an I/O budget