Page Fault and TLB Measurement Lab
A measurement-first lab. Every claim in the concept pages about paging cost should become a curve you have seen with your own eyes.
Retrieval Prompts
- State from memory the difference between a minor and a major page fault.
- State from memory what a TLB miss is and what it costs relative to a TLB hit.
- Explain why sequential traversal has different cache and TLB behavior than strided traversal.
- Write the perf stat command to measure minor-faults, major-faults, dTLB-load-misses, and dTLB-loads in one shot.
- Explain why getrusage gives you fault counts but not TLB miss counts.
Compare and Distinguish
Separate these pairs clearly:
- TLB miss versus page fault
- minor fault versus major fault
- perf stat counters versus /proc/$pid/stat fields
- working-set size versus resident set size
- anonymous vs file-backed major faults
Common Mistake Check
Identify the error in each statement:
- "The program is slow because of page faults." (when perf stat shows 20 faults total)
- "Linux does not use LRU so I do not need to worry about replacement policies."
- "A huge page eliminates TLB misses."
- "perf stat shows 1 billion TLB loads, so the program has 1 billion TLB misses."
- "Adding RAM to the machine always reduces minor-fault count."
Mini Application
Lab A: Minor vs Major faults on fresh allocation
Write a C or Rust program that:
- mmaps 1 GiB anonymous private, then exits. Record perf stat -e minor-faults,major-faults output.
- mmaps 1 GiB anonymous private and writes one byte per 4 KiB page, then exits. Same measurement.
- mmaps a 1 GiB file private read-only, reads every byte, then exits. Same measurement. Repeat after flushing the page cache (echo 3 > /proc/sys/vm/drop_caches with root, or reboot).
Record minor, major, runtime, and RSS for each. Explain every order-of-magnitude difference in writing.
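The second variant of Lab A can be sketched in C as below. This is a minimal sketch, not a reference solution: the names read_faults and anon_map_minor_faults are hypothetical, and it reads its own fault counters through getrusage so the numbers can be cross-checked against what perf stat reports for the whole process.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Fault counts accumulated by this process so far, as reported by getrusage. */
struct fault_counts {
    long minor;
    long major;
};

struct fault_counts read_faults(void) {
    struct rusage ru;
    struct fault_counts fc = {0, 0};
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        fc.minor = ru.ru_minflt;
        fc.major = ru.ru_majflt;
    }
    return fc;
}

/*
 * Map `bytes` of anonymous private memory, optionally write one byte per
 * page, and return the minor faults taken between the two getrusage calls.
 * Returns -1 if the mapping fails. (Hypothetical helper for this lab.)
 */
long anon_map_minor_faults(size_t bytes, int touch) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct fault_counts before = read_faults();

    volatile unsigned char *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    if (touch) {
        /* The first write to a page that is not yet backed faults it in. */
        for (size_t off = 0; off < bytes; off += page)
            p[off] = 1;
    }

    struct fault_counts after = read_faults();
    munmap((void *)p, bytes);
    return after.minor - before.minor;
}
```

Calling anon_map_minor_faults(1UL << 30, 0) and then anon_map_minor_faults(1UL << 30, 1) from main and printing both results gives the first two rows of the Lab A table. If transparent huge pages are active, the touched count can be far below one fault per 4 KiB page, which is itself worth explaining in the write-up.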
Lab B: The stride-knee experiment for TLB miss cost
Allocate an array of N 8-byte entries, large enough that it does not fit in L2 cache and preferably exceeds L2 TLB reach at 4 KiB pages. Write a loop:
for (size_t i = 0; i < iterations; i++) {
    sum += arr[(i * stride) % N];
}
Run for stride = 1, 8, 64, 512, 4096, 8192, 65536, 524288 (8-byte units). Measure time per access and:
- dTLB-load-misses
- L1-dcache-load-misses
- LLC-load-misses
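Plot or tabulate time per access as a function of stride. You should see two or three "knees" corresponding to cache-line boundary, page boundary, and large-region boundary. With 8-byte entries, and assuming 64-byte cache lines, stride 8 is the cache-line boundary and stride 512 is the 4 KiB page boundary.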
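One way to time the loop above is sketched here. It is a sketch under assumptions: ns_per_access and distinct_elements are hypothetical names, CLOCK_MONOTONIC is used as the timer, and the sum is stored through a pointer so the compiler cannot drop the loads. The second helper reports how many distinct elements a given N and stride can reach, which is N / gcd(N, stride).

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Run the strided loop and return the average nanoseconds per access.
 * The sum is written to *sink so the loads have an observable effect.
 */
double ns_per_access(const uint64_t *arr, size_t n, size_t stride,
                     size_t iterations, uint64_t *sink) {
    struct timespec t0, t1;
    uint64_t sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iterations; i++)
        sum += arr[(i * stride) % n];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = sum;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iterations;
}

/* Number of distinct indices (i * stride) % n can take: n / gcd(n, stride). */
size_t distinct_elements(size_t n, size_t stride) {
    size_t a = n, b = stride % n;
    while (b != 0) {            /* Euclid's algorithm */
        size_t t = a % b;
        a = b;
        b = t;
    }
    return n / a;
}
```

When N and the stride are both powers of two, the large strides revisit only a small set of elements: with N = 2^27 and stride 524288, distinct_elements returns 256. That may change what the last knee actually measures, so it is worth printing this value next to each timing and deciding whether N should be chosen so that it is not a multiple of the larger strides.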
Then repeat with THP enabled (madvise(MADV_HUGEPAGE) or /sys/kernel/mm/transparent_hugepage/enabled = always). Compare the TLB-miss curve.
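The per-mapping route can be sketched as follows. TLB reach is the number of entries times the page size, so for a hypothetical 1536-entry L2 TLB the reach is 6 MiB at 4 KiB pages and 3 GiB at 2 MiB pages; check the real entry count for your CPU. The helper names alloc_array and free_array are hypothetical, and the madvise call is only a hint that the kernel may ignore.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Allocate `n` 8-byte entries with mmap and, if `want_huge` is nonzero,
 * ask the kernel to back the range with transparent huge pages.
 * The hint can fail or be ignored; that is not treated as an error here.
 * Returns NULL if the mapping itself fails.
 */
uint64_t *alloc_array(size_t n, int want_huge) {
    size_t bytes = n * sizeof(uint64_t);
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    if (want_huge)
        (void)madvise(p, bytes, MADV_HUGEPAGE);
#endif
    /* Touch every entry so pages are faulted in before timing starts. */
    uint64_t *arr = p;
    for (size_t i = 0; i < n; i++)
        arr[i] = i;
    return arr;
}

void free_array(uint64_t *arr, size_t n) {
    if (arr != NULL)
        munmap(arr, n * sizeof(uint64_t));
}
```

Because the hint is best-effort, it helps to confirm that huge pages were actually used, for example by looking at the AnonHugePages lines in /proc/$pid/smaps while the program is running, before attributing any change in the TLB-miss curve to THP.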
Lab C: Major fault cost
On a machine with a known disk type:
- Drop caches. Run a program that uses read() to read 1 GiB sequentially from a new file. Record time.
- Drop caches again. Run the same program mmap-based. Record time, minor faults, major faults.
- Repeat both runs with the file already hot in the page cache.
Write one paragraph comparing the two approaches and explain the role of major faults in the cold case.
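The two approaches in Lab C might look like the sketch below. It is a sketch under assumptions: sum_with_read and sum_with_mmap are hypothetical names, the 1 MiB read buffer is an arbitrary choice, and summing the bytes is only there to force every byte to be touched. Timing and fault counts are left to perf stat or to a getrusage wrapper.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum every byte of `path` using read() into a fixed buffer. -1 on error. */
int sum_with_read(const char *path, uint64_t *sum_out) {
    static unsigned char buf[1 << 20];   /* 1 MiB buffer, an arbitrary choice */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    uint64_t sum = 0;
    ssize_t got;
    while ((got = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < got; i++)
            sum += buf[i];
    close(fd);
    if (got < 0)
        return -1;
    *sum_out = sum;
    return 0;
}

/* Sum every byte of `path` through a private read-only mapping. -1 on error. */
int sum_with_mmap(const char *path, uint64_t *sum_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                           /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];                     /* first touch of a page may fault */
    munmap((void *)p, len);
    *sum_out = sum;
    return 0;
}
```

Running each function in its own process keeps the fault counts of the two approaches separate. Both should return the same sum for the same file, which is a cheap check that each one really read every byte.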
Lab D: Per-process fault observation
Pick a long-running process you use (browser, editor, database). Read /proc/$pid/status fields VmRSS, VmSize, and /proc/$pid/stat fields 10 (minflt) and 12 (majflt). Log them once per minute for 10 minutes. Plot all four values and narrate what the process is doing when each changes.
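Extracting the two fault fields from /proc/$pid/stat can be sketched as below. The function names parse_stat_faults and read_stat_faults are hypothetical. The one subtlety the sketch handles is that field 2 (the command name) is parenthesized and may itself contain spaces or parentheses, so counting fields by splitting on whitespace from the start of the line can land on the wrong column.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract minflt (field 10) and majflt (field 12) from one line of
 * /proc/<pid>/stat. Parsing resumes after the last ')' because the
 * command name in field 2 may contain spaces or ')'.
 * Returns 0 on success, -1 on a malformed line.
 */
int parse_stat_faults(const char *line, unsigned long *minflt,
                      unsigned long *majflt) {
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;
    /* After ')': state ppid pgrp session tty_nr tpgid flags
       minflt cminflt majflt */
    char state;
    unsigned long cminflt;
    int n = sscanf(p + 1, " %c %*d %*d %*d %*d %*d %*u %lu %lu %lu",
                   &state, minflt, &cminflt, majflt);
    return n == 4 ? 0 : -1;
}

/* Read /proc/<pid>/stat for `pid` and parse the fault counters. */
int read_stat_faults(long pid, unsigned long *minflt, unsigned long *majflt) {
    char path[64], line[1024];
    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? parse_stat_faults(line, minflt, majflt) : -1;
}
```

Calling read_stat_faults once per minute and printing a timestamp with the two counters gives the fault half of the log. The counters are cumulative, so the per-minute differences are usually the more readable thing to plot.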
Evidence Check
This lab is complete only if you can:
- show a stride-vs-time plot with at least two visible knees
- name which counter confirmed each knee as a cache or TLB effect
- explain why enabling huge pages shifted the TLB knee but not the cache knee
- predict the fault pattern for a workload you have not measured yet and verify it within 20%