Page Fault and TLB Measurement Lab

A measurement-first lab. Every claim in the concept pages about paging cost should become a curve you have seen with your own eyes.

Retrieval Prompts

State from memory the difference between a minor and a major page fault.
State from memory what a TLB miss is and what it costs relative to a TLB hit.
Explain why sequential traversal has different cache and TLB behavior than strided traversal.
Write the perf stat command to measure minor-faults, major-faults, dTLB-load-misses, and dTLB-loads in one shot.
Explain why getrusage gives you fault counts but not TLB miss counts.

Compare and Distinguish

Separate these pairs clearly:

TLB miss versus page fault
minor fault versus major fault
perf stat counters versus /proc/$pid/stat fields
working-set size versus resident set size
anonymous versus file-backed major faults

Common Mistake Check

Identify the error in each statement:

"The program is slow because of page faults." (when perf stat shows 20 faults total)
"Linux does not use LRU so I do not need to worry about replacement policies."
"A huge page eliminates TLB misses."
"perf stat shows 1 billion TLB loads, so the program has 1 billion TLB misses."
"Adding RAM to the machine always reduces minor-fault count."

Mini Application

Lab A: Minor vs Major faults on fresh allocation

Write a C or Rust program that:

mmaps 1 GiB anonymous private, then exits. Record perf stat -e minor-faults,major-faults output.
mmaps 1 GiB anonymous private and writes one byte per 4 KiB page, then exits. Same measurement.
mmaps a 1 GiB file private read-only, reads every byte, then exits. Same measurement. Repeat after flushing the page cache (as root, run sync and then echo 3 > /proc/sys/vm/drop_caches, or reboot).

Record minor, major, runtime, and RSS for each. Explain every order-of-magnitude difference in writing.
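A minimal sketch of the first two Lab A cases, assuming Linux and a C toolchain. It takes the fault counts from getrusage rather than perf stat, so the delta covers only the mapped region and not process startup. The names read_faults and anon_map_minor_faults are illustrative, not part of any library.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Fault counters for the calling process, as reported by getrusage(). */
struct fault_counts {
    long minor;
    long major;
};

struct fault_counts read_faults(void) {
    struct rusage ru;
    struct fault_counts fc = {0, 0};
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        fc.minor = ru.ru_minflt;
        fc.major = ru.ru_majflt;
    }
    return fc;
}

/*
 * Map `bytes` of anonymous private memory and, if `touch` is nonzero,
 * write one byte per page. Returns the number of minor faults taken
 * during the call, or -1 if mmap fails.
 */
long anon_map_minor_faults(size_t bytes, int touch) {
    long page = sysconf(_SC_PAGESIZE);
    struct fault_counts before = read_faults();

    unsigned char *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    if (touch) {
        /* volatile keeps the compiler from dropping the stores */
        for (size_t off = 0; off < bytes; off += (size_t)page)
            ((volatile unsigned char *)p)[off] = 1;
    }

    struct fault_counts after = read_faults();
    munmap(p, bytes);
    return after.minor - before.minor;
}
```

Calling anon_map_minor_faults((size_t)1 << 30, 0) and then anon_map_minor_faults((size_t)1 << 30, 1) from main gives the two numbers to compare. perf stat on the same binary will report somewhat more minor faults, because it also counts the faults taken while the program loads.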

Lab B: The stride-knee experiment for TLB miss cost

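Allocate an array of N 8-byte entries, large enough that it does not fit in L2 cache and preferably exceeds L2 TLB reach at 4 KiB pages. Choose N odd (for example, a power of two plus one): if N and the stride are both powers of two, the index wraps onto only N / stride distinct elements, which can fit in cache and hide the effect you are trying to measure. Write a loop: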

for (size_t i = 0; i < iterations; i++) {
    sum += arr[(i * stride) % N];
}

Run for stride = 1, 8, 64, 512, 4096, 8192, 65536, 524288, measured in 8-byte elements (so stride 8 is 64 bytes and stride 512 is 4 KiB). Measure time per access and these perf counters:

dTLB-load-misses
L1-dcache-load-misses
LLC-load-misses

Plot or tabulate time per access as a function of stride. You should see two or three "knees" corresponding to cache-line boundary, page boundary, and large-region boundary.
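One way to time the loop is sketched below, assuming a POSIX system with clock_gettime. The helper names now_ns and ns_per_access are hypothetical. The index advances with an add and a conditional subtract, which produces the same sequence as (i * stride) % N while keeping the division out of the timed loop.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime under strict -std modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Read arr[0..n) with the given stride (in elements) for `iterations`
 * accesses and return the mean nanoseconds per access. Requires n > 0.
 * The running sum is written to *sum_out so the compiler cannot
 * discard the loads.
 */
double ns_per_access(const uint64_t *arr, size_t n, size_t stride,
                     size_t iterations, uint64_t *sum_out) {
    uint64_t sum = 0;
    size_t idx = 0;
    size_t step = stride % n; /* keeps idx + step below 2 * n */

    uint64_t start = now_ns();
    for (size_t i = 0; i < iterations; i++) {
        sum += arr[idx];
        idx += step;
        if (idx >= n)
            idx -= n; /* same effect as (i * stride) % n */
    }
    uint64_t elapsed = now_ns() - start;

    *sum_out = sum;
    return (double)elapsed / (double)iterations;
}
```

Write to every element of the array before timing so that first-touch page faults are not charged to the loop. The loads here are independent of one another, so the result reflects throughput with overlapping misses rather than the latency of a single miss.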

Then repeat with THP enabled, either per mapping with madvise(MADV_HUGEPAGE), which requires /sys/kernel/mm/transparent_hugepage/enabled to be madvise or always, or system-wide by setting that file to always. Compare the TLB-miss curve.
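The per-mapping variant might look like the sketch below, assuming Linux. The function name alloc_thp_hint is hypothetical. The call is only a hint: the kernel may still back the region with 4 KiB pages, so confirm the result through the AnonHugePages lines in /proc/self/smaps before interpreting the curve.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and madvise under strict -std modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map `bytes` of anonymous private memory and ask the kernel to back it
 * with transparent huge pages. Returns the mapping, or NULL if mmap
 * fails. *advised is set to 1 if madvise accepted the hint, else 0.
 */
void *alloc_thp_hint(size_t bytes, int *advised) {
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Fails (for example with EINVAL) on kernels built without THP. */
    *advised = (madvise(p, bytes, MADV_HUGEPAGE) == 0);
#else
    *advised = 0;
#endif
    return p;
}
```

Issue the hint before the first write to the region, so that the initial page faults can be served with huge pages directly.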

Lab C: Major fault cost

On a machine with a known disk type:

Create a new 1 GiB file, then drop caches. Run a program that reads the file sequentially with read(). Record time.
Drop caches again and run an mmap-based version of the same program. Record time, minor faults, major faults.
Repeat with the file already hot in the page cache.

Write one paragraph comparing the read() and mmap approaches, and explain the role of major faults in the cold case.
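The two readers for this lab can be sketched as a pair of functions that sum every byte of a file, assuming a POSIX system. The names sum_with_read and sum_with_mmap and the 1 MiB buffer size are illustrative choices. Summing the bytes forces every page to be touched in the mmap case.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sum every byte of `path` using read() into a 1 MiB buffer.
 * Returns the number of bytes read, or -1 on error.
 */
long long sum_with_read(const char *path, uint64_t *sum) {
    static unsigned char buf[1 << 20];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long long total = 0;
    uint64_t s = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            s += buf[i];
        total += n;
    }
    close(fd);
    if (n < 0)
        return -1;
    *sum = s;
    return total;
}

/*
 * Sum every byte of `path` through a private read-only mapping.
 * Returns the file length, or -1 on error or for an empty file.
 */
long long sum_with_mmap(const char *path, uint64_t *sum) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    uint64_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += p[i];
    munmap((void *)p, len);
    *sum = s;
    return (long long)len;
}
```

Run each function in its own process under perf stat so the fault counts do not mix, and check that both return the same sum before trusting the timings.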

Lab D: Per-process fault observation

Pick a long-running process you use (browser, editor, database). Read /proc/$pid/status fields VmRSS and VmSize, and /proc/$pid/stat fields 10 (minflt) and 12 (majflt). Log them once per minute for 10 minutes. Plot the four values and narrate what the process is doing when each changes.
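Reading the two stat fields takes some care, because field 2 (the command name) is parenthesized and may itself contain spaces or parentheses. A minimal sketch, assuming the field layout given above; the function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract minflt (field 10) and majflt (field 12) from one line of
 * /proc/<pid>/stat. Parsing starts after the last ')' so that a command
 * name containing spaces or parentheses cannot shift the field count.
 * Returns 0 on success, -1 on a malformed line.
 */
int parse_stat_faults(const char *line, unsigned long *minflt,
                      unsigned long *majflt) {
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;
    /* Skipped: state(3) ppid(4) pgrp(5) session(6) tty_nr(7) tpgid(8)
     * flags(9); stored: minflt(10); skipped: cminflt(11); stored: majflt(12). */
    int got = sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %lu %*u %lu",
                     minflt, majflt);
    return got == 2 ? 0 : -1;
}

/* Read /proc/<pid>/stat and fill in the two fault counters. */
int read_proc_faults(long pid, unsigned long *minflt, unsigned long *majflt) {
    char path[64];
    char line[1024];
    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    return parse_stat_faults(line, minflt, majflt);
}
```

Both counters are cumulative since process start, so plot the difference between consecutive samples to see fault rate rather than a monotonically rising line.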

Evidence Check

This lab is complete only if you can:

show a stride-vs-time plot with at least two visible knees
name which counter confirmed each knee as a cache or TLB effect
explain why enabling huge pages shifted the TLB knee but not the cache knee
predict the fault counts for a workload you have not measured yet, then measure it and land within 20% of your prediction
