Demand Paging and Minor/Major Faults
What This Concept Is
Demand paging means the kernel does not bring a page into physical memory until the process actually touches it. Instead, the page's PTE is marked not-present. On first touch the MMU raises a page fault, the kernel handles it, installs a mapping, and restarts the faulting instruction.
Page faults come in two cost classes:
- Minor fault (soft fault): the fault can be resolved without disk I/O; the kernel only has to find or allocate a page frame and install the PTE. Common cases: first touch of a fresh anonymous mapping (a zeroed frame is allocated), a page already in the page cache for a file-backed mapping, or a CoW fault where a new page is allocated and filled by copying the original.
- Major fault (hard fault): the page is not in memory and has to be read from disk (swap or backing file). Cost: orders of magnitude higher, typically milliseconds on spinning disk and tens to hundreds of microseconds on SSD, versus roughly a microsecond for a minor fault.
On Linux you can see these counters per process: fields 10 (minflt) and 12 (majflt) of /proc/$pid/stat, or via ps -o min_flt,maj_flt, or perf stat -e minor-faults,major-faults.
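As a minimal sketch of reading those counters programmatically, the following parses one line of /proc/<pid>/stat. The function names parse_fault_counters and read_fault_counters are illustrative, not part of any library, and the field positions (10 for minflt, 12 for majflt) follow the proc(5) layout.

```python
import os


def parse_fault_counters(stat_line):
    """Extract (minflt, majflt) from one line of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split after the LAST ')' instead of on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    minflt = int(rest[10 - 3])
    majflt = int(rest[12 - 3])
    return minflt, majflt


def read_fault_counters(pid="self"):
    """Read the counters for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_fault_counters(f.read())


if __name__ == "__main__":
    if os.path.exists("/proc/self/stat"):
        minor, major = read_fault_counters()
        print(f"minor faults: {minor}, major faults: {major}")
```

Both values are cumulative since process start, so a rate needs two samples and the time between them.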
Why It Matters Here
Demand paging is the reason a program can mmap a 100 GiB file on a machine with 16 GiB of RAM and have it "work" (slowly). It is also the reason a newly-started process can report a huge virtual size but a tiny resident set size: the OS hasn't actually brought in the pages yet.
Understanding the fault types is how you diagnose memory problems:
- High minor-fault rate, moderate cost -> normal lazy allocation; nothing to fix.
- High major-fault rate -> you are thrashing against storage; either working set exceeds RAM or some workload is rereading pages from disk.
- High fault rate with no I/O -> look for CoW storms, madvise(MADV_DONTNEED) patterns, or repeated mmap/munmap.
Almost every "why is this process suddenly slow" memory story is really a story about the fault mix.
Concrete Example
A fresh malloc. A process calls malloc(64 MiB). Because the request is far above glibc's mmap threshold, glibc asks the kernel for an anonymous mmap. Nothing is touched; RSS is unchanged. The process later writes the first byte of each 4 KiB page: each write triggers a minor fault; the kernel hands out a zeroed page frame and installs the PTE. With 4 KiB pages and no transparent huge pages, that is 64 MiB / 4 KiB = 16,384 minor faults and zero major faults.
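The same experiment can be sketched in Python, using an anonymous private mmap in place of malloc and getrusage to read the process's fault counters. The helper name touch_anonymous_pages is hypothetical. The interpreter takes some faults of its own, and transparent huge pages can cover many 4 KiB pages with a single fault, so the reported count is only an approximation of the page count.

```python
import mmap
import resource


def touch_anonymous_pages(size_bytes):
    """Map anonymous memory, write one byte per page, and return
    (pages_touched, minor_faults, major_faults) for the touch loop."""
    region = mmap.mmap(-1, size_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        before = resource.getrusage(resource.RUSAGE_SELF)
        pages = 0
        for offset in range(0, size_bytes, mmap.PAGESIZE):
            region[offset] = 1  # first write to the page triggers the fault
            pages += 1
        after = resource.getrusage(resource.RUSAGE_SELF)
    finally:
        region.close()
    return (pages,
            after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)


if __name__ == "__main__":
    pages, minor, major = touch_anonymous_pages(64 * 1024 * 1024)
    print(f"pages touched: {pages}, minor: {minor}, major: {major}")
```

Calling mmap alone leaves the counters essentially unchanged; the faults appear only inside the loop, which is the demand-paging behavior described above.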
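A file-backed mmap. A process mmaps a 1 GiB file and traverses it read-style, touching each page. If the file is already in the page cache, every fault is a minor fault (found-in-cache). If not, the faults that must wait for a read from disk are major faults; kernel readahead usually brings in neighboring pages with each read, so a sequential scan takes fewer major faults than it has pages.
Swap thrashing. A process has a 6 GiB working set on a 4 GiB machine, so its RSS can never cover everything it touches. Each time it touches a non-resident page, the kernel evicts another, causing a write to swap (unless a clean copy already exists on disk) and then a read from swap when that page is later retouched. Major faults pile up.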
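A rough way to observe the file-backed case is to map a file, read one byte per page, and compare the fault counters before and after. This is a sketch under assumptions: read_mapped_file is a hypothetical helper, the temporary file is a stand-in for real data, and the posix_fadvise(POSIX_FADV_DONTNEED) call is only a hint that the kernel may ignore (for example when the temp directory is on tmpfs), so the first pass is not guaranteed to show major faults.

```python
import mmap
import os
import resource
import tempfile


def fault_counts():
    """Return cumulative (minor, major) faults for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def read_mapped_file(path):
    """Map `path` read-only, read one byte per page,
    and return (pages, minor_faults, major_faults)."""
    size = os.path.getsize(path)
    min_before, maj_before = fault_counts()
    pages = 0
    checksum = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as view:
            for offset in range(0, size, mmap.PAGESIZE):
                checksum += view[offset]  # a read is enough to fault the page in
                pages += 1
    min_after, maj_after = fault_counts()
    return pages, min_after - min_before, maj_after - maj_before


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (16 * 1024 * 1024))
        tmp.flush()
        os.fsync(tmp.fileno())
        if hasattr(os, "posix_fadvise"):
            # Hint: drop this file's cached pages so the first pass starts cold.
            os.posix_fadvise(tmp.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    try:
        print("first pass :", read_mapped_file(tmp.name))
        print("second pass:", read_mapped_file(tmp.name))
    finally:
        os.unlink(tmp.name)
```

If the hint takes effect, the first pass should show some major faults and the second pass none, since by then the pages are in the page cache.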
Identifying the pattern with perf:
perf stat -e minor-faults,major-faults,context-switches ./my_program
A minor-fault rate in the hundreds of thousands to millions per second (summed across cores) is normal during startup and fresh allocation. A major-fault rate in the thousands per second is nearly always trouble.
Common Confusion / Misconception
"A page fault means the program crashed." A page fault is a hardware event the kernel handles transparently. Only unhandled page faults (address not mapped at all, or permission violation) surface as SIGSEGV.
"Minor faults are free." They are cheap, not free. A minor fault still traps into the kernel (a mode switch, though not a full context switch), runs the fault handler, updates a PTE, and sometimes invalidates a TLB entry, for example on a CoW fault. At millions of faults per second this adds up.
"Major fault rate equals swapping." File-backed major faults are just I/O: reading a page of a memory-mapped file from disk is a major fault, whether or not any swap is involved. Distinguish swap-in/swap-out counters from page-read/page-write counters.
How To Use It
When a service slows down, split the question into two:
- Is the fault mix abnormal? (vmstat 1 and perf stat -e major-faults,minor-faults.)
- If major faults are elevated, what is being paged in? (Look at iostat, /proc/meminfo swap counters, pidstat -r.)
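For the second question, one possible sketch is to sample the system-wide counters in /proc/vmstat and compare swap traffic with general block I/O. The helper name parse_vmstat is hypothetical. The counter names are ones Linux exposes: pswpin and pswpout count pages swapped in and out, pgpgin and pgpgout count data read from and written to block devices in general (not only faults), and pgmajfault counts major faults. Check the kernel documentation for exact units before comparing magnitudes.

```python
import os
import time

WANTED = ("pgfault", "pgmajfault", "pswpin", "pswpout", "pgpgin", "pgpgout")


def parse_vmstat(text, names=WANTED):
    """Return {name: value} for the selected counters in /proc/vmstat text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in names:
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat():
    """Read the selected counters from the live system (Linux only)."""
    with open("/proc/vmstat") as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    if os.path.exists("/proc/vmstat"):
        before = read_vmstat()
        time.sleep(1)
        after = read_vmstat()
        # Counters are cumulative since boot, so print the one-second delta.
        for name in WANTED:
            if name in before and name in after:
                print(f"{name:>10}: {after[name] - before[name]}")
```

Major faults rising together with pswpin point at swap; major faults rising with pgpgin while pswpin stays flat point at file-backed reads.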
A service that gets faster after a warm-up period is usually filling its page cache or hot anonymous pages; that is a minor-fault effect. A service that randomly hits latency spikes in production is often hitting major faults on cold pages.
Check Yourself
- Why does RSS typically grow as a function of what the program has touched, not what it has allocated?
- Give two causes of a minor fault that do not involve disk I/O at all.
- What distinguishes a major fault from a normal disk read?
- Why is SIGSEGV not an ordinary page fault?
Mini Drill or Application
- A program mmaps 1 GiB anonymous, never touches it. Virtual size? RSS? Fault count expected if it runs to exit?
- Same program now writes a byte in every page. Expected minor faults? Expected major faults?
- A program reads through a 10 GiB file via mmap, sequentially, on a machine with 8 GiB RAM and a cold page cache. Sketch the major-fault trajectory over time.
- A process has steadily-growing RSS while its allocations (per logs) are flat. What explains this without memory leaks?
- Write the perf stat command you would use to watch minor and major faults in real time.