Demand Paging and Minor/Major Faults

What This Concept Is

Demand paging means the kernel does not bring a page into physical memory until the process actually touches it. Until then, the page's page table entry (PTE) is marked not-present. On first touch the MMU raises a page fault, the kernel handles it, installs a mapping, and restarts the faulting instruction.

Page faults come in two cost classes:

  • Minor fault (soft fault): the fault can be resolved without any I/O; the kernel only has to find or allocate a page frame and install the PTE. Common cases: first touch of a fresh anonymous mapping (a zeroed frame is handed out), a file-backed page that is already in the page cache, or a CoW (copy-on-write) fault where a new frame is allocated and filled by copying the original page.
  • Major fault (hard fault): the page is not in memory and has to be read from disk (swap or backing file). Cost: orders of magnitude higher, typically milliseconds on spinning disk and tens to hundreds of microseconds on SSD, versus roughly a microsecond or less for a minor fault.

On Linux you can see these counters per process: the minflt and majflt fields of /proc/$pid/stat (fields 10 and 12), ps -o min_flt,maj_flt, or perf stat -e minor-faults,major-faults.
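
As a minimal sketch of reading those counters programmatically, the helper below pulls minflt and majflt out of one line of /proc/<pid>/stat. The function names are illustrative, not part of any library; the field positions (10 and 12) follow the proc(5) layout.

```python
from pathlib import Path


def parse_fault_counters(stat_line: str) -> tuple[int, int]:
    """Return (minflt, majflt) from one line of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')' characters, so split after the LAST ')' in the line.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 10 (minflt) is index 7
    # and field 12 (majflt) is index 9.
    return int(after_comm[7]), int(after_comm[9])


def read_fault_counters(pid: str = "self") -> tuple[int, int]:
    """Read the counters for a live process (Linux only)."""
    return parse_fault_counters(Path(f"/proc/{pid}/stat").read_text())
```

Calling read_fault_counters() twice and subtracting gives the faults taken in between, which is usually more informative than the lifetime totals.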

Why It Matters Here

Demand paging is the reason a program can mmap a 100 GiB file on a machine with 16 GiB of RAM and have it "work" (slowly). It is also the reason a newly-started process can report a huge virtual size but a tiny resident set size: the OS hasn't actually brought in the pages yet.
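
To see the gap between virtual size and resident set size for a real process, one option is to read the VmSize and VmRSS lines of /proc/<pid>/status. The parser below is a small sketch; parse_vm_sizes is a hypothetical helper name, and it assumes the usual "Name:  value kB" layout of that file.

```python
def parse_vm_sizes(status_text: str) -> dict[str, int]:
    """Return VmSize and VmRSS (in KiB) from /proc/<pid>/status text."""
    sizes = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Values look like "   123456 kB"; keep only the number.
            sizes[key] = int(value.split()[0])
    return sizes
```

On Linux, parse_vm_sizes(open("/proc/self/status").read()) right after a large untouched allocation should show VmSize well above VmRSS.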

Understanding the fault types is how you diagnose memory problems:

  • High minor-fault rate, moderate cost -> normal lazy allocation; nothing to fix.
  • High major-fault rate -> you are thrashing against storage; either working set exceeds RAM or some workload is rereading pages from disk.
  • High fault rate with no I/O -> look for CoW storms, madvise(MADV_DONTNEED) patterns, or repeated mmap/munmap.

Almost every "why is this process suddenly slow" memory story is really a story about the fault mix.

Concrete Example

A fresh malloc. A process calls malloc(64 MiB). For a request this large, glibc asks the kernel for an anonymous mmap. Almost nothing is touched yet, so RSS is essentially unchanged. The process later writes the first byte of each 4 KiB page: each write triggers a minor fault; the kernel hands out a zeroed page frame and installs the PTE. That is 64 MiB / 4 KiB = 16,384 minor faults (fewer if transparent huge pages are in use) and zero major faults.
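
The first-touch behavior can be observed directly. The sketch below maps anonymous memory with Python's mmap module (standing in for the malloc in the example), writes one byte per page, and reports how much the process's minor-fault counter moved. The function names are illustrative. The measured number is only approximately one fault per page: the interpreter takes a few faults of its own, and transparent huge pages can lower the count considerably.

```python
import mmap
import resource


def expected_first_touch_faults(size_bytes: int, page_size: int = mmap.PAGESIZE) -> int:
    """One fault per page, rounding up to whole pages."""
    return -(-size_bytes // page_size)


def count_minor_faults_on_touch(size_bytes: int) -> int:
    """Map anonymous memory, write one byte per page, return the minflt delta."""
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    region = mmap.mmap(-1, size_bytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        for offset in range(0, size_bytes, mmap.PAGESIZE):
            region[offset] = 1  # first write to a page triggers the fault
    finally:
        region.close()
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    return after - before
```

With 4 KiB pages, expected_first_touch_faults(64 * 1024 * 1024, 4096) gives 16384, matching the arithmetic above; compare it against count_minor_faults_on_touch(64 * 1024 * 1024) on your own machine.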

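A file-backed mmap. A process mmaps a 1 GiB file and traverses it, touching each page. If the file is already in the page cache, every fault is a minor fault (found-in-cache). If not, the faults that miss the cache are major faults (read from disk); with sequential access, kernel readahead usually brings in neighboring pages too, so there are typically far fewer major faults than pages.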
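
A rough way to watch this from inside a process is to snapshot getrusage before and after walking a mapped file. This is a sketch under stated assumptions: fault_delta_for_mmap_read is a hypothetical helper, the file must be non-empty, and the counts include any unrelated faults the interpreter takes during the walk. Because the kernel can map several cached pages per fault, the minor-fault delta may be well below the page count.

```python
import mmap
import os
import resource
import tempfile


def fault_delta_for_mmap_read(path: str) -> tuple[int, int, int]:
    """Read one byte per page of a file through mmap.

    Returns (minor_fault_delta, major_fault_delta, pages_touched).
    """
    start = resource.getrusage(resource.RUSAGE_SELF)
    pages = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for offset in range(0, len(m), mmap.PAGESIZE):
            _ = m[offset]  # a read is enough to fault the page in
            pages += 1
    end = resource.getrusage(resource.RUSAGE_SELF)
    return end.ru_minflt - start.ru_minflt, end.ru_majflt - start.ru_majflt, pages


if __name__ == "__main__":
    # A freshly written file is normally still in the page cache, so this
    # demo shows the warm case; a cold file is needed to see major faults.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (mmap.PAGESIZE * 256))
    try:
        print(fault_delta_for_mmap_read(tmp.name))
    finally:
        os.unlink(tmp.name)
```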

Swap thrashing. A process has a 6 GiB working set on a machine with 4 GiB of RAM. Each time it touches a page that has been evicted, the kernel must evict another page to make room, which causes a write to swap (if that page is dirty) and later a read from swap when the evicted page is touched again. Major faults pile up.

Identifying the pattern with perf:

perf stat -e minor-faults,major-faults,context-switches ./my_program

A minor-fault rate in the millions per second is normal during startup and fresh allocation. A major-fault rate in the thousands per second is nearly always trouble.

Common Confusion / Misconception

"A page fault means the program crashed." A page fault is a hardware exception that the kernel normally handles transparently. Only faults the kernel cannot resolve (address not mapped at all, or an access the mapping's permissions forbid) surface as SIGSEGV.

"Minor faults are free." They are cheap, not free. A minor fault still traps into the kernel (a mode switch, though not a full context switch), runs the fault handler, updates a PTE, and in cases such as CoW must invalidate a stale TLB entry. At millions of faults per second this adds up.

"Major fault rate equals swapping." File-backed major faults are just I/O: reading a page of a memory-mapped file from disk is a major fault, whether or not any swap is involved. Distinguish swap-in/swap-out counters (si/so in vmstat, pswpin/pswpout in /proc/vmstat) from page-read/page-write counters (bi/bo in vmstat, pgpgin/pgpgout in /proc/vmstat).
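
One way to separate the two is to sample the system-wide counters in /proc/vmstat twice and compare deltas. The sketch below assumes the counter names pgmajfault, pswpin, pswpout, pgpgin and pgpgout as exposed by current Linux kernels; parse_vmstat and paging_deltas are hypothetical helper names.

```python
def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat-style 'name value' lines into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def paging_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Change in the paging-related counters between two samples."""
    keys = ("pgmajfault", "pswpin", "pswpout", "pgpgin", "pgpgout")
    # Counters missing from a sample are treated as 0 rather than raising.
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}
```

If pgmajfault climbs while pswpin and pswpout stay flat, the major faults are most likely coming from file-backed pages rather than swap.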

How To Use It

When a service slows down, split the question into two:

  1. Is the fault mix abnormal? (perf stat -e major-faults,minor-faults or pidstat -r for per-process fault rates; vmstat 1 for system-wide swap and block I/O.)
  2. If major faults are elevated, what is being paged in? (Look at iostat, the si/so columns of vmstat, and the swap lines of /proc/meminfo.)

A service that gets faster after a warm-up period is usually paying first-touch costs: minor faults on fresh anonymous pages and, if its files are not yet cached, major faults while the page cache fills. A service that randomly hits latency spikes in production is often hitting major faults on cold pages.

Check Yourself

  1. Why does RSS typically grow as a function of what the program has touched, not what it has allocated?
  2. Give two causes of a minor fault that do not involve disk I/O at all.
  3. What distinguishes a major fault from a normal disk read?
  4. Why is SIGSEGV not an ordinary page fault?

Mini Drill or Application

  1. A program mmaps 1 GiB anonymous, never touches it. Virtual size? RSS? Fault count expected if it runs to exit?
  2. Same program now writes a byte in every page. Expected minor faults? Expected major faults?
  3. A program reads through a 10 GiB file via mmap, sequentially, on a machine with 8 GiB RAM and a cold page cache. Sketch the major-fault trajectory over time.
  4. A process has steadily-growing RSS while its allocations (per logs) are flat. What explains this without memory leaks?
  5. Write the perf stat command you would use to watch minor and major faults in real time.

Read This Only If Stuck