Module 2: Memory Management & Virtual Memory
Primary text: Operating Systems: Three Easy Pieces (OSTEP), Part I chapters on virtualization of memory
Selective support: Operating System Concepts chapters 9-10 for alternate exposition, the Linux kernel memory-management admin guide, and LWN articles on allocators and huge pages
This guide is the primary teacher. You do not need to read OSTEP front-to-back to complete this module. You do need to become operationally strong at reading a page-table walk, predicting TLB and page-fault behavior, and reasoning about allocators in both the kernel and in userland.
Scope of This Module
This module is not a tour of memory hardware trivia. It is where virtual memory becomes a system you can reason about when something is slow, faulting, or leaking.
What it covers in depth:
- physical versus virtual addresses and why translation exists at all
- base-and-bounds and segmentation as the first attempts, and why they lost
- paging as the modern default: page size, page number, page offset, page-table entry
- single-level, multi-level, and inverted page tables with their space/time trade-offs
- the TLB as the caching fix that makes paging affordable, plus miss costs
- demand paging, minor versus major faults, and the page-fault control flow
- page-replacement policies (OPT, FIFO, LRU, clock, second-chance) and their failure modes
- thrashing and the working-set intuition for "when a machine falls off a cliff"
- kernel allocators: buddy system and slab, and why the kernel needs both
- userland allocators:
dlmalloc,ptmalloc,jemalloc,tcmallocand the arenas / thread-cache / size-class ideas - internal versus external fragmentation, and how allocators trade one for the other
- copy-on-write and what makes
fork+execcheap mmap, anonymous mappings, shared mappings, and file-backed mappings- huge pages, NUMA, and the modern wrinkles that change cost models
What it deliberately does not try to finish here:
- concurrency primitives inside the allocator (Module 3 covers locks and lock-free structures)
- the full page-cache and writeback story (Module 4 covers file systems and I/O)
- hardware microarchitecture (cache hierarchy, prefetch) beyond how it touches TLB and page size
- security topics like ASLR, W^X, KASLR, and side channels beyond a passing mention
This is an in-depth systems module. If you can quote definitions but cannot walk a virtual address to its page frame on paper, you are not done.
Before You Start
Answer these closed-book before starting the main path:
- Why does the OS keep a process from using the raw physical address it wants?
- Given a 48-bit virtual address and 4 KiB pages, how many bits are the offset and how many are the page number?
- What is a TLB miss, and why is it not the same thing as a page fault?
- Why is LRU a theoretical benchmark rather than a policy a real kernel actually implements?
- What does
forkcopy, and what does it not copy, on a modern Linux system?
Diagnostic Interpretation
4-5 solid answers
- You are ready for the full path.
2-3 solid answers
- Continue, but expect extra time in Cluster 2 (page tables and TLB) and Cluster 4 (allocators).
0-1 solid answers
- Revisit Module 1 of this semester (processes and scheduling) and the Semester 4 systems-programming material on pointers and the process address space before advancing. You need those before paging becomes readable.
What This Module Is For
Virtual memory is the abstraction that makes every other part of a real system tractable. Later work repeatedly asks questions like:
- why did my service's p99 latency double after a memory upgrade?
- why does this program spend 30% of its time in the kernel on TLB shootdowns?
- why does
mallocreturn quickly but my resident-set size climb for an hour? - why does
forkon a 40 GiB process take milliseconds? - why does
mmapof a large file look free until the first read of each page?
This module builds the memory reasoning needed for:
- debugging page-fault storms, swap death, and NUMA imbalance
- reading
perf,vmstat,/proc/meminfo,pmap, andsmemoutput without guessing - choosing and tuning allocators for latency-sensitive or memory-sensitive workloads
- understanding GC, JIT, and runtime-level memory use through the lens of the virtual-memory subsystem underneath
- designing data-intensive systems where memory layout, not CPU, is the bottleneck
You are learning to reason about memory without handwaving.
Concept Map
How To Use This Module
Work in order. The later clusters only make sense if the earlier clusters are stable.
Cluster 1: Address Spaces and Translation
| Order | Concept | Type | Focus |
|---|---|---|---|
| 1 | Physical vs Virtual Addresses: Why We Need Translation | PRIMARY | Why the OS refuses to let programs use raw physical addresses |
| 2 | Base/Bounds and Segmentation: The First Answer | SUPPORTING | The earliest translation hardware and why it lost to paging |
| 3 | Paging: The Modern Answer | PRIMARY | Fixed-size pages, page numbers, offsets, and page-table entries |
Cluster mastery check: Can you take a 48-bit virtual address with 4 KiB pages, split it into VPN and offset, and explain what the MMU does with each half?
Cluster 2: Page Tables and the TLB
| Order | Concept | Type | Focus |
|---|---|---|---|
| 4 | Single-Level and Multi-Level Page Tables | PRIMARY | Why flat tables are infeasible, how radix trees fix it |
| 5 | Inverted Page Tables and Their Trade-offs | SUPPORTING | When one table per frame beats one table per process |
| 6 | The TLB and Its Caching Role | PRIMARY | How a 64-entry cache decides whether paging is tolerable |
Cluster mastery check: Can you walk a 4-level page-table translation on paper and explain exactly what a TLB miss costs relative to a TLB hit?
Cluster 3: Page Replacement and Swapping
| Order | Concept | Type | Focus |
|---|---|---|---|
| 7 | Demand Paging and Minor/Major Faults | PRIMARY | Pages appear on demand, and faults come in two costs |
| 8 | Replacement Policies: FIFO, LRU, Approximations | PRIMARY | OPT as a benchmark, why real kernels use clock and variants |
| 9 | Thrashing and Working-Set Theory | PRIMARY | The working set, and why swap death happens suddenly |
Cluster mastery check: Given a reference string, can you simulate OPT, FIFO, and LRU, and explain Belady's anomaly without looking it up?
Cluster 4: Allocators and Fragmentation
| Order | Concept | Type | Focus |
|---|---|---|---|
| 10 | Kernel Memory Allocators: Buddy System, Slab | PRIMARY | How the kernel gets page frames and small objects without fragmenting itself |
| 11 | Userland Allocators: dlmalloc, ptmalloc, jemalloc, tcmalloc | PRIMARY | Arenas, size classes, thread caches, and why choice matters |
| 12 | Internal vs External Fragmentation | SUPPORTING | Where the wasted bytes go, and how allocators trade one for the other |
Cluster mastery check: Can you explain, in one paragraph each, what jemalloc does that ptmalloc does not, and why the kernel uses both buddy and slab rather than one or the other?
Cluster 5: Practical Virtual Memory Effects
| Order | Concept | Type | Focus |
|---|---|---|---|
| 13 | Copy-on-Write and fork's Efficiency | PRIMARY | Why fork on a 40 GiB process takes milliseconds |
| 14 | mmap, Anonymous Mappings, Shared Mappings | PRIMARY | The unified interface under allocation, shared memory, and file I/O |
| 15 | Large Pages, NUMA, and Modern Wrinkles | SUPPORTING | Huge pages, THP, NUMA policy, and why new hardware keeps moving the costs |
Cluster mastery check: Can you describe one measurable real-system effect of each of these: CoW after fork, a file-backed mmap first-read, and a workload that benefits from 2 MiB huge pages?
Then work these practice pages:
| Order | Practice path | Focus |
|---|---|---|
| 1 | Page Fault and TLB Measurement Lab | perf stat, minor vs major faults, TLB miss cost under varying stride |
| 2 | Page Replacement Simulator Workshop | Build and compare OPT, FIFO, LRU, and clock on real reference strings |
| 3 | Allocator Comparison Clinic | Benchmark glibc, jemalloc, tcmalloc on contention and RSS |
| 4 | Code Katas | Short timed drills on page-table math, mmap, and allocation patterns |
Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.
Learning Objectives
By the end of this module you should be able to:
- Explain why the OS refuses to expose physical memory directly and state the three things translation buys us.
- Walk a multi-level page-table lookup for a 48-bit virtual address with 4 KiB pages, naming each index and entry.
- Describe what a TLB hit, a TLB miss, a minor page fault, and a major page fault each cost, in order of magnitude.
- Simulate FIFO, LRU, and clock on a reference string and explain Belady's anomaly.
- Define the working set and explain why thrashing appears as a cliff rather than a slope.
- Explain the buddy system and slab allocator, and why the kernel uses both.
- Compare
ptmalloc,jemalloc, andtcmallocon contention, fragmentation, and RSS bloat behavior. - Distinguish internal and external fragmentation with a worked example of each.
- Explain copy-on-write with a concrete
fork/exectimeline. - Choose between
malloc, anonymousmmap, sharedmmap, and file-backedmmapfor a given workload and justify.
Outputs
- a page-table notebook with at least 8 walked translations (different levels, different page sizes) and a 1 GiB / 2 MiB / 4 KiB comparison table
- one TLB and page-fault measurement report using
perf staton a stride-varying program, with the cost curve and a one-page interpretation - one page-replacement simulator in code, supporting OPT, FIFO, LRU, and clock, validated against at least three reference strings from OSTEP Chapter 22
- one allocator-comparison report comparing
glibcdefault,jemalloc, andtcmallocon a multithreaded micro-benchmark, with RSS, throughput, and contention notes - one
mmapplaybook: at least four worked scenarios (anonymous, file-backed read, file-backedMAP_SHARED, huge-page-backed) with expected/proc/$pid/mapsoutput - one CoW timeline from
forkto divergence, with/proc/$pid/smapsreadings before and after the first write - one mistake log naming at least 12 recurring errors such as
confused TLB miss with page fault,assumed LRU in Linux,forgot anonymous vs file-backed,blamed the allocator for a page-fault problem, orignored NUMA locality - one short memo mapping Module 2 reasoning into the concurrency (Module 3), file-systems (Module 4), and later distributed-systems modules
Completion Standard
You have completed Module 2 when all of these are true:
- you can translate a virtual address by hand using a multi-level page table
- you can state where the MMU, the TLB, the page-fault handler, and the allocator each sit in the translation and allocation path
- you can simulate replacement policies and explain why OPT is only a yardstick
- you can tell when a performance problem is really a TLB problem, a page-fault problem, an allocator problem, or a NUMA problem, using evidence
- you can explain copy-on-write and
mmapin plain language and predict their cost profile - you can pick between allocators for a real workload and defend the choice with at least one measurement
- you can read
/proc/meminfo,/proc/$pid/maps, and/proc/$pid/smapswithout looking things up
If you can quote definitions but cannot debug a memory problem with perf and /proc output in front of you, the module is not complete.
Reading Policy
- Concept pages are the main path.
- Local book chunks are selective reinforcement, not a second syllabus.
Read only if stuckmeans try the concept page, self-check, and drill first.Optional deep divemeans additional nuance or exercise volume, not required progression.- External references are validated but optional; treat them as alternate exposition when a concept page does not land.
Suggested Weekly Flow
| Day | Work |
|---|---|
| 1 | Concepts 1-3, draw an address-space diagram and do one paging math drill |
| 2 | Concepts 4-6, walk two multi-level translations by hand and estimate TLB miss cost |
| 3 | Concepts 7-9, run Practice 1 (fault and TLB measurement) to halfway |
| 4 | Concepts 10-12 and Practice 2 (page-replacement simulator) |
| 5 | Concepts 13-15 and Practice 3 (allocator comparison) halfway |
| 6 | Finish Practice 1-3, start Practice 4 katas |
| 7 | Practice 4 katas, quiz, mistake-log cleanup |
Reference
If you need exact links into the local chunked books, use Reference and Selective Reading.
The Operating System tutorial (Phil Opp's "Writing an OS in Rust") covers paging, frame allocation, and kernel heap in directly-runnable form. This is the deepest possible engagement with this module. See Build Your Own X overview.
Rich Learning Pages
Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread