Skip to main content

Module 2: Memory Management & Virtual Memory

Primary text: Operating Systems: Three Easy Pieces (OSTEP), Part I chapters on virtualization of memory
Selective support: Operating System Concepts chapters 9-10 for alternate exposition, the Linux kernel memory-management admin guide, and LWN articles on allocators and huge pages

This guide is the primary teacher. You do not need to read OSTEP front-to-back to complete this module. You do need to become operationally strong at reading a page-table walk, predicting TLB and page-fault behavior, and reasoning about allocators in both the kernel and in userland.


Scope of This Module

This module is not a tour of memory hardware trivia. It is where virtual memory becomes a system you can reason about when something is slow, faulting, or leaking.

What it covers in depth:

  • physical versus virtual addresses and why translation exists at all
  • base-and-bounds and segmentation as the first attempts, and why they lost
  • paging as the modern default: page size, page number, page offset, page-table entry
  • single-level, multi-level, and inverted page tables with their space/time trade-offs
  • the TLB as the caching fix that makes paging affordable, plus miss costs
  • demand paging, minor versus major faults, and the page-fault control flow
  • page-replacement policies (OPT, FIFO, LRU, clock, second-chance) and their failure modes
  • thrashing and the working-set intuition for "when a machine falls off a cliff"
  • kernel allocators: buddy system and slab, and why the kernel needs both
  • userland allocators: dlmalloc, ptmalloc, jemalloc, tcmalloc and the arenas / thread-cache / size-class ideas
  • internal versus external fragmentation, and how allocators trade one for the other
  • copy-on-write and what makes fork + exec cheap
  • mmap, anonymous mappings, shared mappings, and file-backed mappings
  • huge pages, NUMA, and the modern wrinkles that change cost models

What it deliberately does not try to finish here:

  • concurrency primitives inside the allocator (Module 3 covers locks and lock-free structures)
  • the full page-cache and writeback story (Module 4 covers file systems and I/O)
  • hardware microarchitecture (cache hierarchy, prefetch) beyond how it touches TLB and page size
  • security topics like ASLR, W^X, KASLR, and side channels beyond a passing mention

This is an in-depth systems module. If you can quote definitions but cannot walk a virtual address to its page frame on paper, you are not done.


Before You Start

Answer these closed-book before starting the main path:

  1. Why does the OS keep a process from using the raw physical address it wants?
  2. Given a 48-bit virtual address and 4 KiB pages, how many bits are the offset and how many are the page number?
  3. What is a TLB miss, and why is it not the same thing as a page fault?
  4. Why is LRU a theoretical benchmark rather than a policy a real kernel actually implements?
  5. What does fork copy, and what does it not copy, on a modern Linux system?

Diagnostic Interpretation

4-5 solid answers

  • You are ready for the full path.

2-3 solid answers

  • Continue, but expect extra time in Cluster 2 (page tables and TLB) and Cluster 4 (allocators).

0-1 solid answers

  • Revisit Module 1 of this semester (processes and scheduling) and the Semester 4 systems-programming material on pointers and the process address space before advancing. You need those before paging becomes readable.

What This Module Is For

Virtual memory is the abstraction that makes every other part of a real system tractable. Later work repeatedly asks questions like:

  • why did my service's p99 latency double after a memory upgrade?
  • why does this program spend 30% of its time in the kernel on TLB shootdowns?
  • why does malloc return quickly but my resident-set size climb for an hour?
  • why does fork on a 40 GiB process take milliseconds?
  • why does mmap of a large file look free until the first read of each page?

This module builds the memory reasoning needed for:

  • debugging page-fault storms, swap death, and NUMA imbalance
  • reading perf, vmstat, /proc/meminfo, pmap, and smem output without guessing
  • choosing and tuning allocators for latency-sensitive or memory-sensitive workloads
  • understanding GC, JIT, and runtime-level memory use through the lens of the virtual-memory subsystem underneath
  • designing data-intensive systems where memory layout, not CPU, is the bottleneck

You are learning to reason about memory without handwaving.


Concept Map


How To Use This Module

Work in order. The later clusters only make sense if the earlier clusters are stable.

Cluster 1: Address Spaces and Translation

OrderConceptTypeFocus
1Physical vs Virtual Addresses: Why We Need TranslationPRIMARYWhy the OS refuses to let programs use raw physical addresses
2Base/Bounds and Segmentation: The First AnswerSUPPORTINGThe earliest translation hardware and why it lost to paging
3Paging: The Modern AnswerPRIMARYFixed-size pages, page numbers, offsets, and page-table entries

Cluster mastery check: Can you take a 48-bit virtual address with 4 KiB pages, split it into VPN and offset, and explain what the MMU does with each half?

Cluster 2: Page Tables and the TLB

OrderConceptTypeFocus
4Single-Level and Multi-Level Page TablesPRIMARYWhy flat tables are infeasible, how radix trees fix it
5Inverted Page Tables and Their Trade-offsSUPPORTINGWhen one table per frame beats one table per process
6The TLB and Its Caching RolePRIMARYHow a 64-entry cache decides whether paging is tolerable

Cluster mastery check: Can you walk a 4-level page-table translation on paper and explain exactly what a TLB miss costs relative to a TLB hit?

Cluster 3: Page Replacement and Swapping

OrderConceptTypeFocus
7Demand Paging and Minor/Major FaultsPRIMARYPages appear on demand, and faults come in two costs
8Replacement Policies: FIFO, LRU, ApproximationsPRIMARYOPT as a benchmark, why real kernels use clock and variants
9Thrashing and Working-Set TheoryPRIMARYThe working set, and why swap death happens suddenly

Cluster mastery check: Given a reference string, can you simulate OPT, FIFO, and LRU, and explain Belady's anomaly without looking it up?

Cluster 4: Allocators and Fragmentation

OrderConceptTypeFocus
10Kernel Memory Allocators: Buddy System, SlabPRIMARYHow the kernel gets page frames and small objects without fragmenting itself
11Userland Allocators: dlmalloc, ptmalloc, jemalloc, tcmallocPRIMARYArenas, size classes, thread caches, and why choice matters
12Internal vs External FragmentationSUPPORTINGWhere the wasted bytes go, and how allocators trade one for the other

Cluster mastery check: Can you explain, in one paragraph each, what jemalloc does that ptmalloc does not, and why the kernel uses both buddy and slab rather than one or the other?

Cluster 5: Practical Virtual Memory Effects

OrderConceptTypeFocus
13Copy-on-Write and fork's EfficiencyPRIMARYWhy fork on a 40 GiB process takes milliseconds
14mmap, Anonymous Mappings, Shared MappingsPRIMARYThe unified interface under allocation, shared memory, and file I/O
15Large Pages, NUMA, and Modern WrinklesSUPPORTINGHuge pages, THP, NUMA policy, and why new hardware keeps moving the costs

Cluster mastery check: Can you describe one measurable real-system effect of each of these: CoW after fork, a file-backed mmap first-read, and a workload that benefits from 2 MiB huge pages?

Then work these practice pages:

OrderPractice pathFocus
1Page Fault and TLB Measurement Labperf stat, minor vs major faults, TLB miss cost under varying stride
2Page Replacement Simulator WorkshopBuild and compare OPT, FIFO, LRU, and clock on real reference strings
3Allocator Comparison ClinicBenchmark glibc, jemalloc, tcmalloc on contention and RSS
4Code KatasShort timed drills on page-table math, mmap, and allocation patterns

Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.


Learning Objectives

By the end of this module you should be able to:

  1. Explain why the OS refuses to expose physical memory directly and state the three things translation buys us.
  2. Walk a multi-level page-table lookup for a 48-bit virtual address with 4 KiB pages, naming each index and entry.
  3. Describe what a TLB hit, a TLB miss, a minor page fault, and a major page fault each cost, in order of magnitude.
  4. Simulate FIFO, LRU, and clock on a reference string and explain Belady's anomaly.
  5. Define the working set and explain why thrashing appears as a cliff rather than a slope.
  6. Explain the buddy system and slab allocator, and why the kernel uses both.
  7. Compare ptmalloc, jemalloc, and tcmalloc on contention, fragmentation, and RSS bloat behavior.
  8. Distinguish internal and external fragmentation with a worked example of each.
  9. Explain copy-on-write with a concrete fork/exec timeline.
  10. Choose between malloc, anonymous mmap, shared mmap, and file-backed mmap for a given workload and justify.

Outputs

  • a page-table notebook with at least 8 walked translations (different levels, different page sizes) and a 1 GiB / 2 MiB / 4 KiB comparison table
  • one TLB and page-fault measurement report using perf stat on a stride-varying program, with the cost curve and a one-page interpretation
  • one page-replacement simulator in code, supporting OPT, FIFO, LRU, and clock, validated against at least three reference strings from OSTEP Chapter 22
  • one allocator-comparison report comparing glibc default, jemalloc, and tcmalloc on a multithreaded micro-benchmark, with RSS, throughput, and contention notes
  • one mmap playbook: at least four worked scenarios (anonymous, file-backed read, file-backed MAP_SHARED, huge-page-backed) with expected /proc/$pid/maps output
  • one CoW timeline from fork to divergence, with /proc/$pid/smaps readings before and after the first write
  • one mistake log naming at least 12 recurring errors such as confused TLB miss with page fault, assumed LRU in Linux, forgot anonymous vs file-backed, blamed the allocator for a page-fault problem, or ignored NUMA locality
  • one short memo mapping Module 2 reasoning into the concurrency (Module 3), file-systems (Module 4), and later distributed-systems modules

Completion Standard

You have completed Module 2 when all of these are true:

  • you can translate a virtual address by hand using a multi-level page table
  • you can state where the MMU, the TLB, the page-fault handler, and the allocator each sit in the translation and allocation path
  • you can simulate replacement policies and explain why OPT is only a yardstick
  • you can tell when a performance problem is really a TLB problem, a page-fault problem, an allocator problem, or a NUMA problem, using evidence
  • you can explain copy-on-write and mmap in plain language and predict their cost profile
  • you can pick between allocators for a real workload and defend the choice with at least one measurement
  • you can read /proc/meminfo, /proc/$pid/maps, and /proc/$pid/smaps without looking things up

If you can quote definitions but cannot debug a memory problem with perf and /proc output in front of you, the module is not complete.


Reading Policy

  • Concept pages are the main path.
  • Local book chunks are selective reinforcement, not a second syllabus.
  • Read only if stuck means try the concept page, self-check, and drill first.
  • Optional deep dive means additional nuance or exercise volume, not required progression.
  • External references are validated but optional; treat them as alternate exposition when a concept page does not land.

Suggested Weekly Flow

DayWork
1Concepts 1-3, draw an address-space diagram and do one paging math drill
2Concepts 4-6, walk two multi-level translations by hand and estimate TLB miss cost
3Concepts 7-9, run Practice 1 (fault and TLB measurement) to halfway
4Concepts 10-12 and Practice 2 (page-replacement simulator)
5Concepts 13-15 and Practice 3 (allocator comparison) halfway
6Finish Practice 1-3, start Practice 4 katas
7Practice 4 katas, quiz, mistake-log cleanup

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.


Build Your Own X — elective

The Operating System tutorial (Phil Opp's "Writing an OS in Rust") covers paging, frame allocation, and kernel heap in directly-runnable form. This is the deepest possible engagement with this module. See Build Your Own X overview.

Rich Learning Pages

Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread