Module Quiz
Complete this quiz after finishing all concept and practice pages.
Current Module Questions
Question 1: Why Translate At All
Name three capabilities that virtual memory gives the OS, none of which is achievable when programs emit physical addresses directly.
Answer: Isolation between processes (each process's pointers cannot name another process's memory), relocation of physical pages without informing the process (the kernel can evict, migrate, compress, or share frames transparently), and over-commitment / abstraction (processes see a large, clean, private virtual address space even when physical memory is small or shared).
Question 2: Paging Math
A process uses 48-bit virtual addresses with 4 KiB pages. How many bits are the offset, how many are the VPN, and how many PTEs would a single flat page table hold per process?
Answer: Offset = 12 bits (4 KiB = 2^12). VPN = 36 bits. Flat table entries = 2^36 = ~6.87 * 10^10, times 8 bytes per PTE = 512 GiB per process. That is why multi-level tables exist.
Question 3: Page-Table Walk Walkthrough
Walk a 4-level (9+9+9+9+12) x86-64 translation for virtual address 0x0000_5555_5555_6000. Give the five fields.
Answer:
0x0000_5555_5555_6000in binary, focus on bits 47-0.- bits 47-39 (PML4) =
0_1010_1010=0xAA - bits 38-30 (PDPT) =
1_0101_0101=0x155 - bits 29-21 (PD) =
0_1010_1010=0xAA - bits 20-12 (PT) =
1_0101_0101=0x155 - bits 11-0 (offset) =
0x000
(Exact numbers depend on careful bit alignment; the key is that each level is a 9-bit index into a 512-entry table, and the offset is 12 bits.)
Question 4: Walk Cost
What is the worst-case memory-access cost of a TLB miss on a 4-level page-table walk, and how does a 2 MiB huge page change it?
Answer: Up to 4 DRAM reads (one per level) plus the final data read. With a 2 MiB huge page the walk stops at the PD level, so worst case is 3 DRAM reads. With a 1 GiB page it stops at PDPT: 2 DRAM reads.
Question 5: TLB Miss vs Page Fault
Explain the difference between a TLB miss and a page fault.
Answer: A TLB miss is a cache miss on the translation itself; the page table is present in memory and the MMU performs the walk with no OS involvement, costing a few DRAM accesses. A page fault means the PTE is absent or protection-violating; the MMU raises an exception, the kernel runs the fault handler, potentially reads a page from disk, installs a PTE, and returns. Misses are frequent and cheap; faults are rare and expensive.
Question 6: Minor vs Major Faults
Give one example of each, and describe their order-of-magnitude cost difference.
Answer: Minor fault example: first write to a fresh anonymous mapping (the kernel zeroes a frame and installs a PTE, no I/O). Major fault example: read of a page from a file-backed mapping that is not in the page cache (the kernel reads from disk). Minor faults cost microseconds; major faults cost tens to hundreds of microseconds on SSDs and milliseconds on spinning disks. Often 100x-10000x slower.
Question 7: Replacement Simulation
Reference string 1 2 3 4 1 2 5 1 2 3 4 5, 3 frames. Give the fault count under FIFO, LRU, and OPT.
Answer:
- FIFO: 9 faults. Order of residents after each reference, with evictions.
- LRU: 10 faults.
- OPT: 7 faults.
(Counts can shift slightly based on tie-breaking; the key is OPT < FIFO <= LRU on this string.)
Bonus: With 4 frames, FIFO yields 10 faults (Belady's anomaly: more frames, more FIFO faults on this string).
Question 8: Belady's Anomaly
State Belady's anomaly in one sentence.
Answer: For FIFO page replacement, increasing the number of frames can increase the fault count for some reference strings.
Question 9: Working Set and Thrashing
Define the working set W(t, tau) and explain why a system exhibits a sharp cliff rather than a slope when memory pressure increases.
Answer: W(t, tau) is the set of pages a process has referenced in the interval [t - tau, t]. The working-set size is |W(t, tau)|. While the sum of working sets fits in RAM, faults are rare and performance is smooth. Once the sum exceeds RAM, every eviction throws out a page another process is about to need; faults cascade. Because faults cost 100x-10000x more than hits, the transition is catastrophic rather than gradual.
Question 10: Kernel Allocators
Why does the kernel use both the buddy allocator and the slab allocator?
Answer: The buddy allocator hands out contiguous runs of 2^k page frames, well-matched to page-frame allocation and large DMA buffers, but it rounds requests up by powers of two, which is wasteful for small objects. The slab allocator runs on top of the buddy allocator: it takes page-sized chunks and carves them into per-type caches of equally-sized objects (inodes, sockets, task_struct, ...), with O(1) allocation and free and near-zero internal fragmentation within a cache. Neither is sufficient alone.
Question 11: User Allocator Choice
A long-running multithreaded service on 16 cores shows RSS of 24 GiB with 4 GiB live objects and 40% CPU time in __lll_lock_wait. Diagnose and propose a fix.
Answer: __lll_lock_wait inside libc indicates contention on the glibc ptmalloc arena locks. Combined with a large RSS-vs-live gap, this is classic ptmalloc arena accumulation and fragmentation. Fix candidates: set MALLOC_ARENA_MAX=2 or 4 (reducing arena count trades some contention for lower RSS), or switch to jemalloc or tcmalloc via LD_PRELOAD. jemalloc is usually the best default for long-lived services because of its aggressive page-release behavior.
Question 12: Fragmentation Types
Give one concrete example of internal fragmentation and one of external fragmentation.
Answer:
- Internal: a 40-byte request served from a 48-byte size class wastes 8 bytes (~17%) per allocation; the slack is inside an allocated block.
- External: a heap has free chunks of 1, 2, 1, 3, 1, 2 KiB (10 KiB total) but cannot satisfy a 5 KiB request. Or at the buddy-allocator level: plenty of order-0 free pages but zero at order 9, so a 2 MiB contiguous request fails.
Question 13: Copy-on-Write
A Redis server with 20 GiB of data forks a child to do a background save. What is each process's RSS immediately after fork? What does writing to a page cost?
Answer: Immediately after fork, both processes show 20 GiB RSS, but almost all pages are shared read-only across the two with a refcount of 2. Each writable page-table entry in both processes is marked read-only. On a write to a page, the MMU raises a protection fault; the kernel allocates a new frame, copies the page, installs it as read-write in the writing process's PTE, and decrements the refcount. Only the pages written by either process diverge; the rest stay shared until exec or exit.
Question 14: mmap Flavors
You need: (a) a 1 GiB zeroed scratch buffer, (b) shared memory between a parent and a child, (c) to read a 10 GiB file where you want the kernel to cache hot parts, (d) a persistent on-disk data structure that you want to manipulate in-memory with writes visible to other processes. Choose the right mmap flavor for each.
Answer:
- (a)
MAP_PRIVATE | MAP_ANONYMOUS(zero-filled anonymous private). - (b)
MAP_SHARED | MAP_ANONYMOUS(ormemfd_create+MAP_SHARED) so parent and child see each other's writes. - (c)
MAP_PRIVATEover the file read-only, using the page cache via major faults on cold pages. - (d)
MAP_SHAREDover the file read-write; writes propagate to the file and to other mappers.
Question 15: Huge Pages Trade-off
A workload's perf stat shows a very high dTLB-load-misses rate. You turn on transparent huge pages globally. Performance improves on some workloads and degrades on others. Explain one case each.
Answer:
- Improves: an in-memory database reads randomly across a 10 GiB buffer pool. With 4 KiB pages the working set vastly exceeds TLB reach; with 2 MiB pages TLB reach grows 512x and miss rate collapses.
- Degrades: a
fork-heavy service (e.g., Redis BGSAVE) triggers CoW on 2 MiB pages. Each divergence copies 2 MiB instead of 4 KiB (512x amplification). Also,khugepagedcompaction sweeps can stall latency-sensitive threads. This is why some databases recommend disabling THP.
Interleaved Review Questions
Prior Module Question 1 (S5 Module 1: Processes & Scheduling)
Why does a context switch on x86-64 not require flushing the TLB on modern kernels?
Answer: Because the CPU supports PCID / ASID tagging: each TLB entry carries an address-space identifier, so the TLB can hold translations for many processes simultaneously. A context switch changes CR3 (and the current PCID); the TLB does not need to be flushed, only filtered by the new PCID on probes.
Prior Module Question 2 (S4 Systems Programming: Process Memory Layout)
Name the four traditional segments of a Unix process address space and whether each grows.
Answer: Text (code, fixed size, typically read-only), data (initialized globals, fixed size), BSS (uninitialized globals, fixed size and zero-filled), heap (grows up via brk/sbrk or mmap), stack (grows down toward the heap). Modern layouts also include loaded shared libraries and mmaped regions in the middle.
Prior Module Question 3 (S2 Module 1: Algorithm Analysis)
The buddy allocator's "split down to order k, then coalesce on free" sequence: what is the asymptotic cost per allocation, and why?
Answer: O(log N) where N is the maximum supported order, because the walk up and down the order free lists is bounded by the number of orders. Coalescing is also O(log N) because each coalescing step doubles the block size, so at most log N steps occur.
Prior Module Question 4 (S1 Module 1: Proofs / Induction)
Prove by induction that a buddy allocator that serves requests of exactly power-of-two sizes cannot produce external fragmentation (all free memory is always coalescible into usable blocks for some size class).
Answer: Base case: the allocator starts with one block of order K (the maximum). Inductive step: assume at some point every maximal free run of memory is aligned on its own size boundary and can be coalesced with its buddy if the buddy is also free. Any allocation of a power-of-two size splits blocks but preserves the alignment invariant. Any free reunites buddies if both free. Therefore the invariant holds. In particular, the allocator never holds unusable gaps smaller than its smallest size class.
Prior Module Question 5 (S1 Module 3: Probability)
If each of 100 threads independently hits one of 8 glibc arenas uniformly at random, what is the probability that all eight arenas are contended (each used by at least one thread)?
Answer: Occupancy problem. P(all 8 used) = 1 - P(at least one arena unused). Using inclusion-exclusion, or the standard surjection formula:
P = sum_{k=0}^{8} (-1)^k C(8,k) ((8-k)/8)^100 = 8! * S(100, 8) / 8^100
where S(n,k) is the Stirling number of the second kind. Numerically this is essentially 1 (with 100 threads over 8 arenas, all arenas being hit is almost certain).
Self-Assessment and Remediation
Mastery Level (90-100% correct):
- Ready to advance with confidence. You can walk a translation and diagnose a real memory problem.
Proficient Level (75-89% correct):
- Review only the missed concept pages and redo two problems of each missed type. Prioritize any questions about TLB cost and replacement policies.
Developing Level (60-74% correct):
- Rework Cluster 2 (Page Tables and the TLB) and Cluster 3 (Page Replacement and Swapping). Redo Practice 1 and Practice 2. The gap is almost always page-table walk math or replacement-policy simulation.
Insufficient Level (<60% correct):
- Return to the concept sequence from the start. The likely issue is that you are treating virtual memory as abstract rather than as a hardware + kernel system. Re-read Cluster 1 and redo the bit-split and page-table walk katas before advancing.