Single-Level and Multi-Level Page Tables
What This Concept Is
A single-level (linear) page table is a contiguous array of PTEs, one per virtual page, indexed directly by VPN. It is the simplest possible data structure.
A multi-level page table is a radix tree of PTEs. The VPN is split into several index fields, one per level. At each level, an entry either points to the next-level table or marks that whole sub-range as invalid. Empty sub-trees consume no memory.
x86-64 today uses a 4-level page table (5-level is also supported). Each level is indexed by 9 bits, each table is 4 KiB (512 entries of 8 bytes), and the final level's entries are PTEs.
Why It Matters Here
A flat table is infeasible for modern address spaces. With 48-bit virtual addresses and 4 KiB pages, a single flat table has 2^36 = 64 G PTEs times 8 bytes = 512 GiB per process. Almost every entry would be invalid. The space would dominate everything.
Multi-level tables solve the space problem structurally: the kernel allocates page-table pages only for parts of the address space that are actually in use. A typical process uses a handful of MiB of page-table memory, not hundreds of GiB.
Understanding multi-level walks also explains TLB miss cost. A TLB miss on a 4-level table is up to four DRAM accesses before the translation is known, which is why a single TLB miss can cost hundreds of cycles.
Concrete Example
x86-64 with 48-bit virtual addresses and 4 KiB pages:
| Field | Bits | Count |
|---|---|---|
| PML4 index | 47-39 | 9 bits, 512 entries |
| PDPT index | 38-30 | 9 bits, 512 entries |
| PD index | 29-21 | 9 bits, 512 entries |
| PT index | 20-12 | 9 bits, 512 entries |
| page offset | 11-0 | 12 bits |
One PML4 entry covers 2^39 = 512 GiB, one PDPT entry covers 2^30 = 1 GiB, one PD entry covers 2^21 = 2 MiB, one PT entry covers 2^12 = 4 KiB.
A 2 MiB huge page is a PD entry with the "page size" bit set: the PD entry is terminal and its offset is 21 bits. A 1 GiB page does the same at the PDPT level with a 30-bit offset.
Memory cost example. A process that touches code at the bottom of the address space, heap in the middle, and stack near the top typically uses:
- 1 PML4 (4 KiB)
- 2-3 PDPTs (8-12 KiB)
- some PDs (tens of KiB)
- PT entries only for resident pages
Total page-table memory: often well under 1 MiB. Compare that to the 512 GiB a flat table would demand.
Common Confusion / Misconception
"Multi-level tables save time." They save space. They cost more time per translation than a hypothetical flat table, because the walk has more indirection. The TLB pays back the cost.
"A process has one big page table." In practice the MMU treats the top-level table (PML4) as the page table, and CR3 on x86-64 holds its physical address. Everything else is linked through it.
"Every level is always walked." No: huge pages short-circuit the walk at an earlier level. A 2 MiB page translates in 3 lookups instead of 4. A 1 GiB page translates in 2.
How To Use It
To walk a 4-level translation by hand:
- Take the virtual address, split into five fields using the table above.
- Read
CR3to find PML4's physical base. Take the PML4 index. Look up that entry. - The entry points to the PDPT. Take the PDPT index, look it up, move to the PD.
- PD index, look up, move to the PT.
- PT index, look up, produce the PFN.
- Concatenate PFN with the 12-bit offset to get the physical address.
If any entry along the way has the present bit clear, you have a page fault. If a terminal bit is set at PDPT or PD, you stopped early at a huge page.
Check Yourself
- Why does a flat page table scale with the size of the address space, while a multi-level table scales with the amount of it in use?
- On x86-64 with 48-bit virtual addresses and 4 KiB pages, how many 4 KiB page-table pages does the entire PML4 cost if every PML4 entry is valid?
- What is the worst-case memory-access cost (in DRAM reads) of a TLB miss with 4-level tables? With 2 MiB pages?
- Why is a 1 GiB page an attractive target for databases with a huge shared buffer pool?
Mini Drill or Application
- A 32-bit x86 non-PAE system uses a 2-level table with 10+10+12 bit split. How big is each table? What is the coverage of a single top-level entry?
- Translate
0x0000_7FFF_1234_5678on a 4-level, 4 KiB-page, x86-64 setup: give the 5 fields. - Same address under 2 MiB pages (3-level walk, 9+9+9+21 split above the offset). Give the 4 fields.
- Suppose a process touches exactly 3 pages: one instruction page near
0x0000_0000_0040_0000, one heap page near0x0000_5555_5555_0000, one stack page near0x0000_7FFF_FFFF_E000. Estimate the total page-table memory needed. - Explain why a TLB miss near the top of the address space and one near the bottom are likely to cost the same (same number of tables walked), even though the pages are far apart.