Physical vs Virtual Addresses: Why We Need Translation
What This Concept Is
Every memory access in a modern user program uses a virtual address. That address does not directly name a byte of DRAM. It is translated, on every access, into a physical address by the memory-management unit (MMU) using tables the OS maintains.
Translation exists so the OS can provide three things that a raw physical-memory model cannot:
- Isolation. One process cannot even name memory that belongs to another.
- Relocation. The kernel can move a process's pages around without the process knowing.
- Over-commitment and abstraction. A process sees a large, clean, private address space even if physical memory is small, fragmented, or shared.
Unprivileged code can never put a physical address on the bus directly. Every load and store it issues goes through translation first.
Why It Matters Here
Almost every later idea in this module is a specialization of translation:
- page tables are the data structure that makes translation work
- the TLB is the cache that makes translation fast
- page faults are the escape hatch when translation cannot complete
- mmap and copy-on-write are features the OS can cheaply offer because translation is already there
If you think of memory as a flat array the program talks to directly, the rest of the module will not make sense. You are looking at an abstraction with hardware cost and software control.
Concrete Example
In a 64-bit process on x86-64 Linux, a printf call might load from virtual address 0x00007f1b4a23c080. With 4 KiB pages, the MMU splits that into a virtual page number (0x7f1b4a23c) and a 12-bit offset (0x080), walks the per-process page table in DRAM, finds a page frame number such as 0x1a09c, and emits the physical address 0x1a09c080 on the bus. The process never sees the physical address.
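The arithmetic in that example can be sketched in a few lines. This is only an illustration under the assumption of 4 KiB pages (a 12-bit offset). The helper names split_vaddr and join_paddr are made up for this note, and the frame number is taken straight from the example rather than looked up in a real page table.

```python
PAGE_SHIFT = 12                  # assumption: 4 KiB pages, so 12 offset bits
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1


def split_vaddr(vaddr):
    """Split a virtual address into (virtual page number, offset in page)."""
    return vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK


def join_paddr(pfn, offset):
    """Combine a page frame number and an in-page offset into a physical address."""
    return (pfn << PAGE_SHIFT) | offset


vaddr = 0x00007F1B4A23C080
vpn, offset = split_vaddr(vaddr)

# The frame number would come from the page-table walk; here it is the
# value used in the example above.
paddr = join_paddr(0x1A09C, offset)

print(hex(vpn), hex(offset), hex(paddr))
# prints: 0x7f1b4a23c 0x80 0x1a09c080
```

Only the page number is translated. The offset passes through unchanged, which is why the last three hex digits of the virtual and physical addresses match.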
Two processes can both hold a char * with the value 0x7ffdc0000000 and point at completely different physical bytes. A debugger printing pointers in one process tells you nothing about the other.
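One way to see this on a Unix-like system is to fork and compare. The sketch below assumes os.fork is available (Linux or macOS, not Windows) and uses ctypes only to read the buffer's virtual address. The function name same_pointer_different_data is invented for this note. After the fork, parent and child report the same numeric address, but the child's write does not show up in the parent, because the kernel gives the child its own private copy of the page.

```python
import ctypes
import os


def same_pointer_different_data():
    """Fork, let the child overwrite a buffer, and compare what each side sees.

    Returns (parent_addr, child_addr, parent_value, child_value).
    """
    buf = ctypes.create_string_buffer(b"parent", 16)
    parent_addr = ctypes.addressof(buf)      # a virtual address

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: same virtual address, but writes land in its own copy.
        try:
            os.close(read_fd)
            buf.value = b"child"
            report = f"{ctypes.addressof(buf):#x} {buf.value.decode()}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)                      # never fall back into the caller

    # Parent: wait for the child's report, then look at its own buffer.
    os.close(write_fd)
    child_addr, child_value = os.read(read_fd, 256).decode().split()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return parent_addr, int(child_addr, 16), buf.value.decode(), child_value


if __name__ == "__main__":
    p_addr, c_addr, p_val, c_val = same_pointer_different_data()
    print(hex(p_addr), p_val)
    print(hex(c_addr), c_val)
```

The two printed addresses are equal while the two strings differ. The address values themselves vary from run to run.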
When the kernel decides to swap one process's page out to disk and bring another's in, the virtual addresses inside each process stay the same. Only the translations change.
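A toy model makes the "only the translations change" point concrete. This is a deliberately simplified sketch, not how a real kernel stores page tables: each address space is a plain dictionary from virtual page number to frame number, pages are assumed to be 4 KiB, and the class, method names, and frame numbers (AddressSpace, map_page, unmap_page, 0x2b000, 0x30000) are all hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class PageFault(Exception):
    """Raised when a virtual page currently has no frame behind it."""


class AddressSpace:
    """Toy per-process page table: virtual page number -> frame number."""

    def __init__(self):
        self.page_table = {}

    def map_page(self, vpn, pfn):
        self.page_table[vpn] = pfn

    def unmap_page(self, vpn):
        self.page_table.pop(vpn, None)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            raise PageFault(hex(vaddr))
        return self.page_table[vpn] * PAGE_SIZE + offset


proc_a, proc_b = AddressSpace(), AddressSpace()
vaddr = 0x7FFDC0000000           # same pointer value in both processes
vpn = vaddr // PAGE_SIZE

proc_a.map_page(vpn, 0x1A09C)
proc_b.map_page(vpn, 0x2B000)
before = proc_a.translate(vaddr)     # 0x1a09c000
other = proc_b.translate(vaddr)      # 0x2b000000, different physical bytes

# "Swap out": A's page loses its frame, so touching it would fault.
proc_a.unmap_page(vpn)
try:
    proc_a.translate(vaddr)
    faulted = False
except PageFault:
    faulted = True

# "Swap in": the page comes back in a different frame. The virtual
# address the process uses has not changed at all.
proc_a.map_page(vpn, 0x30000)
after = proc_a.translate(vaddr)      # 0x30000000

print(hex(before), hex(other), faulted, hex(after))
# prints: 0x1a09c000 0x2b000000 True 0x30000000
```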
Common Confusion / Misconception
"Virtual memory means swapping to disk." That is one consequence, not the definition. A system with enough RAM to never swap still uses virtual memory, because it still uses translation for isolation and relocation.
"The pointer I see in the debugger is the real address." Only if the code runs without an MMU or with translation disabled (e.g., bare-metal firmware). Kernel code normally runs with translation enabled too, so kernel pointers are also virtual addresses. In any userspace process, every pointer you see is a virtual address that means nothing outside that process.
"Translation must be slow if it happens on every access." It would be, except the TLB caches translations so that common cases skip the page-table walk. Cluster 2 covers that.
How To Use It
Whenever you look at a memory-related bug or performance surprise:
- Ask whose address space the address you are looking at lives in.
- Ask whether the address is virtual or physical (in userland code, it is always virtual).
- Ask what the mapping says: is this address mapped at all, mapped read-only, mapped to a file, mapped but not yet faulted in?
- Only after those three questions should you reason about values at that address.
Pointer arithmetic, memcpy, out-of-bounds reads, and cache behavior all live downstream of whether the OS even agreed to translate the address.
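On Linux, the "what does the mapping say" question can be answered from /proc/self/maps, which lists each mapped range of the current process with its permissions and backing file. The sketch below is one way to look up an address in that listing. It assumes the usual Linux line layout (range, permissions, offset, device, inode, optional path), and find_mapping is a name made up for this note, not a library function.

```python
import ctypes
import os


def find_mapping(addr, maps_text):
    """Return (start, end, perms, path) for the mapping containing addr.

    maps_text is the content of a /proc/<pid>/maps file. Returns None if
    the address is not mapped at all.
    """
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue                          # not a mapping line
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:
            path = fields[5].strip() if len(fields) > 5 else ""
            return start, end, fields[1], path
    return None


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    # Look up the mapping behind one of this process's own buffers.
    buf = ctypes.create_string_buffer(64)
    with open("/proc/self/maps") as f:
        print(find_mapping(ctypes.addressof(buf), f.read()))
```

A result of None means the address is not mapped. Permissions such as r--p indicate a read-only private mapping, and a non-empty path indicates a file-backed one. The listing does not say whether a mapped page has already been faulted in.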
Check Yourself
- Name three services the OS gives you that become impossible if programs use raw physical addresses.
- Why do two processes showing the same pointer value in gdb not imply they are looking at the same data?
- What does the kernel have to keep per-process to make translation work?
Mini Drill or Application
For each situation, write one sentence naming the virtual and physical objects involved:
- A process calls malloc(4096) and writes one byte at the returned pointer.
- Two processes forked from the same parent both read the same element of a large array inherited from the parent.
- A process calls mmap(MAP_SHARED) on a file and another process does the same on the same file.
- A process runs off the end of its stack into unmapped memory.