
mmap, Anonymous Mappings, Shared Mappings

What This Concept Is

mmap is a single system call that exposes the virtual-memory machinery directly to userspace. It creates a mapping from a range of virtual addresses in the calling process to some backing:

  • a file on disk (file-backed), or
  • no backing at all (anonymous, zero-initialized on first touch).

Each mapping is also either:

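  • private (MAP_PRIVATE): writes are copy-on-write and per-process. They are not visible to other mappers of the same file and never reach the file itself. For anonymous memory the copy-on-write matters after fork: parent and child share each page until one of them writes to it.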
  • shared (MAP_SHARED): writes are visible to all processes mapping the same backing. For file-backed shared, writes also eventually propagate to the file.

So there are four useful combinations:

Flavor               Backing              Visibility of writes
Anonymous private    none (zero-filled)   process-local
Anonymous shared     none (zero-filled)   parent and children forked after the mmap (or, with memfd_create, any process mapping the same fd)
File-backed private  file (CoW)           per-process, not written back
File-backed shared   file                 all mappers, written back to disk

Under the hood, most userland allocators use anonymous private mappings for large allocations. IPC systems use anonymous shared or file-backed shared mappings. I/O frameworks use file-backed private or shared mappings, depending on whether their writes should reach the file on disk.

Why It Matters Here

mmap is one of the most-used interfaces in real systems because it unifies four otherwise-distinct problems:

  • "Give me a big chunk of zeroed memory" -> anonymous private
  • "Give me a big chunk I can share with a child or sibling" -> anonymous shared (mapped before fork, so the child inherits it) or a memfd_create file descriptor mapped with MAP_SHARED
  • "Let me read this large file without read/lseek ceremony and let the kernel cache the hot parts" -> file-backed private read-only
  • "Let me work on a file by treating it as memory, and the OS handles writeback" -> file-backed shared

Understanding which flavor matches your need, and what each costs, is central to writing anything that handles non-trivial data volumes: databases, search engines, message brokers, ML systems, image pipelines, anything with a memory-mapped format.

Concrete Example

Anonymous private (what malloc uses for big allocations):

void *p = mmap(NULL, (size_t)1 << 30, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (p == MAP_FAILED) { perror("mmap"); exit(1); }
// 1 GiB of virtual address space, zero-filled on demand, not backed by any file.

File-backed shared (persistent shared memory via a file):

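int fd = open("shared.bin", O_RDWR | O_CREAT, 0600);
// Size the file first: touching mapped pages beyond end-of-file raises SIGBUS.
if (fd < 0 || ftruncate(fd, 16 << 20) != 0) { perror("open/ftruncate"); exit(1); }
void *p = mmap(NULL, 16 << 20, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
if (p == MAP_FAILED) { perror("mmap"); exit(1); }
close(fd);  // the mapping stays valid after the descriptor is closed
// 16 MiB; writes are immediately visible to other mappers of shared.bin
// and reach the file itself on writeback or msync.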

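File-backed private read-only (the typical way to read a file through mmap):

int fd = open("big.log", O_RDONLY);
struct stat st;
if (fd < 0 || fstat(fd, &st) != 0) { perror("open/fstat"); exit(1); }
void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (p == MAP_FAILED) { perror("mmap"); exit(1); }  // also fails (EINVAL) for an empty file
// On a fault the kernel maps the page from the page cache, reading it from disk
// first if needed; nothing is ever written back.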

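Check the mappings. cat /proc/$pid/maps shows each mapping's start and end address, permissions (ending in p or s for private or shared), file offset, device and inode (if file-backed), and path. Private anonymous mappings have either a blank path or a label such as [heap], [stack], or [anon:name]. Shared anonymous mappings appear as /dev/zero (deleted), because the kernel backs them with an internal shmem file.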

Cost profile.

  • Anonymous private untouched: zero RSS, some kernel metadata.
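  • Anonymous private touched: RSS grows by one page for each page first written, at the cost of one minor fault each. A page is 4 KiB with x86-64 base pages; a transparent huge page fault can map 2 MiB at once.
  • File-backed shared, cached: reads cost at most one minor fault per page, to map the cached page into the process. Writes dirty the page, which eventually goes out via writeback.
  • File-backed private, cold cache: the first access to an uncached page is a major fault, and readahead usually pulls in neighboring pages too. Subsequent reads hit the page cache.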

Common Confusion / Misconception

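"mmap is always faster than read." Not always. For small files and sequential scans, read into a well-sized, reused buffer can outperform mmap. mmap pays per-page costs: a page fault for each page not yet mapped, page-table updates, and TLB pressure. It also pays to set up and tear down the mapping. mmap wins for random access into large files, for data that is accessed repeatedly, or when you want the kernel's page cache to serve as your cache.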

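"MAP_SHARED with a file is the same as write." It is not. A store into the mapping is a plain memory write that dirties a page-cache page, much as write() does. Three things differ:
  • Nothing reports an error at the point of the store.
  • The store cannot extend the file; touching a page that lies entirely beyond end-of-file raises SIGBUS.
  • The page reaches disk only through writeback, msync, or fsync.
Crash-consistency semantics are subtle: neither a completed store nor a returned write() means the data is durable.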

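"mmap(MAP_ANONYMOUS) uses RAM immediately." It does not. It reserves virtual address space and, depending on the overcommit policy, commit-charge accounting. Physical memory is allocated on demand, one page at a time, on first touch. For a private mapping, a first read maps the shared zero page, and only a first write allocates a private page.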

How To Use It

Decision tree:

  1. Do you need the data to survive process exit? If yes, file-backed. If no, anonymous.
  2. Do multiple processes need to see each other's writes? If yes, MAP_SHARED. If no, MAP_PRIVATE.
  3. Is this a small transient allocation? Use malloc; the allocator will choose the right flavor.
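  4. Is this large (tens of MiB and up)? Consider calling mmap directly, so that munmap (or madvise) returns the physical memory to the kernel exactly when you decide. glibc malloc already serves requests above its mmap threshold (128 KiB by default, adjusted dynamically) with mmap, but memory freed from the regular heap is not always returned to the OS.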

Operational checks for a running process:

  • /proc/$pid/maps: every mapping with flags and backing
  • /proc/$pid/smaps: per-mapping RSS, shared/private breakdown
  • pmap -X $pid: a friendlier summary
  • perf stat -e minor-faults,major-faults ./prog: cost profile

Check Yourself

  1. Describe the four flavors of mmap (private/shared x anon/file) in one sentence each.
  2. Why does a 10 GiB anonymous private mapping cost "nothing" in RSS until touched?
  3. What does MAP_SHARED over a file buy you that read/write does not?
  4. When does mmap lose to read?
  5. What is memfd_create, and why is it a cleaner way to share anonymous memory between processes than MAP_SHARED over a disk file?

Mini Drill or Application

  1. Write a minimal program that mmaps a 1 GiB file read-only and sums it. Report minor and major faults via perf stat.
  2. Write a pair of programs that share memory through a file-backed MAP_SHARED region and exchange integers. Check that writes in one are visible in the other without any read call.
  3. Draw the /proc/$pid/maps output you expect from a simple program that calls malloc(1 MiB) and malloc(1 GiB), then mmaps an 8 MiB file. Explain which lines are which.
  4. Compare read-based and mmap-based sequential sums of a 10 GiB file on a machine with 8 GiB RAM. Which wins? Does the page cache make the second run faster when the file is larger than RAM? Why or why not?
  5. Explain why mmap(MAP_SHARED) writes to a file are not immediately durable.

Read This Only If Stuck