Build Your Own Memory Allocator
"Malloc is not magic. It is a bunch of code." — Dmitri Šošić
A memory allocator is the perfect Semester 4 project: small enough to finish, deep enough that every implementation choice — free list, splitting, coalescing, alignment — teaches something real about how programs actually use memory.
1. Overview & motivation
A memory allocator implements malloc(size) and free(ptr). Sounds simple. The challenges are:
- Fragmentation. External fragmentation (free space split into useless small pieces). Internal fragmentation (rounding up wastes space).
- Alignment.
doubleneeds 8-byte alignment; SIMD types need 16. Misaligned returns are undefined behavior on many architectures. - Metadata cost. Every allocated block needs a header; small blocks pay disproportionately.
- Performance. Allocator latency matters: glibc's allocator is a hot path in every C program.
What you can only learn by building one:
- Why production allocators (jemalloc, tcmalloc, mimalloc) use size classes rather than free lists.
- Why
brkandmmapare both used: small allocations on the heap, large onmmap. - Why thread-caching exists and why it's hard to get right.
- Why buddy allocation is so clean conceptually and so wasteful in practice.
2. Where this fits in the degree
- Phase: Systems
- Semester: 4 (Systems Programming)
- Modules deepened: Module 2 (memory, pointers, machine representation) — the entire project. Module 3 (computer organization) — cache effects of allocation strategies. Module 4 (systems-level programming) —
sbrk,mmap,madvise.
Cross-phase relevance:
- Sets up the Operating System tutorial — kernels need their own allocators.
- Performance discipline applies to Database (KV) memory management.
3. Prerequisites
- C: pointers, structs,
void*, pointer arithmetic. Comfortable with(char*)p + offsetpatterns. - Familiarity with
sbrkormmapfrom a Unix systems course (you'll learn enough here if not).
You do not need formal computer-architecture knowledge. You will pick up alignment and cache effects empirically.
4. Theory & research
Required reading
- Doug Lea, "A Memory Allocator" (gee.cs.oswego.edu/dl/html/malloc.html) — the design notes for dlmalloc. The single best document on practical allocator design.
- CS:APP, Computer Systems: A Programmer's Perspective, Chapter 9 (Virtual Memory), section 9.9 (Dynamic Memory Allocation) — Bryant & O'Hallaron. The standard textbook treatment.
Strongly recommended
- K&R, The C Programming Language, Section 8.7 — Kernighan & Ritchie's storage allocator. Simple and complete.
- Wilson, Johnstone, Neely, Boles (1995), "Dynamic Storage Allocation: A Survey and Critical Review" — academic survey, comprehensive.
- Bonwick (1994), "The Slab Allocator" — slab allocation as used by Solaris and Linux.
For production allocators
- jemalloc design — Facebook's note on scalability.
- Microsoft Research, "Mimalloc: Free List Sharding in Action" — modern, fast allocator.
5. Curated tutorial list (from BYO-X)
- C: Malloc is not magic -- Implementing your own memory allocator — Dmitri Šošić ⭐ recommended primary
The BYO-X catalog has one entry under "Memory Allocator," but the topic is also covered in the "Operating System" section under "Malloc tutorial":
- C: Malloc tutorial — Marwan Burelle — PDF, ~30 pages, very thorough.
Additional canonical references
- CS:APP Lab — Malloc Lab (csapp.cs.cmu.edu/3e/labs.html) — Carnegie Mellon's allocator assignment. Open source. The single best practice problem.
- MIT 6.172 Performance Engineering — Malloc Project — comparable but with performance focus.
6. Recommended primary path
Dan Luu's "Malloc tutorial" (the Šošić article hosted on danluu.com) is short, opinionated, and walks you from sbrk-based bump allocation through free lists and coalescing in clean C.
After completing it, do CS:APP Malloc Lab for a graded performance-oriented version: same problem, but with a benchmark suite and a "memory utilization × throughput" score. This is the single best way to compare your allocator to a well-engineered baseline.
7. Implementation milestones
Milestone 1: Bump allocator (~30 lines)
The simplest allocator possible. malloc advances brk by the requested size; free does nothing.
#include <unistd.h>
void *bump_malloc(size_t size) {
void *p = sbrk(0);
if (sbrk(size) == (void*) -1) return NULL;
return p;
}
void bump_free(void *ptr) { (void) ptr; /* no-op */ }
Evidence: Run a program that allocates 1000 blocks and prints all addresses. Show that the heap grows linearly.
Milestone 2: Free list with first-fit
Add a header to each block: size, free/used flag, next pointer. On free, push the block onto a free list. On malloc, scan the list for the first block large enough.
typedef struct block_header {
size_t size;
int is_free;
struct block_header *next;
} header_t;
static header_t *free_list = NULL;
void *my_malloc(size_t size) {
header_t *prev = NULL;
header_t *curr = free_list;
while (curr) {
if (curr->is_free && curr->size >= size) {
curr->is_free = 0;
if (prev) prev->next = curr->next;
else free_list = curr->next;
return (void*)(curr + 1);
}
prev = curr; curr = curr->next;
}
header_t *h = sbrk(sizeof(header_t) + size);
if (h == (void*) -1) return NULL;
h->size = size; h->is_free = 0; h->next = NULL;
return (void*)(h + 1);
}
void my_free(void *ptr) {
if (!ptr) return;
header_t *h = (header_t*) ptr - 1;
h->is_free = 1;
h->next = free_list;
free_list = h;
}
Evidence: Allocate 100 blocks, free every other one, allocate again. Show that some reuse happens.
Milestone 3: Block splitting
If a free block is much larger than requested, split it. Mark the leftover as a new free block.
Evidence: With a workload that allocates [1024, 16, 1024, 16, ...], show that small allocations reuse leftover space from large frees.
Milestone 4: Coalescing
When freeing, check whether neighbouring blocks are also free; if so, merge them. Requires either a doubly-linked free list or footer tags so you can walk backward.
This is the hard part. Get it right and your fragmentation problem drops by 10× on realistic workloads.
Evidence: Run a fragmentation stress test: alternating small/large allocations, freed in a pattern that creates external fragmentation. Compare external fragmentation before/after coalescing.
Milestone 5: Alignment
Every returned pointer must be aligned to at least 8 bytes (16 on x86-64 for SIMD). Round sizes up; round headers' positions appropriately.
#define ALIGN(size) (((size) + 7) & ~7)
Evidence: Allocator returns pointers whose low 3 bits are always zero. Confirm with a unit test.
Milestone 6: mmap for large allocations
Beyond a threshold (e.g., 128 KB), use mmap instead of sbrk. Free with munmap. This avoids polluting the heap with huge blocks.
Evidence: Allocate 10 × 1 MB; show that brk does not grow (use sbrk(0) before and after).
Milestone 7: Size classes (slab-style)
The big leap: instead of one free list, maintain a free list per size class (8, 16, 32, 64, ..., 4096 bytes). malloc(size) rounds up to the next class and pops from that class's list. Production allocators do this.
Evidence: Benchmark a small-allocation-heavy workload (random small malloc/free) against the first-fit allocator. Show a 5–10× speedup.
8. Tests & evidence
| Test | How |
|---|---|
| Correctness — basic | Allocate, write, read, free. AddressSanitizer must be clean. |
| Correctness — alignment | All returned pointers have low N bits zero. |
| Correctness — boundary | Allocate exactly the requested size; writing one byte past must trigger ASan. |
| Correctness — large | Allocate > 4 GB total over many malloc/free cycles. |
| Correctness — random | Run 100,000 random malloc/free pairs with sizes 1B–1MB. Verify no leaks. |
| Fragmentation | After a worst-case interleaving, measure free-block size distribution. |
| Performance | Compare vs glibc malloc on a real workload (sqlite3, a small server). Allocator should be within 3–10× of glibc. |
Use Valgrind and AddressSanitizer (-fsanitize=address) — they are mandatory tools, not optional.
9. Common pitfalls
- Forgetting alignment. Will pass simple tests, then crash with SIGBUS on a
doubleaccess. - Header off-by-one.
(header_t*) ptr - 1is right;(header_t*)((char*)ptr - sizeof(header_t))is also right; mixing them is wrong. - Re-freeing. Double-free corrupts the free list. Add a magic number in the header for debugging.
- Using
reallocnaively. It is not "malloc + free". It must preserve the contents up tomin(old_size, new_size). - Concurrent allocator without locks. Will explode under threads. For the tutorial, single-threaded is fine. State the limitation.
- Tracking the wrong sizes. Caller asked for 13 bytes. You allocated 16 (aligned). Header must record 16 (the actual block size), not 13.
- Not testing on small AND large blocks. Allocator bugs love asymmetric workloads.
10. Extensions
- Per-thread caching. Each thread gets a small per-CPU cache of recently freed blocks. Avoids cross-thread lock contention. tcmalloc, jemalloc, mimalloc all do this.
- Buddy allocator. Power-of-two sizes only; splitting and coalescing are trivially fast. Linux kernel uses this for page allocation.
- Stack allocator / arena. No
free— justresetto zero. Used for per-request memory in servers. - Garbage collected variant. Add tracing or refcounting. Connects to the Interpreter project.
- NUMA-aware allocation. Use
mbindto pin allocations to a node.
11. Module integration
| Module | What the allocator deepens |
|---|---|
| Sem 4 Module 2 — Memory & pointers | Direct application. Every header walk is pointer arithmetic. |
| Sem 4 Module 3 — Computer organization | Cache-line awareness when designing block sizes. |
| Sem 4 Module 4 — Systems-level programming | sbrk, mmap, madvise — all required. |
| Sem 5 Module 2 — Virtual memory | Allocator interacts directly with the kernel's page allocator. |
| Operating System tutorial | A kernel needs its own allocator — usually a simpler bump or buddy allocator. |
| Database (KV) tutorial | KV stores often use slab allocation for buffer-pool management. |
12. Portfolio framing
What to publish:
- Clean C source with
Makefile,tests/, and abench/directory. - A README with a comparison plot vs glibc on at least one benchmark.
- AddressSanitizer-clean test transcript.
What to keep private:
- Early naive versions with bugs (they're useful to keep around for learning, but not for the portfolio).
Reviewer entry points:
src/malloc.c— the main implementation.tests/test_fragmentation.c— the most informative test.bench/results.md— the benchmark report.- The README should have a "what this does not do" section: thread safety, NUMA awareness, security mitigations.
Source
This tutorial draws from the BYO-X catalog "Memory Allocator" entry and the closely-related "Malloc tutorial" in the OS section. Dan Luu's hosted version of Dmitri Šošić's article is the canonical short reference.