Skip to main content

Build Your Own Memory Allocator

"Malloc is not magic. It is a bunch of code." — Dmitri Šošić

A memory allocator is the perfect Semester 4 project: small enough to finish, deep enough that every implementation choice — free list, splitting, coalescing, alignment — teaches something real about how programs actually use memory.


1. Overview & motivation

A memory allocator implements malloc(size) and free(ptr). Sounds simple. The challenges are:

  • Fragmentation. External fragmentation (free space split into useless small pieces). Internal fragmentation (rounding up wastes space).
  • Alignment. double needs 8-byte alignment; SIMD types need 16. Misaligned returns are undefined behavior on many architectures.
  • Metadata cost. Every allocated block needs a header; small blocks pay disproportionately.
  • Performance. Allocator latency matters: glibc's allocator is a hot path in every C program.

What you can only learn by building one:

  • Why production allocators (jemalloc, tcmalloc, mimalloc) use size classes rather than free lists.
  • Why brk and mmap are both used: small allocations on the heap, large on mmap.
  • Why thread-caching exists and why it's hard to get right.
  • Why buddy allocation is so clean conceptually and so wasteful in practice.

2. Where this fits in the degree

  • Phase: Systems
  • Semester: 4 (Systems Programming)
  • Modules deepened: Module 2 (memory, pointers, machine representation) — the entire project. Module 3 (computer organization) — cache effects of allocation strategies. Module 4 (systems-level programming) — sbrk, mmap, madvise.

Cross-phase relevance:


3. Prerequisites

  • C: pointers, structs, void*, pointer arithmetic. Comfortable with (char*)p + offset patterns.
  • Familiarity with sbrk or mmap from a Unix systems course (you'll learn enough here if not).

You do not need formal computer-architecture knowledge. You will pick up alignment and cache effects empirically.


4. Theory & research

Required reading

  • Doug Lea, "A Memory Allocator" (gee.cs.oswego.edu/dl/html/malloc.html) — the design notes for dlmalloc. The single best document on practical allocator design.
  • CS:APP, Computer Systems: A Programmer's Perspective, Chapter 9 (Virtual Memory), section 9.9 (Dynamic Memory Allocation) — Bryant & O'Hallaron. The standard textbook treatment.
  • K&R, The C Programming Language, Section 8.7 — Kernighan & Ritchie's storage allocator. Simple and complete.
  • Wilson, Johnstone, Neely, Boles (1995), "Dynamic Storage Allocation: A Survey and Critical Review" — academic survey, comprehensive.
  • Bonwick (1994), "The Slab Allocator" — slab allocation as used by Solaris and Linux.

For production allocators

  • jemalloc design — Facebook's note on scalability.
  • Microsoft Research, "Mimalloc: Free List Sharding in Action" — modern, fast allocator.

5. Curated tutorial list (from BYO-X)

  • C: Malloc is not magic -- Implementing your own memory allocatorDmitri Šošićrecommended primary

The BYO-X catalog has one entry under "Memory Allocator," but the topic is also covered in the "Operating System" section under "Malloc tutorial":

  • C: Malloc tutorialMarwan Burelle — PDF, ~30 pages, very thorough.

Additional canonical references

  • CS:APP Lab — Malloc Lab (csapp.cs.cmu.edu/3e/labs.html) — Carnegie Mellon's allocator assignment. Open source. The single best practice problem.
  • MIT 6.172 Performance Engineering — Malloc Project — comparable but with performance focus.

Dan Luu's "Malloc tutorial" (the Šošić article hosted on danluu.com) is short, opinionated, and walks you from sbrk-based bump allocation through free lists and coalescing in clean C.

After completing it, do CS:APP Malloc Lab for a graded performance-oriented version: same problem, but with a benchmark suite and a "memory utilization × throughput" score. This is the single best way to compare your allocator to a well-engineered baseline.


7. Implementation milestones

Milestone 1: Bump allocator (~30 lines)

The simplest allocator possible. malloc advances brk by the requested size; free does nothing.

#include <unistd.h>
void *bump_malloc(size_t size) {
void *p = sbrk(0);
if (sbrk(size) == (void*) -1) return NULL;
return p;
}
void bump_free(void *ptr) { (void) ptr; /* no-op */ }

Evidence: Run a program that allocates 1000 blocks and prints all addresses. Show that the heap grows linearly.

Milestone 2: Free list with first-fit

Add a header to each block: size, free/used flag, next pointer. On free, push the block onto a free list. On malloc, scan the list for the first block large enough.

typedef struct block_header {
size_t size;
int is_free;
struct block_header *next;
} header_t;

static header_t *free_list = NULL;

void *my_malloc(size_t size) {
header_t *prev = NULL;
header_t *curr = free_list;
while (curr) {
if (curr->is_free && curr->size >= size) {
curr->is_free = 0;
if (prev) prev->next = curr->next;
else free_list = curr->next;
return (void*)(curr + 1);
}
prev = curr; curr = curr->next;
}
header_t *h = sbrk(sizeof(header_t) + size);
if (h == (void*) -1) return NULL;
h->size = size; h->is_free = 0; h->next = NULL;
return (void*)(h + 1);
}

void my_free(void *ptr) {
if (!ptr) return;
header_t *h = (header_t*) ptr - 1;
h->is_free = 1;
h->next = free_list;
free_list = h;
}

Evidence: Allocate 100 blocks, free every other one, allocate again. Show that some reuse happens.

Milestone 3: Block splitting

If a free block is much larger than requested, split it. Mark the leftover as a new free block.

Evidence: With a workload that allocates [1024, 16, 1024, 16, ...], show that small allocations reuse leftover space from large frees.

Milestone 4: Coalescing

When freeing, check whether neighbouring blocks are also free; if so, merge them. Requires either a doubly-linked free list or footer tags so you can walk backward.

This is the hard part. Get it right and your fragmentation problem drops by 10× on realistic workloads.

Evidence: Run a fragmentation stress test: alternating small/large allocations, freed in a pattern that creates external fragmentation. Compare external fragmentation before/after coalescing.

Milestone 5: Alignment

Every returned pointer must be aligned to at least 8 bytes (16 on x86-64 for SIMD). Round sizes up; round headers' positions appropriately.

#define ALIGN(size) (((size) + 7) & ~7)

Evidence: Allocator returns pointers whose low 3 bits are always zero. Confirm with a unit test.

Milestone 6: mmap for large allocations

Beyond a threshold (e.g., 128 KB), use mmap instead of sbrk. Free with munmap. This avoids polluting the heap with huge blocks.

Evidence: Allocate 10 × 1 MB; show that brk does not grow (use sbrk(0) before and after).

Milestone 7: Size classes (slab-style)

The big leap: instead of one free list, maintain a free list per size class (8, 16, 32, 64, ..., 4096 bytes). malloc(size) rounds up to the next class and pops from that class's list. Production allocators do this.

Evidence: Benchmark a small-allocation-heavy workload (random small malloc/free) against the first-fit allocator. Show a 5–10× speedup.


8. Tests & evidence

TestHow
Correctness — basicAllocate, write, read, free. AddressSanitizer must be clean.
Correctness — alignmentAll returned pointers have low N bits zero.
Correctness — boundaryAllocate exactly the requested size; writing one byte past must trigger ASan.
Correctness — largeAllocate > 4 GB total over many malloc/free cycles.
Correctness — randomRun 100,000 random malloc/free pairs with sizes 1B–1MB. Verify no leaks.
FragmentationAfter a worst-case interleaving, measure free-block size distribution.
PerformanceCompare vs glibc malloc on a real workload (sqlite3, a small server). Allocator should be within 3–10× of glibc.

Use Valgrind and AddressSanitizer (-fsanitize=address) — they are mandatory tools, not optional.


9. Common pitfalls

  • Forgetting alignment. Will pass simple tests, then crash with SIGBUS on a double access.
  • Header off-by-one. (header_t*) ptr - 1 is right; (header_t*)((char*)ptr - sizeof(header_t)) is also right; mixing them is wrong.
  • Re-freeing. Double-free corrupts the free list. Add a magic number in the header for debugging.
  • Using realloc naively. It is not "malloc + free". It must preserve the contents up to min(old_size, new_size).
  • Concurrent allocator without locks. Will explode under threads. For the tutorial, single-threaded is fine. State the limitation.
  • Tracking the wrong sizes. Caller asked for 13 bytes. You allocated 16 (aligned). Header must record 16 (the actual block size), not 13.
  • Not testing on small AND large blocks. Allocator bugs love asymmetric workloads.

10. Extensions

  • Per-thread caching. Each thread gets a small per-CPU cache of recently freed blocks. Avoids cross-thread lock contention. tcmalloc, jemalloc, mimalloc all do this.
  • Buddy allocator. Power-of-two sizes only; splitting and coalescing are trivially fast. Linux kernel uses this for page allocation.
  • Stack allocator / arena. No free — just reset to zero. Used for per-request memory in servers.
  • Garbage collected variant. Add tracing or refcounting. Connects to the Interpreter project.
  • NUMA-aware allocation. Use mbind to pin allocations to a node.

11. Module integration

ModuleWhat the allocator deepens
Sem 4 Module 2 — Memory & pointersDirect application. Every header walk is pointer arithmetic.
Sem 4 Module 3 — Computer organizationCache-line awareness when designing block sizes.
Sem 4 Module 4 — Systems-level programmingsbrk, mmap, madvise — all required.
Sem 5 Module 2 — Virtual memoryAllocator interacts directly with the kernel's page allocator.
Operating System tutorialA kernel needs its own allocator — usually a simpler bump or buddy allocator.
Database (KV) tutorialKV stores often use slab allocation for buffer-pool management.

12. Portfolio framing

What to publish:

  • Clean C source with Makefile, tests/, and a bench/ directory.
  • A README with a comparison plot vs glibc on at least one benchmark.
  • AddressSanitizer-clean test transcript.

What to keep private:

  • Early naive versions with bugs (they're useful to keep around for learning, but not for the portfolio).

Reviewer entry points:

  • src/malloc.c — the main implementation.
  • tests/test_fragmentation.c — the most informative test.
  • bench/results.md — the benchmark report.
  • The README should have a "what this does not do" section: thread safety, NUMA awareness, security mitigations.

Source

This tutorial draws from the BYO-X catalog "Memory Allocator" entry and the closely-related "Malloc tutorial" in the OS section. Dan Luu's hosted version of Dmitri Šošić's article is the canonical short reference.