Writing a Custom Allocator: Free Lists and Arenas
What This Concept Is
malloc and free are C library functions, not syscalls. They manage memory that was obtained from the kernel in large chunks (via sbrk or mmap) and subdivide it among many small allocations. If you understand how to write one, you understand what malloc is hiding.
Three allocator shapes cover most real designs:
- Bump allocator (arena). Hold a big buffer and a cursor;
alloc(n)returns the cursor and advances it byn.freedoes nothing; you release the entire arena at once. Ideal for per-request or per-frame allocations. - Free-list allocator. Each freed block is linked into a list (often sorted or bucketed by size).
alloc(n)scans the list for a fit (first-fit, best-fit) and splits the block;freereturns the block and coalesces adjacent free blocks. K&R Chapter 8.7 presents a clean version. - Slab / size-class allocator. One free list per fixed size (8, 16, 32, ...).
allocandfreeare O(1) within a size class. Modernmallocimplementations (jemalloc,tcmalloc, glibcptmalloc) are size-class allocators with per-thread caches.
Why It Matters Here
You will rarely replace the system malloc, but you will often build arenas: for a parser's AST, a request handler, a game frame, a compiler pass. Arenas turn thousands of malloc/free pairs into one mmap and one munmap, cutting allocation cost to near-zero and eliminating whole categories of leaks.
Writing even a toy allocator also demystifies a pile of "mysterious" crashes: double-frees, use-after-free, heap-overflow-overwrites-free-list -- all become obvious once you can picture the free list.
Concrete Example
A minimal bump arena over an mmap'd region:
#include <sys/mman.h>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
typedef struct {
char *base;
size_t cap;
size_t used;
} Arena;
static Arena arena_create(size_t cap) {
void *p = mmap(NULL, cap, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
return (Arena){ .base = p, .cap = cap, .used = 0 };
}
static void *arena_alloc(Arena *a, size_t n, size_t align) {
uintptr_t cur = (uintptr_t)a->base + a->used;
uintptr_t aln = (cur + (align - 1)) & ~(align - 1);
size_t need = (size_t)(aln - (uintptr_t)a->base) + n;
if (need > a->cap) return NULL;
a->used = need;
return (void*)aln;
}
static void arena_reset(Arena *a) { a->used = 0; }
static void arena_destroy(Arena *a) { munmap(a->base, a->cap); a->base = NULL; }
int main(void) {
Arena a = arena_create(1 << 20);
int *xs = arena_alloc(&a, 1000 * sizeof(int), _Alignof(int));
for (int i = 0; i < 1000; i++) xs[i] = i;
printf("sum=%d used=%zu\n", xs[999], a.used);
arena_destroy(&a);
}
No free per object. The whole mental model is: allocate freely, release all at once.
For a free-list allocator, the canonical worked example is K&R 8.7. Each free block has a header with a size and a next pointer; alloc scans the circular list for a block large enough and splits it; free inserts the block in sorted order and coalesces with neighbors.
Common Confusion / Misconception
"malloc asks the kernel for memory on every call." No. A good malloc asks the kernel once for a large chunk, then serves many small requests from it. You can observe this: strace a program doing a million tiny allocations, and you will see only a handful of brk or mmap calls.
"I can free memory I got from my arena." No. If your API has no per-object free, respect that. Mixing allocators (arena + libc free) is a guaranteed crash -- the libc free expects a libc-allocated block, and your arena blocks do not have the right headers.
"Fragmentation is only a problem on 32-bit systems." Fragmentation still matters: a 64-bit server that runs for days can degrade in throughput because allocation metadata no longer fits in cache. Size-class allocators exist specifically to bound fragmentation.
How To Use It
Pick the allocator shape by the lifetime of the objects, not the count:
- One "generation" of objects that all die together -> arena.
- Many short-lived objects with varying sizes -> libc
mallocis fine. - One dominant size used heavily -> slab/pool allocator of that size.
- Long-running process with mixed sizes and high throughput -> keep libc; reach for
jemalloc/tcmallocbefore rolling your own.
Check Yourself
- Why is an arena often faster than many
malloc/freepairs? - What is the role of coalescing in a free-list allocator?
- Why is calling libc
freeon a pointer from your arena a bug?
Mini Drill or Application
Do all four:
- Run the bump arena above and add a stress test: 10 million tiny allocations. Time it against the equivalent loop using
malloc/free. - Read K&R 8.7 and transcribe the
alloc/freeimplementation into your own file. Run its example and explain each field ofHeader. - Add a "checkpoint/restore" API to your arena:
size_t mark = a.used; ...; a.used = mark;. Show how it enables per-scope cleanup. - In one sentence, explain why double-freeing in a free-list allocator can corrupt other allocations, not just the duplicated one.
Read This Only If Stuck
- K&R 8.7: Example -- A Storage Allocator -- the canonical free-list allocator
- K&R 5.4: Address Arithmetic
- Man page:
man 3 malloc - Man page:
man 2 brk - Man page:
man 2 mmap-- as the big-region primitive