Kernel Memory Allocators: Buddy System, Slab
What This Concept Is
The kernel needs to allocate memory at two very different granularities:
- Page frames and contiguous runs of page frames for userspace, DMA buffers, device memory. The Linux answer is the buddy allocator.
- Small objects of fixed types (inodes, dentries, task_struct, sockets, mm_struct), often millions at a time. The Linux answer is the slab / slub / slob allocator family, built on top of the buddy allocator.
The two work together: the buddy allocator hands out page-aligned runs of 2^k frames; the slab allocator carves those runs into caches of equally-sized objects.
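Buddy allocator. Memory is split into free-list "orders": order 0 is single pages, order 1 is 2-page chunks, order 2 is 4-page chunks, and so on, typically up to order 10 on Linux (the limit is configurable). When a request comes in, the allocator rounds the page count up to the next power of two and takes a block from that order's free list; if that list is empty, it splits a block from the next non-empty higher order into two buddies, repeating until a block of the requested order exists. When a block is freed and its buddy is also free, the two coalesce into one block of the next order up, and the check repeats at that order.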
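The rounding and buddy-pairing rules can be sketched in a few lines of userspace Python. This is an illustration, not kernel code: the helper names order_for and buddy_of are made up for this note, and a 4 KiB page size is assumed.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def order_for(nbytes: int) -> int:
    """Smallest order k such that 2**k pages cover nbytes."""
    pages = max(1, -(-nbytes // PAGE_SIZE))  # ceiling division
    return (pages - 1).bit_length()


def buddy_of(pfn: int, order: int) -> int:
    """Page frame number of the buddy of the order-`order` block starting at pfn."""
    # Buddies differ only in bit `order` of their starting frame number.
    return pfn ^ (1 << order)


print(order_for(8 * 1024))       # 8 KiB -> order 1
print(order_for(9 * PAGE_SIZE))  # 9 pages -> order 4 (16 pages)
print(buddy_of(8, 2))            # order-2 block at frame 8 pairs with frame 12
```

Because a block's buddy is found by flipping a single bit, the allocator needs no search to decide whether a freed block can coalesce.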
Slab allocator. For each object type, there is a cache. A cache is backed by one or more slabs (pages obtained from the buddy allocator). Each slab is pre-carved into objects of the cache's size, plus some metadata. Allocating an object is just popping a free-object pointer off a list; freeing is pushing it back. On Linux, /proc/slabinfo shows every cache's size, object count, and usage.
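A toy model makes the pop/push behaviour concrete. The class below is a hypothetical sketch (ToySlabCache is not a kernel structure): it ignores per-slab metadata, per-CPU caching, and reclaim, and it only counts how many slabs it would have requested from the page allocator.

```python
class ToySlabCache:
    """Toy model of one cache: fixed-size slots carved out of slabs."""

    def __init__(self, obj_size: int, pages_per_slab: int = 1, page_size: int = 4096):
        self.objs_per_slab = (pages_per_slab * page_size) // obj_size
        self.free = []   # stack of free (slab_id, slot) pairs
        self.slabs = 0   # slabs obtained from the (imaginary) page allocator

    def _grow(self) -> None:
        # Stand-in for asking the buddy allocator for more pages.
        slab_id = self.slabs
        self.slabs += 1
        self.free.extend((slab_id, slot) for slot in range(self.objs_per_slab))

    def alloc(self):
        if not self.free:
            self._grow()
        return self.free.pop()      # allocation is a pop

    def free_obj(self, obj) -> None:
        self.free.append(obj)       # freeing is a push


cache = ToySlabCache(obj_size=656, pages_per_slab=2)
objs = [cache.alloc() for _ in range(13)]   # 13 objects need a second slab
print(cache.objs_per_slab, cache.slabs)
```

Freeing an object and allocating again returns the same slot, which is the cache-friendly reuse the free list is meant to provide.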
Why It Matters Here
These are the allocators that keep the kernel responsive under load. Every network packet, every file descriptor, every process descriptor goes through them. If you are looking at kernel memory pressure, slabtop and /proc/meminfo's Slab, SReclaimable, and SUnreclaim lines tell you what is eating RAM.
Understanding this pair also explains why the kernel uses two levels:
- The buddy allocator is good at contiguous physical memory (needed for DMA and large pages) but bad at small objects (it rounds up aggressively).
- The slab allocator is good at small fixed-size objects but depends on the buddy allocator to get pages in the first place.
Neither alone is enough. Together they cover the kernel's allocation profile.
Concrete Example
Buddy example. A machine has 16 MiB of kernel-usable memory and 4 KiB pages; the maximum order is 12 (4096 pages = 16 MiB). Initial free list: one block of order 12. A driver asks for 8 KiB (order 1). The allocator walks up: no order-1 block, no order-2, ... until it finds the order-12 block. It splits down, placing one buddy on each lower order's free list, until an order-1 block is available and handed out. Freeing it coalesces back up, as long as each buddy along the way is still free.
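The walk-up, split, and coalesce steps can be traced with a small simulation. ToyBuddy is a hypothetical userspace model (sets of page frame numbers per order), not the kernel's implementation; the second part of the sketch also shows a request failing when free pages exist but are not contiguous.

```python
class ToyBuddy:
    def __init__(self, max_order: int):
        self.max_order = max_order
        # free_lists[k] holds starting frame numbers of free order-k blocks
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)

    def alloc(self, order: int):
        # Walk up to the smallest order that has a free block.
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1
        if k > self.max_order:
            return None  # no contiguous block large enough
        pfn = min(self.free_lists[k])
        self.free_lists[k].remove(pfn)
        # Split down, leaving the upper half (the buddy) on each lower list.
        while k > order:
            k -= 1
            self.free_lists[k].add(pfn + (1 << k))
        return pfn

    def free(self, pfn: int, order: int) -> None:
        # Merge with the buddy for as long as the buddy is also free.
        while order < self.max_order:
            buddy = pfn ^ (1 << order)
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            pfn = min(pfn, buddy)
            order += 1
        self.free_lists[order].add(pfn)

    def counts(self):
        return [len(blocks) for blocks in self.free_lists]


pool = ToyBuddy(max_order=12)        # one free order-12 block
pfn = pool.alloc(1)                  # the 8 KiB request
after_alloc = pool.counts()          # one free block at each of orders 1..11
pool.free(pfn, 1)
after_free = pool.counts()           # back to a single order-12 block

small = ToyBuddy(max_order=3)        # 8-page pool
pages = [small.alloc(0) for _ in range(8)]
for p in pages[::2]:
    small.free(p, 0)                 # free every other page
fragmented = small.alloc(1)          # None: 4 pages free, no 2-page run
print(after_alloc, after_free, fragmented)
```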
Slab example. The inode_cache might report (/proc/slabinfo):
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
inode_cache 54321 55000 656 12 2
54,321 inodes currently allocated out of 55,000 object slots, each 656 bytes, 12 per slab, 2 pages per slab. Total: 55,000 / 12 ≈ 4,584 slabs = 9,168 pages ≈ 36 MiB with 4 KiB pages.
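The footprint arithmetic is easy to script. The sketch below assumes the simplified six-column row shown in the header above and 4 KiB pages; cache_footprint is a name made up for this note, and real /proc/slabinfo rows carry additional tunables and slabdata columns after these.

```python
import math

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def cache_footprint(line: str) -> dict:
    """Estimate slabs, pages and MiB from a simplified slabinfo row.

    Assumed columns: name active_objs num_objs objsize objperslab pagesperslab
    """
    name, active, total, _objsize, per_slab, pages_per_slab = line.split()[:6]
    slabs = math.ceil(int(total) / int(per_slab))
    pages = slabs * int(pages_per_slab)
    return {
        "name": name,
        "slabs": slabs,
        "pages": pages,
        "mib": pages * PAGE_SIZE / 2**20,
        "unused_objs": int(total) - int(active),
    }


info = cache_footprint("inode_cache 54321 55000 656 12 2")
print(info)
```

The unused_objs figure (slots that exist but hold no live object) is a rough indicator of how much memory partially filled slabs are pinning.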
The three slab flavors:
- SLAB: original implementation, per-CPU free lists, aggressive caching. Memory-heavy. Removed in Linux 6.8.
- SLUB: the Linux default for many years and the only remaining implementation since 6.8; simpler data structures, better scalability under multi-core contention.
- SLOB: for embedded systems where total memory is tiny; simple free-list allocator without per-type caches. Removed in Linux 6.4.
Common Confusion / Misconception
"The kernel uses kmalloc and vmalloc interchangeably." Not quite. kmalloc gives physically contiguous memory (via slab on top of buddy). vmalloc gives virtually contiguous memory (but physically scattered, built from individual pages and stitched together in the kernel's virtual address space). Use kmalloc for small, speed-critical, DMA-capable allocations; vmalloc for large allocations where physical contiguity is unnecessary.
"Slab fragmentation is the same as userland fragmentation." Slabs eliminate external fragmentation within a cache (objects are all the same size), but they can still suffer from partially-filled slabs holding on to pages, and from having too many per-type caches each rounded up to multiples of pages.
"The buddy allocator never fails for reasonable requests." Under memory pressure, the buddy allocator can fail to find a contiguous block even when total free memory is ample. That is why vmalloc exists and why large DMA requests can fail while cat /proc/meminfo looks healthy.
How To Use It
For kernel debugging or tuning:
- Start with /proc/meminfo for the big picture: Slab, SReclaimable, SUnreclaim, PageTables, KernelStack.
- Use slabtop to see which caches are growing. A cache eating gigabytes is a leak, a driver issue, or a reclamation problem.
- For fragmentation in the buddy allocator, inspect /proc/buddyinfo. Low entries at high orders with memory pressure means external fragmentation at page-frame granularity.
- If you are a driver author: kmalloc(size, GFP_KERNEL) for small + contiguous, vmalloc for large + non-contiguous, alloc_pages(GFP, order) for raw buddy access.
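The first and third checks can be scripted. The sketch below parses the text formats of /proc/meminfo and /proc/buddyinfo; the function names are made up for this note and the sample values are invented for illustration, not taken from a real machine. On a Linux system, the sample strings can be replaced with open("/proc/meminfo").read() and open("/proc/buddyinfo").read().

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo key to its numeric value (kB for most lines)."""
    out = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            out[key.strip()] = int(rest.split()[0])
    return out


def parse_buddyinfo(text: str) -> dict:
    """Map (node, zone) to a list of free-block counts, indexed by order."""
    out = {}
    for line in text.splitlines():
        parts = line.replace(",", " ").split()
        if len(parts) > 4 and parts[0] == "Node":
            out[(int(parts[1]), parts[3])] = [int(n) for n in parts[4:]]
    return out


# Invented sample data in the real file formats.
SAMPLE_MEMINFO = """MemTotal:       16384000 kB
Slab:             812000 kB
SReclaimable:     600000 kB
SUnreclaim:       212000 kB
"""
SAMPLE_BUDDYINFO = (
    "Node 0, zone   Normal   4381   2150    900    310     75     12"
    "      3      0      0      0      0\n"
)

mem = parse_meminfo(SAMPLE_MEMINFO)
counts = parse_buddyinfo(SAMPLE_BUDDYINFO)[(0, "Normal")]
free_pages = sum(n << k for k, n in enumerate(counts))
highest = max((k for k, n in enumerate(counts) if n), default=-1)
print(mem["Slab"], free_pages, highest)
```

In this sample the zone has thousands of free pages but nothing above order 6, which is the pattern that makes a large contiguous request fail while the free-memory totals look fine.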
Check Yourself
- What problem does the buddy allocator solve that a straight free-list allocator for pages would not?
- Why do slabs exist on top of the buddy allocator rather than replacing it?
- What is the difference between internal and external fragmentation for these allocators?
- Why does /proc/slabinfo have so many caches (hundreds)? What does each one tell you?
- Why can kmalloc fail while /proc/meminfo shows plenty of free memory?
Mini Drill or Application
- On a Linux system, read /proc/meminfo. Sum Slab and the major cache sizes from slabtop. How does that compare to total RAM?
- Which slab caches tend to dominate on a server running a lot of small files (many inodes/dentries)? Which tend to dominate on a machine with many sockets?
- Explain the buddy allocator's behavior on a request for 9 pages. What order is served? How much is wasted?
- A driver needs a 1 MiB DMA-capable contiguous buffer. Which allocator is appropriate, and why can the request fail?
- A kernel module is leaking task_structs. What would slabtop show over time?