The Buffer Pool and Page Replacement
What This Concept Is
The buffer pool is a fixed-size array of frames in main memory. Each frame can hold one page. When the engine needs page P, it does one of three things:
- Hit:
Pis already in some frame. Return a pointer. Cost: near zero. - Miss, free frame available: read
Pfrom disk into a free frame. One I/O. - Miss, no free frames: pick a victim frame to evict, possibly write it back if dirty, then load
P. One or two I/Os.
Every frame carries metadata: page_id, pin_count (how many threads currently need it), dirty (has it been modified since being read), and a reference bit or access counter used by the replacement policy.
Rules that are not negotiable:
- A pinned page cannot be evicted.
- A dirty page must be written back before eviction (otherwise the modification is lost).
- The page table maps
page_id -> frameand is consulted on every logical read.
Replacement policies you should know:
- LRU: evict the least recently used. Simple, but one sequential scan of a large table can flush everything useful out of the pool.
- Clock: approximate LRU using a circular scan and a reference bit. Cheaper, widely used.
- LRU-K: track the last
Kaccess times and evict based on theK-th most recent. Resists the scan-flood problem because a page that has only been touched once is a worse citizen than one touchedKtimes.
Why It Matters Here
Every later cost claim in the module ("a lookup is 3 I/Os") assumes the buffer pool is doing its job. Specifically:
- the top levels of a B+-tree root and internal nodes are almost always in the pool; only the leaf may miss
- an LSM read against a memtable never touches disk; an LSM read against an old SSTable likely will
- a bad replacement policy under a scan-heavy workload can invalidate all of the above
If you ignore the buffer pool, your I/O estimates will be off by orders of magnitude in either direction.
Concrete Example
A pool of 4 frames, using LRU, receives this access sequence: A, B, C, D, A, E, A, B, F.
step access frames (LRU order, oldest -> newest) action
1 A [A] miss, load A
2 B [A,B] miss
3 C [A,B,C] miss
4 D [A,B,C,D] miss, pool full
5 A [B,C,D,A] hit
6 E [C,D,A,E] miss, evict B
7 A [C,D,E,A] hit
8 B [D,E,A,B] miss, evict C
9 F [E,A,B,F] miss, evict D
Hit rate: 2 / 9. Now suppose step 6 had been a full-table scan that touched 1000 unique pages. Under LRU, everything useful would be evicted; under LRU-K with K=2, the scan pages would each have only one access, so A (touched twice) would stay resident.
Common Confusion / Misconception
"Buffer pool hits are free." They are cheap, but not free: each logical read still does a hash lookup in the page table, increments a pin counter, and walks replacement-policy metadata. On very hot paths this shows up in profiles.
"Eviction only happens under memory pressure." The writer thread may also flush dirty pages proactively (background flush) to keep the pool clean enough to evict quickly. That is why checkpoint and bgwriter appear in performance discussions.
How To Use It
When a query is slow and you suspect storage:
- Get the cache hit ratio. If it is already
~99%, the bottleneck is probably not I/O. - If it is low, ask what is polluting the pool (large scans, cold partitions, too many indexes).
- Check dirty-page behavior: if the writer is always busy, writes may be bottlenecked at eviction, not at disk.
- Only then think about adding memory or changing the replacement policy.
This is also the lens for reading the LSM cluster later: LSM engines effectively run two buffer tiers -- the memtable (their own in-RAM store) and the OS page cache.
Check Yourself
- Why is a pinned page exempt from eviction, and what must the engine guarantee about pinning?
- How does LRU-K resist a single large sequential scan better than plain LRU?
- Why does an engine sometimes write a dirty page even when there is no memory pressure?
Mini Drill or Application
For a pool of size 3 and the access sequence 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5, compute the number of misses under:
- FIFO
- LRU
- Clock (reference bit reset on a full rotation)
Then explain in one sentence each why the results differ.