Point Lookups, Range Scans, and Why B+-Trees Dominate OLTP
What This Concept Is
Once you have a B+-tree with order m (roughly m children per internal node) and N leaf entries, three costs fall out:
- point lookup: descend from root to leaf. Cost:
O(log_m N)page reads. - range scan of
krows in order: descend once to the start of the range, then walk leaves through the linked list. Cost:O(log_m N + k / entries_per_leaf). - insert / delete: descend, modify the leaf, then at worst propagate splits or merges up to the root. Cost:
O(log_m N)page writes in the worst case.
The key number is log_m N. With fan-out m ~= 250 (typical for 8 KB pages with short keys), log_250 (10^9) ~= 4. That is why a mature B+-tree on a billion rows is 3 to 5 levels deep.
And the linked leaves are why range scans are cheap: after the initial descent, each subsequent leaf is one sequential link follow, not another tree descent.
Why It Matters Here
OLTP workloads are dominated by:
- point lookups by primary key (
WHERE user_id = ?) - small range scans (
WHERE ts BETWEEN ? AND ? LIMIT 100) - single-row inserts/updates/deletes
A B+-tree does all four well: low constant-factor latency for lookups, cheap linked-leaf ranges, in-place updates that preserve ordering, and a rebalancing story that prevents worst-case drift. Alternatives exist (hash indexes, LSM-trees, skip lists), but none match this mix of capabilities at OLTP scale. That is why B+-trees remain the default primary-index structure in almost every relational engine.
Concrete Example -- Height Math
Assume 8 KB pages, 16-byte keys, 8-byte pointers, and some overhead, giving approximately 300 child pointers per internal node. Leaves hold about 200 entries.
N = 10^6: leaves =10^6 / 200 = 5000. Internal levels:log_300 5000 ~= 1.5, so~2internal levels + leaf = height3.N = 10^9: leaves =5 * 10^6. Internal levels:log_300 (5*10^6) ~= 2.7, so~3+ leaf = height4.
Assume the root and next level are cached (they almost always are). A point lookup then costs 1 cold random I/O (the leaf) plus 1 heap fetch if the index is secondary. That is the whole reason OLTP latency can sit in the low milliseconds on a billion-row table.
Concrete Example -- Range Scan
SELECT * FROM events WHERE ts BETWEEN '2026-01-01' AND '2026-01-02' ORDER BY ts on an index clustered by ts.
- Descend from root to the leaf containing the smallest qualifying
ts.~3page reads (mostly cached). - Walk forward along the leaf linked list, reading one leaf page at a time, emitting rows in order.
- Stop at the first entry
> '2026-01-02'.
If the range matches 100,000 rows at 200 entries/leaf, that is 500 sequential leaf reads plus some heap fetches. Sequentiality is the win: the same 100,000 rows retrieved by a hash or unordered index would produce 500 random reads for the leaves plus the heap fetches, possibly 100x slower.
Common Confusion / Misconception
"The tree is tall because there are a billion rows." The fan-out is what makes the tree short. The logarithm is base-m, not base-2. A 32-level binary tree and a 4-level B+-tree both address a billion items; the B+-tree gets to 4 levels because each node packs hundreds of children.
"Insertions are always cheap." In the average case yes; in the worst case a leaf split can propagate to the root and the tree can grow by one level. Under heavy random-key insert traffic, split cascades can dominate the write path -- this is exactly the problem LSM-trees (next cluster) were designed to attack.
How To Use It
When reading a query plan that uses a B+-tree index:
- Estimate height from table size and fan-out.
- Assume the upper
1-2levels are cached. - Count
1 + 1I/Os for a secondary-index point lookup (leaf + heap). - Count
H + k/entries_per_leafpage reads for an ordered range ofkrows. - Compare against the full-scan cost (table size in pages); pick whichever is lower.
This is exactly the arithmetic the optimizer does; doing it yourself calibrates your intuition.
Check Yourself
- Why does fan-out matter more than total size for B+-tree height?
- Why are range scans cheap relative to
krandom lookups of the same keys? - Under what workload would a hash index beat a B+-tree for point lookups, and why do most engines still default to B+-trees?
Mini Drill or Application
For each scenario, estimate page reads. Assume height 4, fan-out 300, leaves 200 entries, top 2 levels cached.
- Point lookup by PK that returns
1row (clustered index). - Point lookup by secondary index that returns
1row. - Range scan returning
10,000ordered rows (clustered). - Range scan returning
10,000ordered rows (secondary index + heap fetches, random heap order). - Insert
1row at a leaf that is50%full.