Join Algorithms: Nested Loop, Hash, Sort-Merge
What This Concept Is
A join combines rows from two inputs R and S on a condition (usually equality on one column). Three algorithm families dominate. Every database on earth implements some subset.
Nested loop join (NLJ). For each row r in R, scan S looking for matches.
for r in R:
for s in S:
if match(r, s): emit(r, s)
Cost: |R| * |S| tuple comparisons, or |R| * pages(S) I/Os in the naive form. Very good when R is tiny or when S has an index on the join key (index nested loop join: lookup rather than scan).
Hash join (HJ). Build a hash table on the smaller input, then probe with the larger.
build phase: for s in S: ht.insert(key(s), s) # S is the smaller side
probe phase: for r in R: for s in ht.lookup(key(r)): emit(r, s)
Cost: |R| + |S| tuple operations, assuming the hash table fits in memory. If it does not, grace hash join partitions both sides into buckets that do fit, at the cost of writing partitions to disk.
Sort-merge join (SMJ). Sort both inputs on the join key, then walk them in tandem.
sort R by key
sort S by key
i = 0; j = 0
while i < |R| and j < |S|:
if R[i].key < S[j].key: i += 1
elif R[i].key > S[j].key: j += 1
else: emit all matches for this key group; advance i or j
Cost dominated by the sorts: O(|R| log |R| + |S| log |S|). But if either input is already sorted (common after a clustered-index scan or a prior sort), the algorithm becomes cheap.
Why It Matters Here
Join algorithm choice is where most query performance lives. A three-way join over 10^6 rows on each side could be 10^12 tuple comparisons with nested loops, 3 * 10^6 with hash joins, or zero sort cost with sort-merge if the inputs arrive sorted. Three decisions by the optimizer can span six orders of magnitude.
It is also the topic where the cost model has the most leverage (next concept): picking the wrong join algorithm is the single most common cause of slow query plans.
When Each Wins
| Condition | Preferred join | Reason |
|---|---|---|
| Either input is very small (< a few thousand rows) | Nested loop (broadcast inner) | Avoid hash build overhead |
| Inner side has an index on the join key | Index nested loop | Each outer row is one indexed lookup |
| Both sides large, equi-join, enough memory | Hash join | `O( |
| Memory too small for hash table | Grace hash join | Partition to fit |
| Inputs already sorted (or need to be) | Sort-merge | Zero or cheap sort; also produces sorted output |
Non-equi join (<, <=, range) | Nested loop or special range-join | Hash/merge need equality |
Concrete Example -- Three Algorithms on the Same Query
SELECT o.id, u.name FROM orders o JOIN users u ON o.user_id = u.id
ordershas10^7rows,usershas10^5.users.idis primary key (clustered B+-tree).
NLJ naive: 10^7 * 10^5 = 10^12 comparisons. Terrible.
Index NLJ: for each order, one B+-tree lookup by user_id. 10^7 index lookups, say 10^7 random-ish reads, heavily cached. This is actually common in OLTP engines for small outers.
Hash join: build hash table on users (the smaller side) -- about 10^5 entries, fits easily in memory. Probe with orders. 10^7 + 10^5 ~= 10^7 tuple operations. One sequential scan each side. Usually the winner here.
Sort-merge: sort both by user_id. Sorting 10^7 rows isn't free, but if orders is already clustered by user_id, the sort vanishes on that side. Useful in analytical engines or when the next operator also wants sorted input.
Common Confusion / Misconception
"Hash join always beats nested loop." Not when the hash-table build exceeds the work an index nested loop would do. For a tiny outer (100 rows) against an indexed inner, index NLJ is typically 10x faster. Optimizers know this; users who expect "hash always wins" are the most common source of bad hints.
"Sort-merge is obsolete." It is the algorithm of choice for: (1) joins that also need sorted output (window functions, ORDER BY on join key), (2) joins where both inputs arrive sorted, and (3) range joins on sorted data. Any analytical engine worth its salt still uses it.
How To Use It
Reading EXPLAIN output:
- Identify the join operator (
Nested Loop,Hash Join,Merge Join). - Identify which side is the build/inner vs probe/outer -- the smaller side should be the build.
- Check input size estimates: if actual
>> estimated, the optimizer picked under bad information. - Confirm the join condition is equality if you see hash or merge.
- Look for
Materialize,Sort,Hashchild nodes; they reveal hidden costs.
If you don't like the algorithm, the fix is usually (a) add an index, (b) change memory settings, or (c) fix statistics -- in that order.
Check Yourself
- Why must the hash table be built on the smaller input?
- Why is sort-merge useful even when hash join exists?
- When does an index nested loop beat a hash join on large inputs?
Mini Drill or Application
For each scenario, pick NLJ / index NLJ / hash join / sort-merge and justify. Assume 8 KB pages and 1 GB of available join memory.
orders (10^8)JOINusers (10^6)onuser_id, no suitable index.orders (100 rows, from a filter)JOINusers (10^6)onuser_id, PK index onusers.id.events (10^9)JOINsessions (10^7)onsession_id, both clustered bysession_id.A JOIN B ON A.x BETWEEN B.low AND B.high.A (10^7)JOINB (5*10^6)onid,|hash(B)| = 2 GBexceeds join memory.
Read This Only If Stuck
- Database System Concepts: 15.5 Join Operation (Part 1)
- Database System Concepts: 15.5 Join Operation (Part 2)
- Database System Concepts: 15.5 Join Operation (Part 4)
- Database System Concepts: 15.4 Sorting
- Database System Concepts: 15.2 Measures of Query Cost (Part 1)
- CMU 15-445 Intro to Database Systems: Joins Algorithms