Skip to main content

Join Algorithms: Nested Loop, Hash, Sort-Merge

What This Concept Is

A join combines rows from two inputs R and S on a condition (usually equality on one column). Three algorithm families dominate. Every database on earth implements some subset.

Nested loop join (NLJ). For each row r in R, scan S looking for matches.

for r in R:
for s in S:
if match(r, s): emit(r, s)

Cost: |R| * |S| tuple comparisons, or |R| * pages(S) I/Os in the naive form. Very good when R is tiny or when S has an index on the join key (index nested loop join: lookup rather than scan).

Hash join (HJ). Build a hash table on the smaller input, then probe with the larger.

build phase: for s in S:  ht.insert(key(s), s)       # S is the smaller side
probe phase: for r in R: for s in ht.lookup(key(r)): emit(r, s)

Cost: |R| + |S| tuple operations, assuming the hash table fits in memory. If it does not, grace hash join partitions both sides into buckets that do fit, at the cost of writing partitions to disk.

Sort-merge join (SMJ). Sort both inputs on the join key, then walk them in tandem.

sort R by key
sort S by key
i = 0; j = 0
while i < |R| and j < |S|:
if R[i].key < S[j].key: i += 1
elif R[i].key > S[j].key: j += 1
else: emit all matches for this key group; advance i or j

Cost dominated by the sorts: O(|R| log |R| + |S| log |S|). But if either input is already sorted (common after a clustered-index scan or a prior sort), the algorithm becomes cheap.

Why It Matters Here

Join algorithm choice is where most query performance lives. A three-way join over 10^6 rows on each side could be 10^12 tuple comparisons with nested loops, 3 * 10^6 with hash joins, or zero sort cost with sort-merge if the inputs arrive sorted. Three decisions by the optimizer can span six orders of magnitude.

It is also the topic where the cost model has the most leverage (next concept): picking the wrong join algorithm is the single most common cause of slow query plans.

When Each Wins

ConditionPreferred joinReason
Either input is very small (< a few thousand rows)Nested loop (broadcast inner)Avoid hash build overhead
Inner side has an index on the join keyIndex nested loopEach outer row is one indexed lookup
Both sides large, equi-join, enough memoryHash join`O(
Memory too small for hash tableGrace hash joinPartition to fit
Inputs already sorted (or need to be)Sort-mergeZero or cheap sort; also produces sorted output
Non-equi join (<, <=, range)Nested loop or special range-joinHash/merge need equality

Concrete Example -- Three Algorithms on the Same Query

SELECT o.id, u.name FROM orders o JOIN users u ON o.user_id = u.id

  • orders has 10^7 rows, users has 10^5.
  • users.id is primary key (clustered B+-tree).

NLJ naive: 10^7 * 10^5 = 10^12 comparisons. Terrible.

Index NLJ: for each order, one B+-tree lookup by user_id. 10^7 index lookups, say 10^7 random-ish reads, heavily cached. This is actually common in OLTP engines for small outers.

Hash join: build hash table on users (the smaller side) -- about 10^5 entries, fits easily in memory. Probe with orders. 10^7 + 10^5 ~= 10^7 tuple operations. One sequential scan each side. Usually the winner here.

Sort-merge: sort both by user_id. Sorting 10^7 rows isn't free, but if orders is already clustered by user_id, the sort vanishes on that side. Useful in analytical engines or when the next operator also wants sorted input.

Common Confusion / Misconception

"Hash join always beats nested loop." Not when the hash-table build exceeds the work an index nested loop would do. For a tiny outer (100 rows) against an indexed inner, index NLJ is typically 10x faster. Optimizers know this; users who expect "hash always wins" are the most common source of bad hints.

"Sort-merge is obsolete." It is the algorithm of choice for: (1) joins that also need sorted output (window functions, ORDER BY on join key), (2) joins where both inputs arrive sorted, and (3) range joins on sorted data. Any analytical engine worth its salt still uses it.

How To Use It

Reading EXPLAIN output:

  1. Identify the join operator (Nested Loop, Hash Join, Merge Join).
  2. Identify which side is the build/inner vs probe/outer -- the smaller side should be the build.
  3. Check input size estimates: if actual >> estimated, the optimizer picked under bad information.
  4. Confirm the join condition is equality if you see hash or merge.
  5. Look for Materialize, Sort, Hash child nodes; they reveal hidden costs.

If you don't like the algorithm, the fix is usually (a) add an index, (b) change memory settings, or (c) fix statistics -- in that order.

Check Yourself

  1. Why must the hash table be built on the smaller input?
  2. Why is sort-merge useful even when hash join exists?
  3. When does an index nested loop beat a hash join on large inputs?

Mini Drill or Application

For each scenario, pick NLJ / index NLJ / hash join / sort-merge and justify. Assume 8 KB pages and 1 GB of available join memory.

  1. orders (10^8) JOIN users (10^6) on user_id, no suitable index.
  2. orders (100 rows, from a filter) JOIN users (10^6) on user_id, PK index on users.id.
  3. events (10^9) JOIN sessions (10^7) on session_id, both clustered by session_id.
  4. A JOIN B ON A.x BETWEEN B.low AND B.high.
  5. A (10^7) JOIN B (5*10^6) on id, |hash(B)| = 2 GB exceeds join memory.

Read This Only If Stuck