Query Performance Clinic
Retrieval Prompts
- What information does
EXPLAINshow thatEXPLAIN ANALYZEdoes not, and vice versa? - State in one sentence the meaning of
cost=start..total,rows, andactual time=.... - State in one sentence what makes a predicate "sargable."
- State one reason the planner might choose
Seq Scanover an existingIndex Scan. - State one reason the planner might estimate
rows=1when the actual isrows=1_000_000.
Diagnose These Plans
For each plan fragment, name the specific pathology in one sentence and propose one concrete fix.
Plan A
Seq Scan on orders (cost=0.00..234_567.89 rows=87 width=...)
(actual time=812..821 rows=87 loops=1)
Filter: (customer_id = 42)
Rows Removed by Filter: 9_999_913
Plan B
Nested Loop (actual time=... rows=10_000 loops=1)
-> Seq Scan on customer c (actual rows=10_000 loops=1)
-> Index Scan using orders_customer_id_idx on orders o
(actual time=0.002..0.015 rows=1 loops=10_000)
What is this pattern called when it appears at the application layer, and what is the single-query rewrite?
Plan C
Sort (cost=... rows=50_000_000 width=...)
(actual time=22000..22500 rows=10 loops=1)
Sort Key: placed_at DESC
Sort Method: external merge Disk: 1_800_000 kB
-> Seq Scan on orders (actual rows=50_000_000 loops=1)
Limit (actual rows=10 loops=1)
Plan D
Index Scan using orders_customer_id_idx on orders
(cost=... rows=1 width=...)
(actual time=... rows=1_200_000 loops=1)
Index Cond: (customer_id = 7)
Optimize a Slow Query
Given this query against orders (50M rows) and customer (10M rows):
SELECT c.name, o.id, o.total_cents, o.placed_at
FROM orders o
JOIN customer c ON c.id = o.customer_id
WHERE lower(c.email) = 'alice@example.com'
AND o.placed_at >= now() - INTERVAL '30 days'
ORDER BY o.placed_at DESC
LIMIT 20;
Tasks:
- Write the
EXPLAIN (ANALYZE, BUFFERS)you would run. - List three independent hypotheses about why the query could be slow.
- For each hypothesis, write the specific
CREATE INDEXor rewrite that would test it, and the expected plan change. - Choose the single change you would ship first. Justify in two sentences.
N+1 Rewrite
Given pseudocode that loads a paginated list of orders and then fetches each order's items:
orders = db.query("SELECT * FROM orders WHERE placed_at > ? ORDER BY placed_at DESC LIMIT 50", since)
for o in orders:
o.items = db.query("SELECT * FROM order_item WHERE order_id = ?", o.id)
Rewrite it as:
- One SQL query that returns all rows in one pass (hint:
LEFT JOIN). - One SQL query that batches the child fetch (hint:
WHERE order_id = ANY($1)). - Explain which rewrite is preferable and under what circumstance.
Evidence Check
This page is complete only if, for any EXPLAIN ANALYZE output from a real query, you can:
- read it as a tree and identify the dominant node
- name the plan-level pathology (wrong join type, bad estimate, accidental scan)
- propose a concrete fix (index, rewrite, stats refresh, schema change)
- predict the resulting plan before running it