Reading an EXPLAIN / EXPLAIN ANALYZE Plan
What This Concept Is
EXPLAIN asks the database, "How do you plan to run this query?" The output is a tree of plan nodes. Each node takes input rows from its children, transforms them, and passes them up. EXPLAIN ANALYZE also executes the query and annotates each node with actual runtime and actual row counts.
Key plan nodes in PostgreSQL:
Seq Scan: read every row of a table. Cheap per row, expensive per query.Index Scan: use an index to find rows, then fetch each from the heap.Index Only Scan: answer the query from the index alone (requires all needed columns in the index).Bitmap Index Scan+Bitmap Heap Scan: multi-predicate lookups that batch heap accesses.Nested Loop Join: for each row on the left, probe the right. Great for small left × indexed right.Hash Join: build a hash of one side, probe with the other. Great for medium-to-large sets with equality predicates.Merge Join: both sides sorted, walk them together. Great when inputs are already sorted.Sort,Aggregate,HashAggregate,Limit,Gather(parallel).
Each node shows:
cost=start..total: optimizer's estimate, in abstract units.rows=N: estimated cardinality.- With
ANALYZE:actual time=start..total rows=N loops=M.
Why It Matters Here
- two queries with identical results can differ 1000× in cost because of plan differences
- the optimizer chooses plans using statistics; when stats are stale or selectivity estimates are wrong, plans go sideways
EXPLAIN ANALYZEis how you prove an index is being used, confirm a suspected table rewrite, or diagnose a sudden latency spike
Concrete Example
EXPLAIN ANALYZE
SELECT o.id, c.name, o.total_cents
FROM orders o
JOIN customer c ON c.id = o.customer_id
WHERE c.email = 'alice@example.com'
ORDER BY o.placed_at DESC
LIMIT 10;
Abbreviated output:
Limit (cost=8.47..8.48 rows=10 width=...) (actual time=0.120..0.125 rows=10 loops=1)
-> Sort (cost=8.47..8.50 rows=12 width=...)
(actual time=0.118..0.121 rows=12 loops=1)
Sort Key: o.placed_at DESC
-> Nested Loop (cost=0.42..8.12 rows=12 width=...)
(actual time=0.035..0.095 rows=12 loops=1)
-> Index Scan using customer_email_key on customer c
(cost=0.28..2.29 rows=1 width=...)
(actual time=0.020..0.021 rows=1 loops=1)
Index Cond: (email = 'alice@example.com')
-> Index Scan using orders_customer_id_idx on orders o
(cost=0.14..5.71 rows=12 width=...)
(actual time=0.010..0.060 rows=12 loops=1)
Index Cond: (customer_id = c.id)
Planning Time: 0.3 ms
Execution Time: 0.2 ms
Read it as a tree (innermost child first): find the customer by email (index lookup, 1 row), for each such customer look up orders by customer_id (index, 12 rows), sort those 12, return top 10.
Warning signs to look for:
Seq Scanon a large table with a selectiveWHERE-> likely missing index.rows=1estimate vsactual rows=1,000,000-> stats are lying,ANALYZEthe table.Sort+Limitat the top with hugerows-> sorting the entire table to return 10 rows.Nested Loopwith both sides large -> probably should be a hash join.loops=Nlarge on an inner node -> the same subtree is being re-executed N times (classic N+1 when N is the outer row count).
Common Confusion / Misconception
"EXPLAIN proved it is fast." EXPLAIN without ANALYZE only shows estimates. Always run EXPLAIN ANALYZE (or EXPLAIN (ANALYZE, BUFFERS)) on a representative dataset to see real behavior.
"Sequential scans are always bad." For small tables, or queries returning most rows, Seq Scan is faster than Index Scan. The optimizer's choice depends on table size and selectivity.
"If the index exists, it will be used." Predicates like WHERE lower(email) = 'a@b.com' or WHERE col + 1 = 10 defeat regular indexes. You need a matching expression index (CREATE INDEX ON t(lower(email))) or a rewritten predicate.
How To Use It
- Run
EXPLAIN (ANALYZE, BUFFERS)on the query. - Read from innermost node outward. Identify the node that dominates
actual time. - Compare estimated
rowsto actualrows. Large mismatches indicate stale statistics. - Look for
Seq Scanon big tables,Sorton unindexed orderings, andNested Loopwith large loops. - Change one thing (add index, rewrite WHERE,
ANALYZEthe table), re-explain, compare.
Check Yourself
- What is the difference between
Index ScanandIndex Only Scan? What does PostgreSQL need to produce the latter? - A node shows
rows=1 ... actual rows=1_200_000. What should you do? - Why might
EXPLAIN ANALYZEreport lower execution time than a production query feels like?
Mini Drill or Application
Create a pgexercises.com-style table or reuse orders:
- Write a query that selects all orders for a customer by email.
- Run
EXPLAIN ANALYZE. Identify the plan. - Drop any index on
customer.email. Re-explain. Note what changed. - Add an index on
orders(customer_id, placed_at DESC). Re-explain the original. Note what changed. - Write up three bullet points: what plan changes, why, and whether actual time matched the estimate.
Read This Only If Stuck
- Silberschatz: Part six - query processing and optimization
- Silberschatz: Query processing overview (15.1)
- Silberschatz: Measures of query cost, part 1
- Silberschatz: Transformation of relational expressions
- Silberschatz: Choice of evaluation plans, part 1
- PostgreSQL docs: Using EXPLAIN
- Use The Index, Luke: execution plans