Reading an `EXPLAIN` / `EXPLAIN ANALYZE` Plan

What This Concept Is

EXPLAIN asks the database, "How do you plan to run this query?" The output is a tree of plan nodes. Each node takes input rows from its children, transforms them, and passes them up. EXPLAIN ANALYZE also executes the query and annotates each node with actual runtime and actual row counts.

Key plan nodes in PostgreSQL:

Seq Scan: read every row of a table. Cheap per row, expensive per query.
Index Scan: use an index to find rows, then fetch each from the heap.
Index Only Scan: answer the query from the index alone (requires all needed columns in the index).
Bitmap Index Scan + Bitmap Heap Scan: multi-predicate lookups that batch heap accesses.
Nested Loop Join: for each row on the left, probe the right. Great for small left × indexed right.
Hash Join: build a hash of one side, probe with the other. Great for medium-to-large sets with equality predicates.
Merge Join: both sides sorted, walk them together. Great when inputs are already sorted.
Sort, Aggregate, HashAggregate, Limit, Gather (parallel).

Each node shows:

cost=start..total: optimizer's estimate, in abstract units.
rows=N: estimated cardinality.
With ANALYZE: actual time=start..total rows=N loops=M.

Why It Matters Here

two queries with identical results can differ 1000× in cost because of plan differences
the optimizer chooses plans using statistics; when stats are stale or selectivity estimates are wrong, plans go sideways
EXPLAIN ANALYZE is how you prove an index is being used, confirm a suspected table rewrite, or diagnose a sudden latency spike

Concrete Example

EXPLAIN ANALYZE
SELECT o.id, c.name, o.total_cents
FROM orders o
JOIN customer c ON c.id = o.customer_id
WHERE c.email = 'alice@example.com'
ORDER BY o.placed_at DESC
LIMIT 10;

Abbreviated output:

 Limit  (cost=8.47..8.48 rows=10 width=...) (actual time=0.120..0.125 rows=10 loops=1)
   ->  Sort  (cost=8.47..8.50 rows=12 width=...)
             (actual time=0.118..0.121 rows=12 loops=1)
         Sort Key: o.placed_at DESC
         ->  Nested Loop  (cost=0.42..8.12 rows=12 width=...)
                          (actual time=0.035..0.095 rows=12 loops=1)
               ->  Index Scan using customer_email_key on customer c
                     (cost=0.28..2.29 rows=1 width=...)
                     (actual time=0.020..0.021 rows=1 loops=1)
                     Index Cond: (email = 'alice@example.com')
               ->  Index Scan using orders_customer_id_idx on orders o
                     (cost=0.14..5.71 rows=12 width=...)
                     (actual time=0.010..0.060 rows=12 loops=1)
                     Index Cond: (customer_id = c.id)
 Planning Time: 0.3 ms
 Execution Time: 0.2 ms

Read it as a tree (innermost child first): find the customer by email (index lookup, 1 row), for each such customer look up orders by customer_id (index, 12 rows), sort those 12, return top 10.

Warning signs to look for:

Seq Scan on a large table with a selective WHERE -> likely missing index.
rows=1 estimate vs actual rows=1,000,000 -> stats are lying, ANALYZE the table.
Sort + Limit at the top with huge rows -> sorting the entire table to return 10 rows.
Nested Loop with both sides large -> probably should be a hash join.
loops=N large on an inner node -> the same subtree is being re-executed N times (classic N+1 when N is the outer row count).

Common Confusion / Misconception

"EXPLAIN proved it is fast." EXPLAIN without ANALYZE only shows estimates. Always run EXPLAIN ANALYZE (or EXPLAIN (ANALYZE, BUFFERS)) on a representative dataset to see real behavior.

"Sequential scans are always bad." For small tables, or queries returning most rows, Seq Scan is faster than Index Scan. The optimizer's choice depends on table size and selectivity.

"If the index exists, it will be used." Predicates like WHERE lower(email) = 'a@b.com' or WHERE col + 1 = 10 defeat regular indexes. You need a matching expression index (CREATE INDEX ON t(lower(email))) or a rewritten predicate.

How To Use It

Run EXPLAIN (ANALYZE, BUFFERS) on the query.
Read from innermost node outward. Identify the node that dominates actual time.
Compare estimated rows to actual rows. Large mismatches indicate stale statistics.
Look for Seq Scan on big tables, Sort on unindexed orderings, and Nested Loop with large loops.
Change one thing (add index, rewrite WHERE, ANALYZE the table), re-explain, compare.

Check Yourself

What is the difference between Index Scan and Index Only Scan? What does PostgreSQL need to produce the latter?
A node shows rows=1 ... actual rows=1_200_000. What should you do?
Why might EXPLAIN ANALYZE report lower execution time than a production query feels like?

Mini Drill or Application

Create a pgexercises.com-style table or reuse orders:

Write a query that selects all orders for a customer by email.
Run EXPLAIN ANALYZE. Identify the plan.
Drop any index on customer.email. Re-explain. Note what changed.
Add an index on orders(customer_id, placed_at DESC). Re-explain the original. Note what changed.
Write up three bullet points: what plan changes, why, and whether actual time matched the estimate.

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconception​

How To Use It​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​