Skip to main content

The Cost of a Bad Query: Scan vs Seek, N+1 Queries

What This Concept Is

Two specific failure modes dominate real-world SQL performance complaints:

Scan vs. seek. A seek uses an index to jump straight to the rows of interest (O(log N) for a B-tree). A scan reads every row of a table or index (O(N)). Scans are not inherently bad -- they are the right choice when you truly need most rows -- but when a query returning ten rows out of ten million produces a scan, you have a missing or unused index, and the cost is orders of magnitude higher than it should be.

Causes of accidental scans:

  • no index exists for the predicate
  • the index exists but the predicate is non-sargable: WHERE lower(email) = ?, WHERE col + 1 = 10, WHERE date_part('year', ts) = 2024
  • leading columns of a composite index are not used (INDEX(a,b), query says WHERE b = ?)
  • types mismatch (col is bigint, parameter bound as text -> implicit cast forces scan)
  • OR across different columns (often better rewritten as UNION ALL)
  • statistics are stale, so the planner estimates the scan is cheaper than it actually is

N+1 queries. A single logical request expands into 1 query to fetch parent rows plus N per-row queries to fetch children. The symptom is that the database is "fast" (each query takes 1 ms) but the user-facing request is slow (120 queries × 1 ms = 120 ms), and load scales linearly with result size. Common source: ORM lazy loading iterating over a parent collection.

Why It Matters Here

  • scan-vs-seek is where EXPLAIN literacy pays off: you can see in the plan whether a seek or scan is happening
  • N+1 is invisible to EXPLAIN on any individual query; it only shows up in application traces and slow-log aggregation
  • both failure modes have a tiny, easily-measured fix, but require you to notice the pattern; this concept is the pattern recognition lens

Concrete Example

Scan that should be a seek. orders has 10 M rows; we want a single customer's orders.

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- Seq Scan on orders (cost=0.00..234567.89 rows=100 width=...)
-- (actual time=812.1..821.7 rows=87 loops=1)
-- Filter: (customer_id = 42)
-- Rows Removed by Filter: 9999913

Fix:

CREATE INDEX ON orders(customer_id);
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Index Scan using orders_customer_id_idx on orders
-- (cost=0.43..400.71 rows=100 width=...)
-- (actual time=0.035..0.210 rows=87 loops=1)

Non-sargable predicate.

SELECT * FROM users WHERE lower(email) = 'alice@example.com';
-- Seq Scan (the btree on email is useless for lower(email))

Fix: match the index or rewrite:

CREATE INDEX users_email_lower ON users (lower(email));
-- or
SELECT * FROM users WHERE email ILIKE 'alice@example.com'; -- in PG, still tricky

N+1 in practice (pseudo-ORM):

customers = db.query("SELECT * FROM customer WHERE region = 'EU'")
for c in customers:
orders = db.query("SELECT * FROM orders WHERE customer_id = ?", c.id) # 1 per loop

Fix: one JOIN, or one batched WHERE customer_id = ANY($1):

SELECT c.*, o.*
FROM customer c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.region = 'EU'
ORDER BY c.id;

Common Confusion / Misconception

"Adding more indexes always helps." Every index costs space and slows writes (inserts, updates, deletes must maintain it). Indexes must pay for themselves in query wins. Drop unused ones.

"N+1 is an ORM problem." It is an access pattern problem. A raw SQL codebase that loops over results and re-queries has the same pathology. ORMs make it easier to trip into, not the cause.

"A fast query cannot be a bad query." A query that runs in 2 ms against 10 K rows may run in 20 s at 10 M. Always test against a production-shaped dataset.

How To Use It

For every new query you write or touch in a hot path:

  1. Run EXPLAIN ANALYZE. Check for Seq Scan on large tables with selective predicates.
  2. Inspect the predicate. If it wraps an indexed column (lower(col), col + 1), either rewrite or add an expression index.
  3. If an index exists and is not being used, check column types and statistics (ANALYZE tablename).
  4. For any loop in application code that issues a query per iteration, rewrite as one join or one batched IN/ANY query. Log the query count per request; alert when it grows with input size.

Check Yourself

  1. Explain in one sentence what "sargable" means and give one example of a non-sargable predicate and its fix.
  2. A B-tree index on (region, created_at). Which of these can use it for a seek: WHERE region = ?, WHERE created_at > ?, WHERE region = ? AND created_at > ?? Why?
  3. Describe an N+1 you have seen. What would be the single-query equivalent?

Mini Drill or Application

Using the customer / orders schema:

  1. Load 10k customers and 1M orders.
  2. Write a query that returns the latest order per customer. Run EXPLAIN ANALYZE.
  3. Rewrite using a window function and again with a correlated subquery. Compare plans and timings.
  4. Write the anti-pattern N+1 version in pseudocode. Show the single-query equivalent and measure the query count difference.

Read This Only If Stuck