Clustered vs Secondary Indexes; Covering Indexes
What This Concept Is
Not all indexes store the row. The kind of index determines where row data physically lives and what an index entry points at.
- Clustered index (also "primary-key-clustered table" in InnoDB, "index-organized table" in Oracle): the table is the B+-tree. Leaf pages hold entire rows, ordered by the index key. There is exactly one clustered index per table because the rows can only be physically ordered once.
- Secondary (non-clustered) index: a separate B+-tree whose leaf entries hold
(indexed_key, pointer_to_row). The pointer is either the RID or, in engines that move rows during reorganization, the primary key of the row. A lookup by secondary index reads the index leaf and then fetches the heap page. - Covering index: a secondary index that includes enough columns that the query can be answered from the index leaves alone, with no heap fetch. Sometimes achieved by adding columns to the index key, sometimes through
INCLUDE(PostgreSQL, SQL Server) which stores non-key payload columns in the leaf.
Each row in a table will typically be reached either by descending the clustered tree or by descending a secondary tree and then following a pointer into the clustered tree (or heap).
Why It Matters Here
Three practical consequences follow from this distinction:
- Extra-fetch problem: a secondary index lookup that matches
krows normally costskadditional random heap reads. For low-selectivity predicates that is catastrophic. - Covering index scan (aka index-only scan): if all projected columns live in the index, the heap fetches vanish. This is one of the most common real-world optimizations.
- Choice of primary key: the primary key decides how rows are physically ordered. A randomly distributed primary key (like a UUIDv4) causes inserts to land all over the clustered tree, producing random writes. A monotonically increasing key (serial, ULID, UUIDv7) produces appends to the rightmost leaf.
Every schema decision in Module 1 (Relational & SQL) becomes a physical decision here.
Concrete Example -- Clustered vs Secondary Lookup
Schema: CREATE TABLE orders (id BIGINT PK, user_id BIGINT, ts TIMESTAMP, amount NUMERIC, note TEXT). Primary key id is clustered. Secondary index idx_user_id on user_id.
Query A: SELECT * FROM orders WHERE id = 12345.
descend PK B+-tree -> leaf page holds the whole row (4 random reads, ~3 cached)
Query B: SELECT * FROM orders WHERE user_id = 42 matching 100 rows.
descend idx_user_id B+-tree -> leaf page holds (user_id, id) entries
range scan returns 100 (user_id, id) pairs
for each of the 100 ids:
descend PK B+-tree -> fetch heap row
That is 1 index leaf read + 100 clustered-tree point lookups. If the top levels of the PK tree are cached, each point lookup is a single random leaf read. 100 random reads. Bad.
Query C: SELECT id, user_id FROM orders WHERE user_id = 42.
Now only id and user_id are needed, and both live in the secondary index leaves. No heap fetch. 1 index leaf read per full-leaf-worth of matches. This is the index-only scan -- the same result without the extra 100 random reads.
Concrete Example -- Covering with INCLUDE
CREATE INDEX idx_user_amount ON orders (user_id) INCLUDE (amount, ts);
SELECT user_id, amount, ts FROM orders WHERE user_id = 42;
The INCLUDE columns live in the index leaves but are not part of the key, so they do not affect ordering. The query reads only index leaves. Cost drops dramatically for the common case of "look up many rows by user and project a few columns."
Common Confusion / Misconception
"A clustered index is always faster." Not for every access pattern. Range scans on the clustered key are fastest, because rows are physically adjacent. Range scans on a secondary key can be slower than a full scan if selectivity is poor. The clustered index is an investment in one specific access pattern; you only get to pick one per table.
"Adding a covering index is free." It costs write amplification: every INSERT and relevant UPDATE must now update the covering index too, and the index itself takes disk space. Covering indexes are a read optimization that you pay for on writes.
How To Use It
When designing an index for a specific query:
- Which columns appear in
WHERE/JOIN? Those usually become key columns. - Which columns are projected? If they all fit, add them as key tail or
INCLUDEcolumns. - Estimate the selectivity. If the predicate is very non-selective, no index helps.
- Evaluate the write-amplification cost: one more index per row on every insert and relevant update.
- If you find yourself reaching for a covering index very often, ask whether the clustering key is wrong.
Check Yourself
- Why can a secondary-index lookup of low-selectivity predicates be slower than a sequential scan?
- What exactly makes a scan "index-only," and why is that a big deal?
- Why does choice of primary key affect write throughput, and what is the pathology of a UUIDv4 primary key on an insert-heavy clustered table?
Mini Drill or Application
For each query, pick one: (a) clustered PK scan, (b) secondary index + heap fetch, (c) covering/index-only scan, (d) full scan. Briefly justify.
SELECT id FROM users WHERE email = ?with a secondary index onemail.SELECT * FROM users WHERE last_login > ?matching30%of rows.SELECT user_id, COUNT(*) FROM orders GROUP BY user_idwith a secondary index onuser_id.SELECT * FROM events WHERE id = ?with clustered PK onid.SELECT amount FROM orders WHERE user_id = ? AND ts > ?with an index on(user_id, ts) INCLUDE (amount).