Index Selection and Access Path Workshop

Retrieval Prompts

State the B+-tree balance invariant and which operation is the only one that grows the tree in height.
Give the height of a B+-tree as a function of fan-out and leaf count.
State the rule for when an index-only scan is possible.
Sketch the LSM-tree write path: where a write lands and when it reaches disk.
State when a Bloom filter returns No and what that means for an LSM read.

Compare and Distinguish

Separate these pairs clearly:

B+-tree vs LSM-tree under heavy random-key writes
clustered vs secondary index lookup cost
covering index vs index with INCLUDE columns
hash index vs B+-tree for equality
GIN vs GiST vs B+-tree for non-scalar column types

Common Mistake Check

For each statement, identify the error:

"A secondary index always reduces I/O."
"An LSM-tree has no random I/O."
"A Bloom filter can give false negatives."
"All indexes help range scans."
"Adding more indexes only costs disk space, not write performance."
"Bitmap indexes are best for high-cardinality columns."

Index Selection Drill

For each workload, select an index type and justify in one paragraph with specific costs:

A clickstream table gets 10^5 inserts/s, with reads mostly scanning (user_id, ts) in time order.
A users table is read by email for login and never scanned.
A documents table stores jsonb payloads; filters often check data @> '{"tag": "..."}'.
An events table has columns status (4 values), severity (3 values), region (15 values). Queries are WHERE combinations of any two.
A places table with (lat, lon) coordinates; queries find nearest neighbors.
A transactions PK-clustered table with random UUIDv4 keys experiences insert-throughput collapse as it grows.

B+-Tree Split Drill

Using an order-4 B+-tree (max 3 entries per leaf), insert the following in order. Draw the tree (ASCII) after every operation that causes a split or a merge:

10, 20, 30, 40, 5, 15, 25, 35, 45, 50, 55, 60, 65, 70

Then delete in order: 10, 20, 30, 40, 50. Draw after every underflow.

At the end, state the final tree height and the number of leaves.

LSM Compaction Worksheet

You have memtable size 32 MB, L0 compact-trigger 4 SSTables, L1 target 256 MB, 10x growth per level, sustained write rate 5 MB/s, average key-value size 512 B, 1% Bloom FPR per SSTable.

How often does the memtable flush?
How many SSTables accumulate in L0 before compaction?
How large will L2 be when the database holds ~50 GB of data?
Expected number of random reads for a GET of an absent key, with Bloom filters, assuming 40 SSTables exist.
If the same workload used a B+-tree instead, what changes in the write-side cost?

Evidence Check

This page is complete only if you can look at a new workload description and defend a choice of index structure in writing, with numbers, against at least one reasonable alternative.

Document Model Contrast

Model the restaurant reservation workflow twice: once relationally and once as documents.

For the document version:

Design collections for guests, reservations, orders, and menu items.
Decide what to embed and what to reference.
Write CRUD examples for creating a reservation, adding an order item, changing payment status, and finding today's reservations.
Add a basic authorization note: which roles may read or mutate each collection.
Identify one query filter that needs an index and one access pattern that would be worse than the relational version.

Decision prompt: write a short relational-vs-document tradeoff note. Include consistency boundary, query shape, update frequency, and reporting needs.

Retrieval Prompts​

Compare and Distinguish​

Common Mistake Check​

Index Selection Drill​

B+-Tree Split Drill​

LSM Compaction Worksheet​

Evidence Check​

Document Model Contrast​