Index Selection and Access Path Workshop
Retrieval Prompts
- State the B+-tree balance invariant and which operation is the only one that grows the tree in height.
- Give the height of a B+-tree as a function of fan-out and leaf count.
- State the rule for when an index-only scan is possible.
- Sketch the LSM-tree write path: where a write lands and when it reaches disk.
- State when a Bloom filter returns No and what that means for an LSM read.
Compare and Distinguish
Separate these pairs clearly:
- B+-tree vs LSM-tree under heavy random-key writes
- clustered vs secondary index lookup cost
- covering index vs index with
INCLUDEcolumns - hash index vs B+-tree for equality
- GIN vs GiST vs B+-tree for non-scalar column types
Common Mistake Check
For each statement, identify the error:
- "A secondary index always reduces I/O."
- "An LSM-tree has no random I/O."
- "A Bloom filter can give false negatives."
- "All indexes help range scans."
- "Adding more indexes only costs disk space, not write performance."
- "Bitmap indexes are best for high-cardinality columns."
Index Selection Drill
For each workload, select an index type and justify in one paragraph with specific costs:
- A clickstream table gets
10^5inserts/s, with reads mostly scanning(user_id, ts)in time order. - A
userstable is read byemailfor login and never scanned. - A
documentstable storesjsonbpayloads; filters often checkdata @> '{"tag": "..."}'. - An
eventstable has columnsstatus(4 values),severity(3 values),region(15 values). Queries areWHEREcombinations of any two. - A
placestable with(lat, lon)coordinates; queries find nearest neighbors. - A
transactionsPK-clustered table with random UUIDv4 keys experiences insert-throughput collapse as it grows.
B+-Tree Split Drill
Using an order-4 B+-tree (max 3 entries per leaf), insert the following in order. Draw the tree (ASCII) after every operation that causes a split or a merge:
10, 20, 30, 40, 5, 15, 25, 35, 45, 50, 55, 60, 65, 70
Then delete in order: 10, 20, 30, 40, 50. Draw after every underflow.
At the end, state the final tree height and the number of leaves.
LSM Compaction Worksheet
You have memtable size 32 MB, L0 compact-trigger 4 SSTables, L1 target 256 MB, 10x growth per level, sustained write rate 5 MB/s, average key-value size 512 B, 1% Bloom FPR per SSTable.
- How often does the memtable flush?
- How many SSTables accumulate in L0 before compaction?
- How large will L2 be when the database holds
~50 GBof data? - Expected number of random reads for a
GETof an absent key, with Bloom filters, assuming40SSTables exist. - If the same workload used a B+-tree instead, what changes in the write-side cost?
Evidence Check
This page is complete only if you can look at a new workload description and defend a choice of index structure in writing, with numbers, against at least one reasonable alternative.
Document Model Contrast
Model the restaurant reservation workflow twice: once relationally and once as documents.
For the document version:
- Design collections for guests, reservations, orders, and menu items.
- Decide what to embed and what to reference.
- Write CRUD examples for creating a reservation, adding an order item, changing payment status, and finding today's reservations.
- Add a basic authorization note: which roles may read or mutate each collection.
- Identify one query filter that needs an index and one access pattern that would be worse than the relational version.
Decision prompt: write a short relational-vs-document tradeoff note. Include consistency boundary, query shape, update frequency, and reporting needs.