Module 2: Storage Engines & Indexing: Case Studies

These case studies make storage-engine behavior visible: pages, indexes, compaction, planner estimates, MVCC visibility, and WAL. Work through them with a notebook open. The artifact matters more than the reading.

How To Use These Case Studies

Read the scenario and name the storage-engine concept.
Sketch the physical access path before reading the better approach.
Produce the required artifact.
Connect the case to Module 1 SQL and to a later Semester 6 topic.
Add a mistake-log entry for any hidden assumption you made.

Case Study 1: Covering Index That Still Hits The Heap

Scenario: A support dashboard frequently runs:

SELECT id, status, updated_at
FROM tickets
WHERE account_id = $1
  AND status = 'open'
ORDER BY updated_at DESC
LIMIT 50;

A learner creates an index that appears to cover the query:

CREATE INDEX tickets_account_status_updated_idx
ON tickets(account_id, status, updated_at DESC)
INCLUDE (id);

The plan sometimes uses an index-only scan, but performance is inconsistent after heavy writes.

Source anchor: PostgreSQL's index-only scan docs explain that an index-only scan is physically possible when the index stores all columns needed by the query, but PostgreSQL must still verify MVCC visibility. It can skip heap access only when the heap page's all-visible bit is set in the visibility map. See PostgreSQL index-only scans and covering indexes and PostgreSQL visibility map.

Module concepts:

secondary indexes
covering indexes
heap fetches
MVCC visibility
visibility map
vacuum

Wrong Approach

"The index includes every selected column, so every lookup should be index-only."

This ignores visibility. PostgreSQL indexes do not store enough MVCC visibility information to prove every tuple is visible to the current transaction. The engine may need to visit the heap unless the visibility map says the page is all-visible.

Better Approach

Treat a covering index as a physical possibility, not a guarantee. Check:

whether the query references only indexed columns
whether the index type supports index-only scans
whether EXPLAIN (ANALYZE, BUFFERS) reports heap fetches
whether vacuum is maintaining the visibility map
whether the table is write-heavy enough to clear all-visible bits often

Example review note:

Expected access path:
  Index Only Scan on tickets_account_status_updated_idx

Plan evidence:
  Heap Fetches: high after write-heavy periods

Interpretation:
  The index covers the columns, but the heap pages are not consistently all-visible.

Tradeoff Table

Choice	Gain	Cost
regular index scan	works on changing data	heap access can dominate latency
covering index	can avoid heap reads	larger index, write overhead, visibility still matters
aggressive vacuum tuning	improves all-visible coverage	consumes background I/O and needs monitoring
denormalized dashboard table	predictable reads	duplicate data and freshness contract

Failure Mode

The team adds wider and wider indexes because "index-only" sounds automatic, but the real blocker is visibility churn from updates.

Required Artifact

Run or mock an EXPLAIN (ANALYZE, BUFFERS) note with:

Index used:
Scan type:
Heap Fetches:
Buffer hits/reads:
Write rate of table:
Vacuum/autovacuum observation:
Decision:

Project / Capstone Connection

Any capstone dashboard that claims "fast query because covering index" must include plan evidence and a visibility/freshness story.

Case Study 2: Multicolumn Index Order And The Lost Sort

Scenario: A marketplace API needs a seller's recent active listings:

SELECT id, title, price_cents, created_at
FROM listings
WHERE seller_id = $1
  AND status = 'active'
ORDER BY created_at DESC
LIMIT 100;

Two index proposals appear:

CREATE INDEX listings_created_seller_status_idx
ON listings(created_at DESC, seller_id, status);

CREATE INDEX listings_seller_status_created_idx
ON listings(seller_id, status, created_at DESC);

Both mention the same columns. Only one matches the access pattern well.

Source anchor: PostgreSQL's multicolumn-index docs explain that B-tree multicolumn indexes are most effective when constraints apply to leading columns. SQLite's query planner documentation gives a concrete explanation of multi-column and covering indexes, including searching and sorting with a multi-column index. See PostgreSQL multicolumn indexes and SQLite query planner.

Module concepts:

B+-tree ordering
leftmost prefix
equality predicates before range/order columns
searching and sorting through one access path

Wrong Approach

"All three columns are in the index, so order does not matter."

Order matters because the tree is sorted lexicographically. If created_at is first, the engine sees recent listings across all sellers first and then filters seller/status. If seller_id, status are first, the engine can seek to one seller's active range and then walk entries already ordered by created_at DESC.

Better Approach

Design the B-tree around the query's access path:

CREATE INDEX listings_seller_status_created_idx
ON listings(seller_id, status, created_at DESC)
INCLUDE (title, price_cents);

Physical intuition:

seller_id = 42
  status = active
    created_at DESC -> newest rows first

The engine can seek into the seller/status slice, read newest entries, and stop after LIMIT 100.

Tradeoff Table

Choice	Gain	Cost
`(created_at, seller_id, status)`	global recent-feed style query	weak for one seller's active listings
`(seller_id, status, created_at)`	strong filter plus order path	less useful for global latest listings
separate indexes	flexible for varied predicates	may require bitmap scan and separate sort
covering payload columns	fewer heap reads	larger index and slower writes

Failure Mode

The query uses an index but still sorts or scans too many rows. The team sees "Index Scan" and stops reading the plan.

Required Artifact

Draw two B-tree key-order sketches and answer:

Which index can stop after LIMIT 100?
Which index preserves the requested ORDER BY?
Which index would you choose if the primary query changed to global newest active listings?

Project / Capstone Connection

Any project with feeds, dashboards, or admin lists needs an explicit "filter + order + limit" index design note.

Case Study 3: LSM Compaction Choice For A Write-Heavy Event Store

Scenario: An event ingestion service writes millions of immutable events per hour. Reads are mostly recent-user lookups and occasional range scans for replay. The team is choosing an LSM configuration and treats compaction as an implementation detail.

Source anchor: RocksDB's leveled-compaction wiki describes L0 files flushed from memtables and deeper levels as sorted runs. Its universal-compaction page describes the tradeoff: universal compaction targets lower write amplification while trading off read and space amplification. See RocksDB leveled compaction and RocksDB universal compaction.

Module concepts:

LSM trees
memtables and SSTables
leveled vs tiered/universal compaction
read amplification
write amplification
space amplification

Wrong Approach

"LSM is write-optimized, so compaction strategy does not matter."

An LSM shifts random writes into sequential writes, but compaction determines how many times data is rewritten, how many files a read may check, and how much duplicate/stale data occupies disk.

Better Approach

Choose compaction based on workload:

Workload:
  very high write rate
  mostly recent reads
  bounded storage budget
  occasional historical replay

Questions:
  Can we tolerate higher read amplification?
  Can Bloom filters reduce negative lookup cost?
  Is write amplification the limiting resource?
  What is the acceptable compaction backlog?

For a write-heavy event store, universal/tiered compaction may reduce write amplification, but the system must budget for higher read and space amplification. Leveled compaction may improve read predictability but rewrite data more aggressively.

Tradeoff Table

Choice	Gain	Cost
leveled compaction	better read shape and space control	higher write amplification
universal/tiered compaction	lower write amplification	higher read/space amplification
larger memtables	fewer flushes	more memory and recovery exposure
Bloom filters	fewer unnecessary SSTable reads	memory cost and false positives

Failure Mode

The ingestion system is benchmarked only on writes. In production, compaction backlog grows, read latency spikes, and disk usage surprises the team.

Required Artifact

Create a compaction decision table:

Write rate:
Read types:
Max read latency:
Max disk amplification:
Compaction strategy:
Bloom filter policy:
Metrics to watch:
Rollback / retune trigger:

Project / Capstone Connection

If the capstone includes logs, event streams, or time-series data, use this case to justify storage choice and retention strategy.

Case Study 4: Planner Statistics Pick The Wrong Join

Scenario: A SaaS app stores invoices. Most tenants have fewer than 1,000 invoices, but a few enterprise tenants have millions. A query joins invoices, payments, and customers for one tenant. The planner chooses a nested loop that works for small tenants and collapses for the largest one.

Source anchor: PostgreSQL's planner-statistics docs explain that the planner relies on statistics, histograms, most-common values, and row estimates. It also supports extended statistics for correlated columns, because per-column stats cannot always capture cross-column relationships. See PostgreSQL planner statistics.

Module concepts:

cost-based optimizer
histograms and most-common values
selectivity estimates
join algorithms
skew
extended statistics

Wrong Approach

"The query plan is bad, so force a different join everywhere."

That may help the one tenant and hurt the rest. The root issue might be estimates: the planner thinks the tenant filter is selective enough for nested loops, but tenant size is highly skewed.

Better Approach

Investigate estimates before forcing behavior:

EXPLAIN (ANALYZE, BUFFERS)
  estimated rows vs actual rows
  join type
  loops count
  rows removed by filter
  buffer reads

Then consider:

increasing statistics target for skewed columns
extended statistics for correlated columns
partial indexes for enterprise tenant patterns
query shape changes that expose selectivity earlier
tenant-aware query paths only if evidence justifies them

Tradeoff Table

Choice	Gain	Cost
rely on default stats	simple operations	weak under skew/correlation
higher stats target	better estimates	more analyze time and catalog data
extended stats	captures correlation	must choose useful column groups
forced join setting	quick diagnostic	dangerous as permanent global policy
tenant-specific path	handles skew	application/query complexity

Failure Mode

The plan is judged by operator name instead of estimate quality. The same query has two workloads hiding behind one SQL shape.

Required Artifact

Write a plan diagnosis:

Estimated rows:
Actual rows:
Largest misestimate:
Join selected:
Join expected:
Statistic or query change proposed:
How to verify:

Project / Capstone Connection

Multi-tenant projects should include at least one skew scenario in query-plan testing.

Case Study 5: WAL Makes Commit Fast But Checkpoints Still Bite

Scenario: A write-heavy service shows periodic latency spikes. The team assumes every commit writes all changed table pages to disk, so they tune the wrong layer.

Source anchor: PostgreSQL's WAL docs explain that data-file changes are written only after WAL records describing them are flushed, allowing recovery to redo unapplied changes after a crash. WAL can reduce disk writes because the WAL is written sequentially and one WAL sync can cover many small transactions. PostgreSQL's WAL configuration docs explain checkpoints: dirty data pages are flushed and a checkpoint record tells crash recovery where redo should begin. See PostgreSQL WAL introduction and PostgreSQL WAL configuration.

Module concepts:

write-ahead logging
dirty pages
sequential log writes
checkpoints
redo recovery
latency spikes

Wrong Approach

"Every transaction must flush all changed pages, so commit latency is the table-write cost."

WAL changes the durability path. Commit can be durable once WAL is flushed; dirty data pages can reach disk later. But checkpoint behavior still matters because flushing many dirty pages can create I/O pressure.

Better Approach

Separate commit path from checkpoint path:

Commit:
  write WAL record
  flush WAL according to durability settings

Later:
  background writer/checkpoint flushes dirty pages
  crash recovery replays WAL from checkpoint redo point

Investigate:

WAL sync latency
checkpoint frequency
checkpoint write volume
dirty buffer pressure
storage IOPS saturation

Tradeoff Table

Choice	Gain	Cost
frequent checkpoints	shorter crash recovery	more frequent write pressure
less frequent checkpoints	smoother foreground writes sometimes	longer recovery and more WAL retained
asynchronous commit	lower latency	recent committed transactions may be lost on crash
faster storage for WAL	improves commit path	does not eliminate checkpoint writeback

Failure Mode

The team tunes query indexes while the actual symptom is checkpoint I/O pressure.

Required Artifact

Draw a crash-recovery timeline:

T1 writes WAL
T2 commits
T3 dirty page not yet flushed
T4 checkpoint
T5 crash
T6 redo starts from checkpoint record

Then write which data is durable and why.

Project / Capstone Connection

If your capstone claims "durable writes," include a short note about the database durability path, backups, and recovery expectation.

Case Study 6: Bitmap Scan Versus Composite Index

Scenario: An analytics filter supports many combinations:

SELECT id
FROM events
WHERE country = 'PK'
  AND device_type = 'mobile'
  AND campaign_id = 123;

Some queries filter by all three fields; others use only one or two. The team debates one composite index for every common combination.

Source anchor: PostgreSQL's docs on combining indexes explain that the planner can scan multiple indexes, combine bitmaps with AND/OR, then visit table rows in physical order. It also notes the tradeoff: bitmap combination can help varied predicates, but ordering from the original indexes is lost and each extra index scan adds work. See PostgreSQL combining multiple indexes.

Module concepts:

bitmap index scans
composite indexes
physical row visitation
predicate combinations
sort loss

Wrong Approach

"Create every composite index users might need."

That creates write overhead, storage cost, and operational complexity. It may still not support sorting requirements.

Better Approach

Classify query families:

High-frequency, latency-sensitive, fixed predicate order:
  consider composite index

Ad hoc filters, varied combinations:
  separate indexes plus bitmap scan may be acceptable

Filter plus ORDER BY/LIMIT:
  composite index may be needed to preserve order

Tradeoff Table

Choice	Gain	Cost
many composite indexes	strong for known query shapes	write/storage explosion
separate indexes + bitmap scan	flexible combinations	loses order and adds bitmap work
partial index	small and targeted	only works for matching predicate
BRIN or partitioning	useful for naturally ordered large tables	not a general point-lookup index

Failure Mode

The index strategy optimizes a query builder demo, then slows every insert in production.

Required Artifact

Create an index portfolio table:

Query family:
Frequency:
Latency target:
Sort required:
Candidate index:
Write overhead:
Keep / reject:

Project / Capstone Connection

Admin search/filter pages in capstones should document which filters are first-class and which are best-effort.

Source Map

Source	Use it for
PostgreSQL index-only scans and covering indexes	covering indexes, heap access, MVCC visibility requirement
PostgreSQL visibility map	all-visible pages and why index-only scans may still hit the heap
PostgreSQL multicolumn indexes	leading-column behavior and composite index design
SQLite query planner	concrete examples of multi-column and covering index behavior
RocksDB leveled compaction	LSM levels, SSTables, and leveled layout
RocksDB universal compaction	write/read/space amplification tradeoff in compaction strategy
PostgreSQL planner statistics	histograms, most-common values, extended statistics, misestimates
PostgreSQL WAL introduction	WAL durability path and redo recovery
PostgreSQL WAL configuration	checkpoints and dirty-page flush behavior
PostgreSQL combining multiple indexes	bitmap index scans and composite-vs-separate-index tradeoffs

Completion Standard

At least three case-study artifacts are completed.
At least one artifact includes EXPLAIN (ANALYZE, BUFFERS) or a mocked plan review.
At least one artifact includes a storage-layout diagram.
At least one artifact compares two index or compaction strategies.
At least one artifact connects storage-engine behavior to Module 3, 4, or 5 of Semester 6.

How To Use These Case Studies​

Case Study 1: Covering Index That Still Hits The Heap​

Wrong Approach​

Better Approach​

Tradeoff Table​

Failure Mode​

Required Artifact​

Project / Capstone Connection​

Case Study 2: Multicolumn Index Order And The Lost Sort​

Wrong Approach​

Better Approach​

Tradeoff Table​

Failure Mode​

Required Artifact​

Project / Capstone Connection​

Case Study 3: LSM Compaction Choice For A Write-Heavy Event Store​

Wrong Approach​

Better Approach​

Tradeoff Table​

Failure Mode​

Required Artifact​

Project / Capstone Connection​

Case Study 4: Planner Statistics Pick The Wrong Join​

Wrong Approach​

Better Approach​

Tradeoff Table​

Failure Mode​

Required Artifact​

Project / Capstone Connection​

Case Study 5: WAL Makes Commit Fast But Checkpoints Still Bite​

Wrong Approach​

Better Approach​

Tradeoff Table​

Failure Mode​

Required Artifact​

Project / Capstone Connection​

Case Study 6: Bitmap Scan Versus Composite Index​

Wrong Approach​

Better Approach​

Tradeoff Table​

Failure Mode​

Required Artifact​

Project / Capstone Connection​

Source Map​

Completion Standard​

How To Use These Case Studies

Case Study 1: Covering Index That Still Hits The Heap

Wrong Approach

Better Approach

Tradeoff Table

Failure Mode

Required Artifact

Project / Capstone Connection

Case Study 2: Multicolumn Index Order And The Lost Sort

Wrong Approach

Better Approach

Tradeoff Table

Failure Mode

Required Artifact

Project / Capstone Connection

Case Study 3: LSM Compaction Choice For A Write-Heavy Event Store

Wrong Approach

Better Approach

Tradeoff Table

Failure Mode

Required Artifact

Project / Capstone Connection

Case Study 4: Planner Statistics Pick The Wrong Join

Wrong Approach

Better Approach

Tradeoff Table

Failure Mode

Required Artifact

Project / Capstone Connection

Case Study 5: WAL Makes Commit Fast But Checkpoints Still Bite

Wrong Approach

Better Approach

Tradeoff Table

Failure Mode

Required Artifact

Project / Capstone Connection

Case Study 6: Bitmap Scan Versus Composite Index

Wrong Approach

Better Approach

Tradeoff Table

Failure Mode

Required Artifact

Project / Capstone Connection

Source Map

Completion Standard