Skip to main content

Module 2: Storage Engines & Indexing: Case Studies

These case studies make storage-engine behavior visible: pages, indexes, compaction, planner estimates, MVCC visibility, and WAL. Work through them with a notebook open. The artifact matters more than the reading.


How To Use These Case Studies

  1. Read the scenario and name the storage-engine concept.
  2. Sketch the physical access path before reading the better approach.
  3. Produce the required artifact.
  4. Connect the case to Module 1 SQL and to a later Semester 6 topic.
  5. Add a mistake-log entry for any hidden assumption you made.

Case Study 1: Covering Index That Still Hits The Heap

Scenario: A support dashboard frequently runs:

SELECT id, status, updated_at
FROM tickets
WHERE account_id = $1
AND status = 'open'
ORDER BY updated_at DESC
LIMIT 50;

A learner creates an index that appears to cover the query:

CREATE INDEX tickets_account_status_updated_idx
ON tickets(account_id, status, updated_at DESC)
INCLUDE (id);

The plan sometimes uses an index-only scan, but performance is inconsistent after heavy writes.

Source anchor: PostgreSQL's index-only scan docs explain that an index-only scan is physically possible when the index stores all columns needed by the query, but PostgreSQL must still verify MVCC visibility. It can skip heap access only when the heap page's all-visible bit is set in the visibility map. See PostgreSQL index-only scans and covering indexes and PostgreSQL visibility map.

Module concepts:

  • secondary indexes
  • covering indexes
  • heap fetches
  • MVCC visibility
  • visibility map
  • vacuum

Wrong Approach

"The index includes every selected column, so every lookup should be index-only."

This ignores visibility. PostgreSQL indexes do not store enough MVCC visibility information to prove every tuple is visible to the current transaction. The engine may need to visit the heap unless the visibility map says the page is all-visible.

Better Approach

Treat a covering index as a physical possibility, not a guarantee. Check:

  • whether the query references only indexed columns
  • whether the index type supports index-only scans
  • whether EXPLAIN (ANALYZE, BUFFERS) reports heap fetches
  • whether vacuum is maintaining the visibility map
  • whether the table is write-heavy enough to clear all-visible bits often

Example review note:

Expected access path:
Index Only Scan on tickets_account_status_updated_idx

Plan evidence:
Heap Fetches: high after write-heavy periods

Interpretation:
The index covers the columns, but the heap pages are not consistently all-visible.

Tradeoff Table

ChoiceGainCost
regular index scanworks on changing dataheap access can dominate latency
covering indexcan avoid heap readslarger index, write overhead, visibility still matters
aggressive vacuum tuningimproves all-visible coverageconsumes background I/O and needs monitoring
denormalized dashboard tablepredictable readsduplicate data and freshness contract

Failure Mode

The team adds wider and wider indexes because "index-only" sounds automatic, but the real blocker is visibility churn from updates.

Required Artifact

Run or mock an EXPLAIN (ANALYZE, BUFFERS) note with:

Index used:
Scan type:
Heap Fetches:
Buffer hits/reads:
Write rate of table:
Vacuum/autovacuum observation:
Decision:

Project / Capstone Connection

Any capstone dashboard that claims "fast query because covering index" must include plan evidence and a visibility/freshness story.


Case Study 2: Multicolumn Index Order And The Lost Sort

Scenario: A marketplace API needs a seller's recent active listings:

SELECT id, title, price_cents, created_at
FROM listings
WHERE seller_id = $1
AND status = 'active'
ORDER BY created_at DESC
LIMIT 100;

Two index proposals appear:

CREATE INDEX listings_created_seller_status_idx
ON listings(created_at DESC, seller_id, status);

CREATE INDEX listings_seller_status_created_idx
ON listings(seller_id, status, created_at DESC);

Both mention the same columns. Only one matches the access pattern well.

Source anchor: PostgreSQL's multicolumn-index docs explain that B-tree multicolumn indexes are most effective when constraints apply to leading columns. SQLite's query planner documentation gives a concrete explanation of multi-column and covering indexes, including searching and sorting with a multi-column index. See PostgreSQL multicolumn indexes and SQLite query planner.

Module concepts:

  • B+-tree ordering
  • leftmost prefix
  • equality predicates before range/order columns
  • searching and sorting through one access path

Wrong Approach

"All three columns are in the index, so order does not matter."

Order matters because the tree is sorted lexicographically. If created_at is first, the engine sees recent listings across all sellers first and then filters seller/status. If seller_id, status are first, the engine can seek to one seller's active range and then walk entries already ordered by created_at DESC.

Better Approach

Design the B-tree around the query's access path:

CREATE INDEX listings_seller_status_created_idx
ON listings(seller_id, status, created_at DESC)
INCLUDE (title, price_cents);

Physical intuition:

seller_id = 42
status = active
created_at DESC -> newest rows first

The engine can seek into the seller/status slice, read newest entries, and stop after LIMIT 100.

Tradeoff Table

ChoiceGainCost
(created_at, seller_id, status)global recent-feed style queryweak for one seller's active listings
(seller_id, status, created_at)strong filter plus order pathless useful for global latest listings
separate indexesflexible for varied predicatesmay require bitmap scan and separate sort
covering payload columnsfewer heap readslarger index and slower writes

Failure Mode

The query uses an index but still sorts or scans too many rows. The team sees "Index Scan" and stops reading the plan.

Required Artifact

Draw two B-tree key-order sketches and answer:

Which index can stop after LIMIT 100?
Which index preserves the requested ORDER BY?
Which index would you choose if the primary query changed to global newest active listings?

Project / Capstone Connection

Any project with feeds, dashboards, or admin lists needs an explicit "filter + order + limit" index design note.


Case Study 3: LSM Compaction Choice For A Write-Heavy Event Store

Scenario: An event ingestion service writes millions of immutable events per hour. Reads are mostly recent-user lookups and occasional range scans for replay. The team is choosing an LSM configuration and treats compaction as an implementation detail.

Source anchor: RocksDB's leveled-compaction wiki describes L0 files flushed from memtables and deeper levels as sorted runs. Its universal-compaction page describes the tradeoff: universal compaction targets lower write amplification while trading off read and space amplification. See RocksDB leveled compaction and RocksDB universal compaction.

Module concepts:

  • LSM trees
  • memtables and SSTables
  • leveled vs tiered/universal compaction
  • read amplification
  • write amplification
  • space amplification

Wrong Approach

"LSM is write-optimized, so compaction strategy does not matter."

An LSM shifts random writes into sequential writes, but compaction determines how many times data is rewritten, how many files a read may check, and how much duplicate/stale data occupies disk.

Better Approach

Choose compaction based on workload:

Workload:
very high write rate
mostly recent reads
bounded storage budget
occasional historical replay

Questions:
Can we tolerate higher read amplification?
Can Bloom filters reduce negative lookup cost?
Is write amplification the limiting resource?
What is the acceptable compaction backlog?

For a write-heavy event store, universal/tiered compaction may reduce write amplification, but the system must budget for higher read and space amplification. Leveled compaction may improve read predictability but rewrite data more aggressively.

Tradeoff Table

ChoiceGainCost
leveled compactionbetter read shape and space controlhigher write amplification
universal/tiered compactionlower write amplificationhigher read/space amplification
larger memtablesfewer flushesmore memory and recovery exposure
Bloom filtersfewer unnecessary SSTable readsmemory cost and false positives

Failure Mode

The ingestion system is benchmarked only on writes. In production, compaction backlog grows, read latency spikes, and disk usage surprises the team.

Required Artifact

Create a compaction decision table:

Write rate:
Read types:
Max read latency:
Max disk amplification:
Compaction strategy:
Bloom filter policy:
Metrics to watch:
Rollback / retune trigger:

Project / Capstone Connection

If the capstone includes logs, event streams, or time-series data, use this case to justify storage choice and retention strategy.


Case Study 4: Planner Statistics Pick The Wrong Join

Scenario: A SaaS app stores invoices. Most tenants have fewer than 1,000 invoices, but a few enterprise tenants have millions. A query joins invoices, payments, and customers for one tenant. The planner chooses a nested loop that works for small tenants and collapses for the largest one.

Source anchor: PostgreSQL's planner-statistics docs explain that the planner relies on statistics, histograms, most-common values, and row estimates. It also supports extended statistics for correlated columns, because per-column stats cannot always capture cross-column relationships. See PostgreSQL planner statistics.

Module concepts:

  • cost-based optimizer
  • histograms and most-common values
  • selectivity estimates
  • join algorithms
  • skew
  • extended statistics

Wrong Approach

"The query plan is bad, so force a different join everywhere."

That may help the one tenant and hurt the rest. The root issue might be estimates: the planner thinks the tenant filter is selective enough for nested loops, but tenant size is highly skewed.

Better Approach

Investigate estimates before forcing behavior:

EXPLAIN (ANALYZE, BUFFERS)
estimated rows vs actual rows
join type
loops count
rows removed by filter
buffer reads

Then consider:

  • increasing statistics target for skewed columns
  • extended statistics for correlated columns
  • partial indexes for enterprise tenant patterns
  • query shape changes that expose selectivity earlier
  • tenant-aware query paths only if evidence justifies them

Tradeoff Table

ChoiceGainCost
rely on default statssimple operationsweak under skew/correlation
higher stats targetbetter estimatesmore analyze time and catalog data
extended statscaptures correlationmust choose useful column groups
forced join settingquick diagnosticdangerous as permanent global policy
tenant-specific pathhandles skewapplication/query complexity

Failure Mode

The plan is judged by operator name instead of estimate quality. The same query has two workloads hiding behind one SQL shape.

Required Artifact

Write a plan diagnosis:

Estimated rows:
Actual rows:
Largest misestimate:
Join selected:
Join expected:
Statistic or query change proposed:
How to verify:

Project / Capstone Connection

Multi-tenant projects should include at least one skew scenario in query-plan testing.


Case Study 5: WAL Makes Commit Fast But Checkpoints Still Bite

Scenario: A write-heavy service shows periodic latency spikes. The team assumes every commit writes all changed table pages to disk, so they tune the wrong layer.

Source anchor: PostgreSQL's WAL docs explain that data-file changes are written only after WAL records describing them are flushed, allowing recovery to redo unapplied changes after a crash. WAL can reduce disk writes because the WAL is written sequentially and one WAL sync can cover many small transactions. PostgreSQL's WAL configuration docs explain checkpoints: dirty data pages are flushed and a checkpoint record tells crash recovery where redo should begin. See PostgreSQL WAL introduction and PostgreSQL WAL configuration.

Module concepts:

  • write-ahead logging
  • dirty pages
  • sequential log writes
  • checkpoints
  • redo recovery
  • latency spikes

Wrong Approach

"Every transaction must flush all changed pages, so commit latency is the table-write cost."

WAL changes the durability path. Commit can be durable once WAL is flushed; dirty data pages can reach disk later. But checkpoint behavior still matters because flushing many dirty pages can create I/O pressure.

Better Approach

Separate commit path from checkpoint path:

Commit:
write WAL record
flush WAL according to durability settings

Later:
background writer/checkpoint flushes dirty pages
crash recovery replays WAL from checkpoint redo point

Investigate:

  • WAL sync latency
  • checkpoint frequency
  • checkpoint write volume
  • dirty buffer pressure
  • storage IOPS saturation

Tradeoff Table

ChoiceGainCost
frequent checkpointsshorter crash recoverymore frequent write pressure
less frequent checkpointssmoother foreground writes sometimeslonger recovery and more WAL retained
asynchronous commitlower latencyrecent committed transactions may be lost on crash
faster storage for WALimproves commit pathdoes not eliminate checkpoint writeback

Failure Mode

The team tunes query indexes while the actual symptom is checkpoint I/O pressure.

Required Artifact

Draw a crash-recovery timeline:

T1 writes WAL
T2 commits
T3 dirty page not yet flushed
T4 checkpoint
T5 crash
T6 redo starts from checkpoint record

Then write which data is durable and why.

Project / Capstone Connection

If your capstone claims "durable writes," include a short note about the database durability path, backups, and recovery expectation.


Case Study 6: Bitmap Scan Versus Composite Index

Scenario: An analytics filter supports many combinations:

SELECT id
FROM events
WHERE country = 'PK'
AND device_type = 'mobile'
AND campaign_id = 123;

Some queries filter by all three fields; others use only one or two. The team debates one composite index for every common combination.

Source anchor: PostgreSQL's docs on combining indexes explain that the planner can scan multiple indexes, combine bitmaps with AND/OR, then visit table rows in physical order. It also notes the tradeoff: bitmap combination can help varied predicates, but ordering from the original indexes is lost and each extra index scan adds work. See PostgreSQL combining multiple indexes.

Module concepts:

  • bitmap index scans
  • composite indexes
  • physical row visitation
  • predicate combinations
  • sort loss

Wrong Approach

"Create every composite index users might need."

That creates write overhead, storage cost, and operational complexity. It may still not support sorting requirements.

Better Approach

Classify query families:

High-frequency, latency-sensitive, fixed predicate order:
consider composite index

Ad hoc filters, varied combinations:
separate indexes plus bitmap scan may be acceptable

Filter plus ORDER BY/LIMIT:
composite index may be needed to preserve order

Tradeoff Table

ChoiceGainCost
many composite indexesstrong for known query shapeswrite/storage explosion
separate indexes + bitmap scanflexible combinationsloses order and adds bitmap work
partial indexsmall and targetedonly works for matching predicate
BRIN or partitioninguseful for naturally ordered large tablesnot a general point-lookup index

Failure Mode

The index strategy optimizes a query builder demo, then slows every insert in production.

Required Artifact

Create an index portfolio table:

Query family:
Frequency:
Latency target:
Sort required:
Candidate index:
Write overhead:
Keep / reject:

Project / Capstone Connection

Admin search/filter pages in capstones should document which filters are first-class and which are best-effort.


Source Map

SourceUse it for
PostgreSQL index-only scans and covering indexescovering indexes, heap access, MVCC visibility requirement
PostgreSQL visibility mapall-visible pages and why index-only scans may still hit the heap
PostgreSQL multicolumn indexesleading-column behavior and composite index design
SQLite query plannerconcrete examples of multi-column and covering index behavior
RocksDB leveled compactionLSM levels, SSTables, and leveled layout
RocksDB universal compactionwrite/read/space amplification tradeoff in compaction strategy
PostgreSQL planner statisticshistograms, most-common values, extended statistics, misestimates
PostgreSQL WAL introductionWAL durability path and redo recovery
PostgreSQL WAL configurationcheckpoints and dirty-page flush behavior
PostgreSQL combining multiple indexesbitmap index scans and composite-vs-separate-index tradeoffs

Completion Standard

  • At least three case-study artifacts are completed.
  • At least one artifact includes EXPLAIN (ANALYZE, BUFFERS) or a mocked plan review.
  • At least one artifact includes a storage-layout diagram.
  • At least one artifact compares two index or compaction strategies.
  • At least one artifact connects storage-engine behavior to Module 3, 4, or 5 of Semester 6.