Module Quiz
Complete this quiz after finishing all concept and practice pages. Answer in writing, in full sentences, before checking against the model answers.
Current Module Questions
Question 1: Random vs Sequential I/O
Why does a database treat a random 8 KB read and a sequential 8 KB read as qualitatively different costs, even on an SSD?
Answer: Random reads typically pay per-request overhead (seek on HDD; controller queueing, internal erase-block effects, and lack of prefetch benefit on SSD), while sequential reads amortize that overhead across many contiguous bytes and benefit from prefetching. Even on SSDs, the per-IOP cost is real and sequential access permits large transfer sizes and better bandwidth utilization.
Question 2: Slotted Page Access
An 8 KB page holds 40 variable-length records in a slotted layout. An UPDATE grows one record by 200 bytes. Describe the two possible outcomes and what each does to the record's RID.
Answer: If free space is sufficient, the record is relocated within the page (slot array updated, RID preserved). If not, the page is reorganized or a forwarding pointer redirects the original slot to a new page; the RID is preserved but reads now cost two page fetches. A split/move without a forwarding pointer is never done because any secondary index pointing at the RID would break.
Question 3: Buffer Pool
With a buffer pool of 1000 frames and pin_count > 0 on 980 frames, a query needs a page that is not resident. What happens and why?
Answer: The replacement policy must evict a page, but it can only evict frames with pin_count = 0. With only 20 evictable frames, the engine evicts one (dirty pages flushed first) and reads the requested page in. If all frames are pinned, the request blocks or fails -- a sign of pin leaks or excessive concurrency.
Question 4: B+-Tree Height (Access-Path Analysis)
A B+-tree has fan-out 200 and 10^9 keys. What is its approximate height, and how many page reads does a point lookup cost on a cold cache?
Answer: Height = ceil(log_200(10^9)) ~ 4. A cold point lookup reads 4 pages for internal/leaf traversal, plus one heap page if the index is secondary, for a total of 5 pages. Warm cache: internal pages almost always resident, so the marginal cost is 1-2 I/Os.
Question 5: B+-Tree Split
Starting from an order-4 B+-tree leaf containing [10, 20, 30], inserting 25 causes a split. Draw the leaves before and after and state the separator promoted to the parent.
Answer:
Before: [10, 20, 30].
Insert 25: overflow -> split halves. Left leaf: [10, 20], right leaf: [25, 30]. Separator promoted: 25 (first key of right leaf for a B+-tree). Parent gains a pointer to the new right leaf.
Question 6: Clustered vs Secondary (Access-Path Analysis)
A users table is clustered by id. A query is SELECT name, email FROM users WHERE email = ?. You add a secondary B+-tree on email. Describe the full access path and its page cost, then describe the path if the index is a covering index on (email) INCLUDE (name, email) vs (email, name).
Answer: Secondary-only: walk email B+-tree (~3-4 page reads) -> get RID -> fetch row page from clustered heap (1 more page read). Total ~4-5. Covering/index-only: walk index once and return without the heap fetch (~3-4 total). The index must contain all projected columns; (email) INCLUDE (name) works because Postgres permits non-key payloads; the alternative (email, name) adds name to the key, which is wasteful if name is not used for ordering or filtering.
Question 7: LSM Write Path
Describe every place a single write goes in an LSM engine before the caller returns, and explain why the write does not wait for SSTable I/O.
Answer: Write is appended to the WAL (sequential append, fsync for durability) and inserted into the in-memory memtable. The caller returns. Asynchronously, when the memtable is full, it is flushed to an immutable SSTable at L0. Compaction merges L0 into lower levels later. No foreground write touches an SSTable because SSTables are immutable and rewriting them on each write would destroy throughput.
Question 8: Bloom Filter Arithmetic
A Bloom filter with m = 10^5 bits and k = 7 hash functions has been populated with n = 10^4 keys. What is the approximate false-positive probability? If your LSM read needs to check 40 SSTables for a nonexistent key, how many SSTables do you expect to actually read on disk?
Answer: FPR ~ (1 - e^(-kn/m))^k = (1 - e^(-0.7))^7 ~ 0.503^7 ~ 0.008 ~ 0.8%. Expected SSTables actually accessed: 40 * 0.008 ~ 0.3, i.e., almost always 0.
Question 9: Specialty Index Selection
A jsonb column is filtered by containment: WHERE data @> '{"status": "open"}'. Which index type fits and why does a B+-tree fail here?
Answer: A GIN index. B+-tree orders by a single comparable key and cannot express "this key-value is contained in this document." GIN stores postings for each extracted token and can answer containment by intersecting postings lists.
Question 10: Execution Model
When does a vectorized engine outperform a volcano iterator, and when does it fail to?
Answer: Vectorized wins on analytic workloads that process many rows per operator, because batching amortizes per-tuple overhead and enables SIMD and tight cache-friendly loops. It loses or ties for tiny result sets (dominated by planning and initial fetch) and for plans with cheap selective predicates that a well-optimized iterator handles with negligible overhead.
Question 11: Join Algorithms
A hash join is chosen on two tables A (10^7 rows) and B (10^3 rows). The build side is B. The build hash table does not fit in memory. What plan does the engine fall back to, and why might it prefer to switch algorithms entirely?
Answer: It falls back to Grace hash join, which partitions both sides by hash and joins partitions that fit in memory. If partitioning cost dominates or the inputs are already sorted by the join key, sort-merge may be cheaper. If B has an index on the join key and A is filtered very selectively, an index nested loop may actually be cheapest.
Question 12: Statistics and Cost Estimation
A query planner predicts 10 rows for a filtered subquery; the real count is 10^6. Name two specific errors that commonly cause this kind of miss, and one consequence for the chosen join algorithm.
Answer: (1) Correlated predicates treated as independent (multiplied selectivities understate the result). (2) Stale or missing statistics, including stale histograms, a low statistics target, or multi-column statistics not defined. Consequence: the planner may pick nested loop with the underestimated side as the outer, turning an O(n) join into O(n * m) during execution.
Question 13: 2PL and Deadlock
Two transactions each touch rows in the opposite order under strict 2PL and deadlock. Who aborts, and can the application retry safely?
Answer: The deadlock detector (or a timeout) aborts the victim (usually the younger or the one holding the fewest locks, depending on policy). The aborted transaction is rolled back atomically. Because strict 2PL holds locks until commit/abort, the retry sees no partial state from either transaction. Retries are safe if the application is prepared to re-execute the transaction.
Question 14: MVCC Visibility
Under snapshot isolation, explain why a long-running read transaction may see version V1 of a row while a concurrent short transaction sees version V2, and under what condition V1 becomes eligible for garbage collection.
Answer: The read transaction captured its snapshot at start; it sees the version that was committed before its snapshot. The short transaction's snapshot is newer and includes the later update. V1 is eligible for garbage collection only when no live transaction's snapshot can still see it -- i.e., the oldest active xmin has advanced past the version's xmax.
Question 15: WAL and Recovery
A transaction writes a record, appends a commit log record, and crashes before the data page reaches disk. What does recovery do, and what would happen if the engine had no WAL?
Answer: Recovery reads the log, identifies committed transactions, and redoes their page updates using the log's after-images, then undoes any uncommitted losers. The durability guarantee survives because the log reached stable storage before commit was acknowledged. Without WAL, the update is lost: the data page never reached disk, and the only record of it -- the in-memory page -- is gone.
Interleaved Review Questions
Prior Module Question 1
What is the difference between a logical read and a physical read?
Answer: A logical read is a request for a page by the buffer pool; a physical read is a logical read that missed cache and had to go to disk. All physical reads are logical reads; not vice versa.
Prior Module Question 2
Why is a table scan sometimes faster than an index scan?
Answer: When predicate selectivity is low (many rows qualify), a sequential scan is cheaper than an index scan that issues many random I/Os via RIDs. The crossover depends on page density and how many RIDs become random reads.
Prior Module Question 3
What is the essential difference between a B-tree and a B+-tree?
Answer: A B+-tree stores values only in leaves and links leaves in a list; internal nodes hold only keys used for routing. This makes range scans efficient and keeps internal fan-out maximally high.
Prior Module Question 4
Why does doubling the memtable size typically improve LSM write amplification?
Answer: A larger memtable produces larger L0 SSTables, which reduces the ratio of rewrites needed to push data down into lower levels. Overall write amplification decreases, at the cost of more memory and slower recovery (more WAL to replay).
Prior Module Question 5
What does the WAL rule guarantee that naive file writes do not?
Answer: That any data-page change is recoverable because the log record describing it is already durable. A crash at any point leaves enough log information to reconstruct committed state and roll back uncommitted state.
Self-Assessment and Remediation
Mastery Level (90-100% correct):
- Ready to move to Module 3 (Replication & Partitioning).
Proficient Level (75-89% correct):
- Redo practice pages 2 and 3, and the joins/statistics concepts.
Developing Level (60-74% correct):
- Re-read Cluster 2 (B-tree) and Cluster 5 (concurrency), redo all katas, and build the
EXPLAINreading log.
Insufficient Level (<60% correct):
- Return to the concept sequence starting at Cluster 1 and build the storage-engine reasoning habit before advancing.