Module Quiz
Complete this quiz after finishing all concept and practice pages.
Current Module Questions
Question 1: ACID Property Mapping
Which of A, C, I, D is most directly supported by each mechanism: (a) the write-ahead log; (b) a foreign-key constraint; (c) row-level write locks; (d) an fsync on COMMIT; (e) a CHECK constraint?
Answer:
- (a) WAL: A and D (atomicity via log-based undo/redo; durability via log-first rule).
- (b) Foreign key: C (declared integrity constraint).
- (c) Row-level write locks: I.
- (d) fsync on commit: D.
- (e) CHECK constraint: C.
Question 2: ARIES Recovery
During recovery from this log, what does ARIES redo and undo?
10 T1 BEGIN
11 T1 UPDATE x: 5 -> 7
12 CHECKPOINT (active: T1; data file x=5)
13 T2 BEGIN
14 T2 UPDATE y: 2 -> 4
15 T1 COMMIT
16 T3 BEGIN
17 T3 UPDATE z: 8 -> 9
-- crash --
Answer:
- Analysis: identifies T1 as committed (has COMMIT), T2 and T3 as in-flight (no COMMIT).
- Redo: replays LSN 11, 14, 17 against data files (to reach state at crash time).
- Undo: rolls back T2 (LSN 14 -> writes y back to 2) and T3 (LSN 17 -> writes z back to 8). Writes CLRs for both undo operations.
- Final state: x = 7 (T1 committed), y = 2 (T2 rolled back), z = 8 (T3 rolled back).
Question 3: Anomaly Identification
A transaction T_total reads account A (balance 500), then account B (balance 200), and returns the sum. Concurrently, T_xfer transfers 100 from A to B. T_total returns 600 when the true sum should be 700. Name the anomaly and the weakest isolation level that prevents it.
Answer: Read skew (aka non-repeatable read). T_total saw A before the transfer and B after. Prevented by Snapshot Isolation (PostgreSQL REPEATABLE READ) or stronger, because a snapshot would show both balances at the same point in time.
Question 4: Anomaly Identification
Two concurrent transactions each execute:
SELECT COUNT(*) FROM doctors WHERE on_call; -- both return 2
UPDATE doctors SET on_call = FALSE WHERE name = 'self';
Both commit under PostgreSQL REPEATABLE READ. The "at least one on-call" invariant is now violated. Name the anomaly and the weakest prevention.
Answer: Write skew. Prevented by PostgreSQL SERIALIZABLE (SSI). Alternative: in RR, use SELECT ... FOR UPDATE on the doctors rows so the second transaction blocks or sees the updated count.
Question 5: Isolation-Level Choice
A workflow reads a user's current profile, validates a change, and writes the new version. Two admins sometimes edit the same profile concurrently. Choose the weakest PostgreSQL isolation level that prevents lost updates without over-serializing, and name an alternative primitive.
Answer: REPEATABLE READ (SI) is sufficient: first-committer-wins triggers 40001 on the second writer, who must retry. Alternative: Read Committed plus UPDATE profiles SET ... WHERE id = $id AND version = $expected (compare-and-set) with retry.
Question 6: Anomaly Identification
An inventory system sold the last item twice because two checkouts both read quantity = 1, both decremented, and both committed. Name the anomaly and give two distinct preventions.
Answer: Lost update.
- Prevention 1: atomic
UPDATE inventory SET quantity = quantity - 1 WHERE id = ? AND quantity >= 1(compare-and-set), check affected rows == 1. - Prevention 2:
SELECT ... FOR UPDATEon the inventory row at the start of the checkout transaction. - Prevention 3: raise isolation to
REPEATABLE READ(SI) and retry on40001.
Question 7: Anomaly Identification
Two users sign up simultaneously with the same email. Each transaction ran SELECT 1 FROM users WHERE email = ? (no row), then INSERT INTO users (email, ...). Both succeeded under READ COMMITTED. Name the anomaly and two preventions.
Answer: Phantom. Preventions:
- Declare a
UNIQUEconstraint onemail; the second INSERT fails with an integrity error. - Run both transactions at
SERIALIZABLE; SSI detects the rw-antidependency and aborts one. - Use an explicit advisory lock keyed by the email during the check-and-insert.
Question 8: Isolation-Level Choice
A dashboard issues a read-only query computing totals, revenue, and counts across many tables. Latency matters; stale-by-a-few-seconds is fine. Choose the right isolation/read strategy on PostgreSQL.
Answer: READ COMMITTED with a single transaction so each statement sees committed data. Or REPEATABLE READ if the dashboard must report a consistent snapshot across the multiple queries. Do not use SERIALIZABLE (unnecessary aborts on a read-heavy workload). For stale-tolerant loads, reading from a read replica is the better architectural answer.
Question 9: 2PL vs SI Pick
For a workload of 80% short writes, 20% long reads, which concurrency-control scheme reduces contention: 2PL or SI/MVCC? Why?
Answer: SI/MVCC. Long reads in 2PL hold shared locks that block writers. In SI, readers use a snapshot and never block writers (or vice versa), so write throughput is preserved. The cost is MVCC storage overhead and potential bloat from long-held snapshots.
Question 10: 2PC Failure Mode
In 2PC, a participant has replied PREPARED and the coordinator then crashes before sending COMMIT or ABORT. What must the participant do, and why can it not unilaterally abort?
Answer: The participant is in doubt. It must hold its locks and wait for the coordinator to recover and tell it the decision. It cannot unilaterally abort because the coordinator may have already logged and sent COMMIT to another participant; aborting here would violate atomicity (one participant commits, one aborts). This is 2PC's famous blocking behavior.
Question 11: Saga Design
For a travel-booking workflow (reserve flight, reserve hotel, charge card), write the compensations for each step and identify any that resist compensation.
Answer:
- Compensation for "reserve flight": release flight reservation.
- Compensation for "reserve hotel": release hotel reservation.
- Compensation for "charge card": issue refund (a new ledger entry; original charge record remains for audit).
All three are semantically compensable. Risks: the airline/hotel may have non-refundable policies that prevent release; the refund may take days to appear on the customer's statement. A "send itinerary email" step, if added, would resist compensation because emails cannot be unsent.
Question 12: Linearizability Analysis
Is this history linearizable? C1: write(x, 1) (t=1-2); C2: read(x) -> 0 (t=3-4); C3: read(x) -> 1 (t=5-6); C4: read(x) -> 0 (t=7-8).
Answer: No. C3 read -> 1 at t=5-6 and C4 read -> 0 at t=7-8. A linearizable register cannot return to an older value after a newer one was observed. Once the system exhibits x=1 to C3, no subsequent read may observe x=0. Violation: real-time monotonicity on the register.
Question 13: Causal vs Eventual
Give a concrete anomaly that causal consistency prevents but eventual consistency allows.
Answer: Alice posts a question at replica R1. Bob, reading R2, sees the question (after propagation), replies. On replica R3, Carol sees Bob's reply arrive before Alice's original question (because messages took different paths and there is no ordering guarantee). Under causal consistency, R3 would not deliver Bob's reply until R3 has Alice's question (because Bob's reply causally depends on Alice's post). Under plain eventual, this out-of-order observation is legal.
Question 14: CAP and PACELC
A vendor says "we provide strong consistency and high availability." Apply CAP and PACELC honestly. What question do you ask to force clarity?
Answer: CAP tells us the vendor cannot simultaneously offer linearizable consistency and full availability on both sides of a partition. The honest question: "What happens on the minority side during a cross-region partition? What happens to read latency under strong consistency in steady state (PACELC)?" The answer reveals whether they are CP/EC (Spanner-like, high-latency), PA/EL (DynamoDB-like, stale reads possible), or marketing.
Question 15: Isolation-Level Choice
An audit job reads the entire transactions table once and writes an aggregated report to another table. Concurrent writers are constantly inserting new transactions. What isolation level produces a consistent snapshot without aborting the audit, and what is the cost?
Answer: REPEATABLE READ (SI) on the audit job produces a consistent snapshot of transactions at BEGIN time. The audit will not see inserts after BEGIN, which is correct for a consistent snapshot. The cost: MVCC bloat grows while the audit is running because old row versions cannot be vacuumed until the audit's snapshot is released.
Interleaved Review Questions
Prior Module Question 1 (S6 Module 3: Replication)
What is the difference between synchronous and asynchronous replication, and which ACID property is most affected by the choice?
Answer: Synchronous replication: the primary waits for at least one replica to acknowledge the write before confirming commit. Asynchronous: primary commits without waiting. The choice most directly affects Durability under primary failure: async replication can lose committed writes if the primary fails before replication completes. Synchronous replication converts "durable on primary" into "durable on a quorum," at cost of latency.
Prior Module Question 2 (S6 Module 2: Storage Engines)
Why is a log-structured merge-tree (LSM) write-optimized compared to a B-tree?
Answer: LSMs append to an in-memory memtable then flush to immutable sorted files; writes are sequential I/O. B-trees update pages in place, requiring random I/O on update, plus WAL to recover in-place mutations atomically. On write-heavy workloads the sequential-write discipline of LSMs is faster; reads pay a merge cost (bloom filters and compaction mitigate this).
Prior Module Question 3 (S6 Module 1: SQL)
Explain why declarative UPDATE counter SET n = n + 1 WHERE id = $id is usually better than SELECT n ... / UPDATE n = $computed for concurrency.
Answer: The atomic form executes a single statement where the read and write happen under the engine's internal concurrency control for that row, with no gap between the read and the write the application could interleave into. The read-modify-write version opens a gap visible to other transactions, which can lose updates even under Read Committed. Always prefer expressing the update declaratively when the engine can compute the new value.
Prior Module Question 4 (S5 Module 3: OS memory)
Why does an fsync on a database commit not fully guarantee durability in the presence of a hardware write cache?
Answer: fsync asks the OS to flush writes to the drive, but a drive with a volatile write cache may acknowledge the flush without persisting. On power loss, the cached writes are lost, and the database's "durability" claim is based on a lie. Production-grade durability requires battery-backed caches, supercapacitor-protected SSDs, or explicit cache flush commands the drive honors.
Prior Module Question 5 (S5 Module 4: Networks)
In a data-center network, why is "message delay is unbounded" a more accurate assumption than "message delay is bounded with high probability"?
Answer: In asynchronous networks, queueing at switches, GC pauses on endpoints, and transient packet loss can produce delays orders of magnitude larger than the median. Any protocol that assumes a bound (e.g., 3PC) can misbehave under tail latencies the 99.99th-percentile does not capture. Safe protocols (Paxos, Raft) assume only that messages eventually arrive, not that they arrive within any specific time.
Self-Assessment and Remediation
Mastery Level (90-100% correct):
- Ready for Module 5 (distributed systems fundamentals) with confidence.
Proficient Level (75-89% correct):
- Review missed concepts, redo one practice lab for each missed area.
Developing Level (60-74% correct):
- Redo Practice 1 (anomalies lab) end to end. The gap is almost always in the anomaly-recognition reflex.
Insufficient Level (<60% correct):
- Restart with Cluster 2 (anomalies). Most likely issue: pattern-matching isolation levels without simulating the interleaving. Work every anomaly by hand on paper before touching code.