Module Quiz
Complete this quiz after finishing all concept and practice pages.
Current Module Questions
Question 1: Three Reasons to Replicate
Name three distinct goals replication can serve, and for each, name a workload where that goal dominates.
Answer:
- Availability: one replica answers when another is down. Workload: banking system with strict uptime SLA.
- Read throughput: many replicas serve read-heavy load in parallel. Workload: analytics dashboard.
- Geolocation: replicas near users reduce latency. Workload: globally distributed SaaS with users on four continents.
Question 2: Partition-Key Choice
You have 1 billion users. Why is range-partitioning by user_id likely a bad idea, and what are two better alternatives?
Answer: If user IDs are assigned monotonically, all new-user writes land on the tail partition -- a permanent write hotspot. Better alternatives (any two of these three): (a) hash-partition by user_id; (b) use a hash-prefix scheme ({hash_prefix}:{user_id}) if you need some locality; (c) assign IDs in a scheme that disperses insertion order (UUIDv4, snowflake with node-ID prefix).
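The hotspot can be seen in a small simulation. This is a minimal sketch, not a production router: the partition count, the range size, and the function names `range_partition` and `hash_partition` are assumptions made for illustration.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8   # hypothetical cluster size
RANGE_SIZE = 1_000   # hypothetical: each range partition owns 1,000 consecutive IDs

def range_partition(user_id: int) -> int:
    """Range scheme: partition p owns IDs [p * RANGE_SIZE, (p + 1) * RANGE_SIZE)."""
    return min(user_id // RANGE_SIZE, NUM_PARTITIONS - 1)

def hash_partition(user_id: int) -> int:
    """Hash scheme: a stable digest of the ID picks the partition."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# The 500 most recently assigned (monotonic) user IDs.
new_ids = range(7_500, 8_000)
range_load = Counter(range_partition(i) for i in new_ids)
hash_load = Counter(hash_partition(i) for i in new_ids)

print("range:", dict(range_load))  # every new write hits the tail partition
print("hash: ", dict(hash_load))   # new writes are spread across partitions
```

Under the range scheme all 500 new writes land on the last partition; under the hash scheme the same IDs are dispersed.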
Question 3: CAP Under Partition
A two-datacenter cluster using single-leader synchronous replication suffers a network partition between the DCs. The leader is in DC-A; a follower is in DC-B. Describe what happens to reads and writes in each DC under (a) CP configuration, (b) AP configuration.
Answer:
- CP: the leader in DC-A cannot obtain the synchronous acknowledgment from DC-B, so writes in DC-A block or fail; they would continue only if local durability alone were sufficient, which is no longer synchronous replication. Reads in DC-B go stale (or are refused) since the follower cannot contact the leader. Some transactions block.
- AP: DC-B promotes its follower to a second leader. Both DCs accept writes for their local clients. When the partition heals, two divergent histories must be merged, which can mean data loss or manual reconciliation.
Question 4: Replication Log Format
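Why can MySQL binlog_format=STATEMENT silently corrupt a replica when a statement contains a non-deterministic function such as UUID() or SYSDATE()?
Answer: Statement-based replication replays the same SQL on the replica. A non-deterministic function returns a different value on the replica, so the computed data diverges. (MySQL special-cases NOW() and RAND() by writing the timestamp and random seed into the binlog, but functions such as UUID() and SYSDATE() have no such protection.) Row-based replication sidesteps this because the leader's computed row values are transmitted, not the SQL text.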
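The divergence can be reproduced without a database. The sketch below is a toy model, not MySQL's implementation: `uuid.uuid4()` stands in for whatever non-deterministic function the statement calls, and `apply_statement` and `apply_row_image` are hypothetical names.

```python
import uuid

def apply_statement(table: dict) -> None:
    """Replay the 'SQL text': the non-deterministic function is re-evaluated locally."""
    table["token"] = str(uuid.uuid4())  # stand-in for a non-deterministic SQL function

def apply_row_image(table: dict, row: dict) -> None:
    """Apply the row values the leader already computed."""
    table.update(row)

leader: dict = {}
apply_statement(leader)

# Statement-based replica: runs the same statement and gets its own value.
stmt_replica: dict = {}
apply_statement(stmt_replica)

# Row-based replica: receives the leader's computed row.
row_replica: dict = {}
apply_row_image(row_replica, dict(leader))

print("statement-based matches leader:", stmt_replica == leader)
print("row-based matches leader:      ", row_replica == leader)
```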
Question 5: Sync vs Async Durability
Under async replication, the leader commits writes locally and returns success immediately. Under semi-sync with one required replica, the leader waits for one replica to ack before returning success. Quantify the data-loss window for each when the leader crashes.
Answer:
- Async: the window equals the current replication lag -- every write the leader had acknowledged but not yet shipped to a replica at crash time. Typically milliseconds to seconds; arbitrarily long under load.
- Semi-sync with one required replica: the window is zero as long as the sync replica has not also crashed. If the sync replica has also crashed, the system either refuses writes (correct) or silently degrades to async (trap), so the effective window depends on configuration.
Question 6: Read-Your-Writes Scenario
A user posts a comment and refreshes the page; the comment is intermittently missing. Async replicas lag 200 ms. Name the anomaly and prescribe the minimum mechanism that fixes it.
Answer: Read-your-writes violation. Minimum fix: after a write, capture the leader's LSN and send it with subsequent reads from the same user; the follower delays the read until replayed_lsn >= given_lsn. Alternative but more expensive: route all same-user reads to the leader for a short time window after a write.
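The LSN-token mechanism can be sketched as follows. This is an in-memory model under stated assumptions: the `Leader`, `Follower`, and `ReplicaBehind` names are hypothetical, and the follower raises instead of delaying the read, leaving the wait-or-reroute decision to the caller.

```python
class ReplicaBehind(Exception):
    """Raised when a follower has not yet replayed the requested LSN."""

class Leader:
    def __init__(self):
        self.lsn = 0
        self.log = []  # list of (lsn, key, value)

    def write(self, key, value):
        self.lsn += 1
        self.log.append((self.lsn, key, value))
        return self.lsn  # the client keeps this as its read token

class Follower:
    def __init__(self):
        self.replayed_lsn = 0
        self.data = {}

    def replay_from(self, leader, up_to_lsn):
        """Apply leader log entries this follower has not yet seen, up to up_to_lsn."""
        for lsn, key, value in leader.log:
            if self.replayed_lsn < lsn <= up_to_lsn:
                self.data[key] = value
                self.replayed_lsn = lsn

    def read(self, key, min_lsn):
        """Serve the read only once replayed_lsn >= min_lsn."""
        if self.replayed_lsn < min_lsn:
            raise ReplicaBehind(f"have {self.replayed_lsn}, need {min_lsn}")
        return self.data.get(key)

leader, follower = Leader(), Follower()
token = leader.write("comment:1", "hello")

try:
    follower.read("comment:1", min_lsn=token)
    served_early = True
except ReplicaBehind:
    served_early = False  # caller would wait, retry, or fall back to the leader

follower.replay_from(leader, up_to_lsn=token)
value_after_catchup = follower.read("comment:1", min_lsn=token)
```

Reads from users who have not written recently can pass `min_lsn=0` and are never delayed, which is what makes this cheaper than pinning readers to the leader.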
Question 7: Quorum Choice
A Cassandra cluster has N=5. For each (W, R), say whether reads see the latest write (assuming no failures) and whether the cluster tolerates one replica failure for both reads and writes:
- (W=3, R=3)
- (W=5, R=1)
- (W=1, R=1)
- (W=1, R=5)
Answer:
- (3, 3): W + R = 6 > 5, so reads see the latest write. Tolerates 2 replica failures for writes and 2 for reads. Balanced.
- (5, 1): W + R = 6 > 5, consistent. Zero tolerance for write failures (all 5 replicas required). Reads tolerate up to 4 failures.
- (1, 1): W + R = 2 ≤ 5, NOT consistent. Fast but possibly stale. Reads and writes each tolerate up to 4 failures.
- (1, 5): W + R = 6 > 5, consistent. Writes tolerate up to 4 failures. Zero tolerance for read failures.
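The arithmetic behind these answers is small enough to check mechanically. A minimal sketch, assuming strict quorums with no sloppy quorum or hinted handoff; `quorum_properties` is a hypothetical helper name.

```python
def quorum_properties(n: int, w: int, r: int) -> dict:
    """Overlap and failure tolerance for a strict quorum configuration."""
    return {
        "overlap": w + r > n,                # read set must intersect the latest write set
        "write_failures_tolerated": n - w,   # replicas that may be down while writes succeed
        "read_failures_tolerated": n - r,    # replicas that may be down while reads succeed
    }

for w, r in [(3, 3), (5, 1), (1, 1), (1, 5)]:
    print((w, r), quorum_properties(5, w, r))
```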
Question 8: Secondary-Index Partitioning
Given a users collection sharded by user_id, a query SELECT * FROM users WHERE email = ? requires a scatter-gather in MongoDB. Explain why, and describe what changes if the index is global (term-partitioned) instead.
Answer: MongoDB builds secondary indexes locally per shard. The router does not know which shard holds the target email, so it fans out the query to every shard and merges results. Cost scales with shard count. A term-partitioned (global) index would locate the email in one index shard (via hash(email) or range), allowing the router to ask only that shard, with one follow-up to fetch the row on its owning data shard. The trade-off: writes become cross-shard, because an insert, a delete, or an update to an indexed field must also touch the index shard that owns the term.
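The difference in request count can be sketched with plain dictionaries. This is a toy model, not MongoDB's router: the shard count, the `shard_for` hash, and both lookup functions are assumptions made for illustration.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical

def shard_for(value) -> int:
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

users = {uid: f"user{uid}@example.com" for uid in range(100)}

# Rows are placed by hash(user_id); each data shard keeps a LOCAL email index.
data_shards = [dict() for _ in range(NUM_SHARDS)]
local_index = [dict() for _ in range(NUM_SHARDS)]
# The GLOBAL index is partitioned by hash(email) and maps email -> user_id.
global_index = [dict() for _ in range(NUM_SHARDS)]

for uid, email in users.items():
    s = shard_for(uid)
    data_shards[s][uid] = {"user_id": uid, "email": email}
    local_index[s][email] = uid
    global_index[shard_for(email)][email] = uid  # a second shard touched on every insert

def find_with_local_index(email):
    """Scatter-gather: the router must ask every shard."""
    requests, rows = 0, []
    for s in range(NUM_SHARDS):
        requests += 1
        uid = local_index[s].get(email)
        if uid is not None:
            rows.append(data_shards[s][uid])
    return rows, requests

def find_with_global_index(email):
    """One index-shard lookup, then one fetch from the owning data shard."""
    uid = global_index[shard_for(email)].get(email)
    if uid is None:
        return [], 1
    return [data_shards[shard_for(uid)][uid]], 2

rows_local, requests_local = find_with_local_index("user7@example.com")
rows_global, requests_global = find_with_global_index("user7@example.com")
```

The local-index lookup issues one request per shard, while the global-index lookup issues two regardless of shard count.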
Question 9: Hotspot Analysis
A hash-partitioned social-feed database has 99 mostly-idle partitions and one partition at 95% CPU. Investigation shows the hot partition stores posts for a single celebrity account with 50M followers. Why doesn't adding more partitions fix this, and what does?
Answer: Hash-partitioning spreads distinct keys across partitions. One key cannot be spread by the hash function -- it lives in exactly one partition. The fix is application-level:
- Artificially split the celebrity by creating celebrity:user_id:shard_n keys and fanning writes across them.
- Cache the hot key in-memory at the routing tier.
- Use a read-through CDN for the celebrity's public content.
None of those require touching the database's partition scheme; all require the application to know that this key is special.
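The key-splitting option can be sketched in a few lines. This is an illustrative model only: the split count, the key layout, and the `write_key` / `read_keys` helpers are assumptions, and a real system would also need a registry of which keys are split.

```python
import random
from collections import defaultdict

NUM_SPLITS = 4  # hypothetical number of sub-keys for a hot key

def write_key(base_key: str, rng: random.Random) -> str:
    """Pick one sub-key at random so writes fan out across partitions."""
    return f"{base_key}:shard_{rng.randrange(NUM_SPLITS)}"

def read_keys(base_key: str) -> list:
    """Readers pay for the split: they must fetch every sub-key and merge."""
    return [f"{base_key}:shard_{i}" for i in range(NUM_SPLITS)]

rng = random.Random(0)
store = defaultdict(list)  # stands in for the partitioned database
for post_id in range(1_000):
    store[write_key("celebrity:42", rng)].append(post_id)

all_posts = sorted(p for key in read_keys("celebrity:42") for p in store[key])
```

Each sub-key hashes independently, so the write load for one logical key is now spread over up to `NUM_SPLITS` partitions, at the cost of a multi-key read.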
Question 10: Rebalancing Scheme
Why is consistent hashing preferred over hash(key) mod N for a cluster where the node count changes over time?
Answer: With hash(key) mod N, changing N changes the modulus, so almost every key is remapped to a different partition: going from N to N+1 nodes moves roughly N/(N+1) of the dataset. Consistent hashing assigns both keys and nodes to positions on a ring; adding or removing one node moves only roughly 1/N of the keys (those between the new/removed node and its predecessor). The qualitative difference is moving nearly the whole dataset versus a 1/N slice of it per topology change.
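The two schemes can be compared by counting how many keys change owner when an eleventh node joins a ten-node cluster. This is a minimal sketch: the `Ring` class, the use of 100 virtual nodes per physical node, and the key and node names are assumptions, not any particular database's implementation.

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

def mod_owner(key: str, n: int) -> int:
    return h(key) % n

class Ring:
    """Consistent-hash ring; each node owns the arcs ending at its points."""
    def __init__(self, nodes, vnodes=100):
        self.points = sorted((h(f"{node}#{v}"), node) for node in nodes for v in range(vnodes))
        self.hashes = [p for p, _ in self.points]

    def owner(self, key: str) -> str:
        # First point clockwise from the key's position, wrapping around the ring.
        i = bisect.bisect_right(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

keys = [f"user:{i}" for i in range(10_000)]
nodes = [f"node-{i}" for i in range(10)]

moved_mod = sum(mod_owner(k, 10) != mod_owner(k, 11) for k in keys) / len(keys)

before, after = Ring(nodes), Ring(nodes + ["node-10"])
moved_keys = [k for k in keys if before.owner(k) != after.owner(k)]
moved_ring = len(moved_keys) / len(keys)

print(f"mod N:           {moved_mod:.1%} of keys moved")
print(f"consistent hash: {moved_ring:.1%} of keys moved")
```

With the ring, every key that moves goes to the new node and nothing is shuffled between existing nodes.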
Question 11: Failover Scenario (failure -> reads/writes)
A Postgres cluster uses Patroni with an etcd cluster holding the "who is the leader" key as a TTL'd lock. The primary loses its network to etcd but remains reachable from some application clients. Describe what happens to reads and writes on the primary and on the promoted replica.
Answer:
- Primary's etcd lease expires; Patroni on the primary self-demotes to read-only (a crucial safety feature).
- Replicas' Patroni instances see the lease is free; they elect the replica with the highest LSN. That replica promotes to primary; Patroni writes the new lock in etcd.
- Application clients that were connected to the old primary now see read-only errors on attempted writes. Clients connected via the Patroni-aware proxy are redirected to the new primary.
- Writes resume on the new primary; reads on the old primary continue to work (stale).
- Any un-replicated writes on the old primary at demotion time are lost.
- Because Patroni used a distributed lock, split-brain is prevented: the old primary cannot serve writes without the etcd lock.
Question 12: Failover Scenario (split-brain risk)
Describe a replication/failover setup where split-brain can occur, and name the mechanism that prevents it.
Answer: Consider a two-node replica set (one primary, one async secondary) with manual failover and no fencing. If the primary's network is disrupted and the operator promotes the secondary, the old primary (still accepting writes from some clients) is now a second leader. The result is two divergent histories for the same keys.
Preventions:
- Quorum-based election with at least three nodes: the minority side cannot elect a new leader.
- Fencing tokens: storage accepts writes only with the current generation's token; old primary's writes are rejected even if it tries.
- STONITH: the failover controller physically powers off the old primary before promoting.
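The fencing-token check can be sketched on the storage side. A minimal model under stated assumptions: `FencedStorage` is a hypothetical class, and tokens are assumed to be integers that the lock service increments on every leadership change.

```python
class FencedStorage:
    """Storage that rejects writes carrying a token older than the newest one seen."""
    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale leader: its generation has been superseded
        self.highest_token = token
        self.data[key] = value
        return True

storage = FencedStorage()
old_primary_token, new_primary_token = 1, 2

accepted_before_failover = storage.write(old_primary_token, "balance", "100")
accepted_new_primary = storage.write(new_primary_token, "balance", "90")
# The old primary, unaware it was replaced, tries to write again.
accepted_stale = storage.write(old_primary_token, "balance", "250")
```

The old primary's late write is rejected once the storage layer has seen the new generation's token, so the check does not depend on the old primary knowing it was demoted.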
Question 13: Replication Log and Downstream Consumer
An analytics pipeline wants to consume every change made to a Postgres database and ship it to a Kafka topic. Which replication log format supports this, and what mechanism does Postgres provide?
Answer: Logical replication. Postgres exposes logical decoding via output plugins (pgoutput, wal2json). A tool like Debezium subscribes to a replication slot, reads decoded change events (INSERT / UPDATE / DELETE with row data), and publishes them to Kafka. Physical WAL is unsuitable because downstream consumers cannot interpret page-byte changes without the storage engine.
Question 14: Multi-Leader Conflict
Two leaders each accept a write to the same row within 10 ms of each other. One leader's clock is ahead of the other by 100 ms due to NTP skew. Under last-writer-wins, what happens, and why is this dangerous?
Answer: The write from the leader with the later wall-clock timestamp wins -- regardless of real-world order. If NTP skew causes the second-to-actually-write leader to stamp its write with an earlier timestamp, that write is silently discarded. Neither user sees an error; the data is simply wrong. This is why LWW with wall clocks is treated as unsafe. Hybrid Logical Clocks or vector clocks are the production answers.
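The scenario can be replayed with concrete numbers. This sketch assumes millisecond timestamps, a 100 ms fast clock on leader A, and a hypothetical `lww_merge` function that keeps the write with the larger timestamp.

```python
def lww_merge(a: dict, b: dict) -> dict:
    """Last-writer-wins: keep whichever write carries the larger timestamp."""
    return a if a["ts"] >= b["ts"] else b

CLOCK_SKEW_MS = 100  # leader A's wall clock runs 100 ms ahead

# Real time: A writes at t=1000 ms, B writes 10 ms later at t=1010 ms.
write_a = {"value": "from A (earlier in real time)", "ts": 1000 + CLOCK_SKEW_MS}
write_b = {"value": "from B (later in real time)", "ts": 1010}

winner = lww_merge(write_a, write_b)
print(winner["value"])
```

The write from A wins even though B wrote later in real time; B's write is dropped without any error reaching either client.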
Question 15: PostgreSQL / MongoDB / Cassandra Mapping
For each replication concept, say how PostgreSQL, MongoDB, and Cassandra handle it:
- Replication topology.
- Failover mechanism.
- CAP choice under a network partition.
Answer:
| Concept | PostgreSQL | MongoDB | Cassandra |
|---|---|---|---|
| Topology | Single-leader | Single-leader per replica set | Leaderless |
| Failover | Manual or Patroni/repmgr | Raft-like election (~12s default) | None -- any replica can coordinate |
| CAP under partition | CP (minority stops writes after failover) | CP (minority refuses writes) | AP (tunable; CL=QUORUM still accepts writes if 2-of-3 reachable) |
Interleaved Review Questions
Prior Module Question 1 (S6 Module 2: Storage Engines)
Why does a B-tree storage engine pair well with a WAL for durability, and what role does the WAL play in replication?
Answer: B-tree pages are updated in place, which is dangerous under crash (partial page writes corrupt data). A write-ahead log records intent before updates; on crash, the WAL is replayed to restore consistency. Because the WAL is a totally-ordered record of intended changes, it is also the natural source of a physical replication stream: replicas replay the same WAL to stay in sync byte-for-byte with the leader.
Prior Module Question 2 (S6 Module 1: SQL)
If a sharded application queries SELECT COUNT(*) FROM orders WHERE status='shipped' without a partition-key filter, what happens at execution time?
Answer: The query fans out to every shard (scatter). Each shard runs a local COUNT and returns it. The coordinator sums across responses (gather). Latency is dominated by the slowest shard (tail latency). The database "works," but every such query consumes capacity on every shard, so adding shards adds fan-out and tail-latency exposure instead of throughput, which is the opposite of what people expect from sharding.
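The scatter and gather steps can be sketched as follows. The shard contents and latencies are made-up illustrative numbers, and `scatter_gather_count` is a hypothetical coordinator function that assumes the shards are queried in parallel.

```python
def scatter_gather_count(shards: list, status: str) -> tuple:
    """Ask every shard for a local count, then sum; latency is the slowest shard's."""
    partials = [
        (sum(1 for s in shard["orders"] if s == status), shard["latency_ms"])
        for shard in shards  # scatter: no partition-key filter, so no shard can be skipped
    ]
    total = sum(count for count, _ in partials)        # gather
    latency_ms = max(latency for _, latency in partials)
    return total, latency_ms

# Hypothetical shards; the third one is having a slow moment.
shards = [
    {"latency_ms": 12, "orders": ["shipped", "pending", "shipped"]},
    {"latency_ms": 15, "orders": ["pending"]},
    {"latency_ms": 240, "orders": ["shipped"]},
]

total, latency_ms = scatter_gather_count(shards, "shipped")
```

Two of the three shards answer quickly, yet the query takes as long as the slowest one.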
Prior Module Question 3 (S5 Module 4: Networks)
A database cluster assumes network round-trip time is bounded. Why is that assumption unsafe in a real data center, and what does "unbounded delay" mean operationally?
Answer: Switches queue under load, GC pauses freeze endpoints, NIC microbursts drop packets, and TCP retransmission timeouts can add seconds. Tail latencies can be orders of magnitude larger than medians. Operationally, "unbounded delay" means no timeout can distinguish a dead node from a slow one. Protocols that assume bounded delay can misbehave under these tail conditions (a common pattern in split-brain incidents). Consensus protocols such as Paxos and Raft stay safe under arbitrary delay; they need eventual message delivery only to make progress.
Prior Module Question 4 (S5 Module 3: OS -- Concurrency)
In a leaderless replication system, two clients concurrently write different values to the same key. How does the database decide which value is the "latest," and what is the failure mode?
Answer: Typically via a timestamp (wall clock or Lamport) or vector clocks. With wall clocks, NTP skew causes silent data loss (the "earlier-timestamped" write is discarded). With vector clocks, the system detects the concurrency and either stores both versions (exposing conflict to the application) or merges via a CRDT. The failure mode of naive LWW: correct-looking writes are silently thrown away.
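Concurrency detection with vector clocks comes down to a component-wise comparison. A minimal sketch, assuming clocks are dictionaries from node name to counter with missing entries treated as zero; `compare` is a hypothetical helper.

```python
def compare(a: dict, b: dict) -> str:
    """Return how vector clock a relates to b: before, after, equal, or concurrent."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: keep both versions or merge

# Two clients write the same key through different replicas without seeing each other.
print(compare({"n1": 1}, {"n2": 1}))
# One write causally follows the other.
print(compare({"n1": 1}, {"n1": 2, "n2": 1}))
```

Only the "concurrent" result signals a real conflict; wall-clock LWW cannot make this distinction and silently picks a winner.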
Prior Module Question 5 (S5 Module 5 / S6 Module 2: Distributed Time)
Why is relying on synchronized clocks dangerous for distributed leases (e.g., "the leader holds the lock for 30 s")?
Answer: If one node's clock is fast, it may believe the lease expired early and elect a new leader while the old leader (with a slightly slow clock) still believes it holds the lock. Both act as leaders for a window -- split-brain. Real leasing systems (Google Chubby, etcd, ZooKeeper) tie lease expiry to heartbeats on the consensus layer rather than to local wall-clock time, specifically to avoid this.
Self-Assessment and Remediation
Mastery Level (90-100% correct):
- Ready to advance to Module 4 (Transactions and Consistency) with confidence.
Proficient Level (75-89% correct):
- Review missed concepts. Redo Practice 1 (topologies) or Practice 3 (anomalies clinic) depending on the gap.
Developing Level (60-74% correct):
- Redo Practice 2 (partitioning workshop) end to end. Most gaps cluster around "pick a key and defend it."
- Redo at least two Jepsen-style analyses (Practice 4 Kata 4).
Insufficient Level (<60% correct):
- Restart with Clusters 1-2. The most common failure mode is pattern-matching topology names without internalizing the write and read paths. Draw every topology from memory before re-reading.
- Spend a session on the CAP/PACELC scenario questions; these are the hinge of the module.