Replication Topologies Lab
Draw, walk through, and defend each of the three replication topologies against concrete failure scenarios. The output of this lab is a small notebook of hand-drawn diagrams with failure annotations, not running code.
Retrieval Prompts
- State from memory the write and read path for single-leader, multi-leader, and leaderless replication.
- Name the failure mode each topology most directly addresses and the failure mode each makes worse.
- Write the quorum inequality for leaderless replication and explain each symbol.
- State what "conflict" means in a multi-leader system and why single-leader cannot have one by construction.
- State how a client finds the current leader in each topology.
Compare and Distinguish
Produce a small matrix in your notebook with these rows and columns:
| Aspect | Single-leader | Multi-leader | Leaderless |
|---|---|---|---|
| Accepts writes on every node? | No | Yes | Yes |
| Conflicts possible? | No | Yes | Yes |
| Needs failover? | Yes | Per leader (other leaders keep accepting writes) | No |
| Default consistency | Strong on leader | Eventual | Tunable |
| Canonical system | PostgreSQL | CouchDB | Cassandra |
Fill in at least three more rows (latency profile, conflict resolution mechanism, geographic distribution fit).
Drill 1: Single-Leader on Paper
Draw a single-leader cluster: one leader, three followers F1, F2, F3. Show the replication log flowing from leader to each follower.
Walk through these failures by annotating the diagram:
- F1 disconnects for 30 s, then returns.
- Leader crashes; F2 is chosen as new leader via highest-LSN.
- Leader is briefly overloaded (100% CPU) but not dead.
- F3's disk fills and replication stops on that follower.
For each, write one sentence naming (a) what data is at risk, (b) who must intervene, (c) what changes for clients.
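To check the "leader crashes" annotation, a minimal sketch of highest-LSN selection may help. It assumes asynchronous replication, and the `Follower` type, the helper names, and every LSN value are hypothetical, chosen only for illustration. It does not model any particular database's failover mechanism.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    name: str
    lsn: int                 # last log sequence number this follower has applied
    reachable: bool = True   # False if the follower cannot take part in the election


def pick_new_leader(followers):
    """Return the reachable follower with the highest LSN, or None if none is reachable."""
    candidates = [f for f in followers if f.reachable]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.lsn)


def writes_at_risk(old_leader_lsn, new_leader):
    """Count log entries the old leader held that the new leader never received."""
    return max(0, old_leader_lsn - new_leader.lsn)


# Hypothetical state at the moment the leader crashes with LSN 1004.
followers = [Follower("F1", 980), Follower("F2", 1000), Follower("F3", 950)]
new_leader = pick_new_leader(followers)
print(new_leader.name, writes_at_risk(1004, new_leader))  # F2 4
```

The gap between the old leader's LSN and the new leader's LSN is the "data at risk" for that scenario: those entries were acknowledged by the old leader but exist on no surviving node.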
Drill 2: Multi-Leader Conflict Resolution
Draw an all-to-all multi-leader cluster with three leaders in us-east, eu-west, ap-south.
Scenario: user Alice edits her profile title at eu-west to "CEO"; user Bob edits the same profile title at us-east to "CTO" within 50 ms. Propagation takes 80 ms.
Answer:
- At what instants does each leader observe the conflict?
- What does LWW (last-writer-wins on wall clock) produce? What can go wrong?
- What does a vector-clock-based merge produce? Does the application have to participate?
- How would you route writes to avoid the conflict entirely ("home leader per user")?
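To compare your LWW and vector-clock answers, the two checks can be sketched as follows. The record layout, the function names, and the timestamps are assumptions made for illustration; in particular, the sketch assumes Bob's wall clock reads 50 ms later than Alice's, which the scenario does not specify.

```python
def lww(a, b):
    """Last-writer-wins: keep the write with the larger wall-clock timestamp."""
    return a if a["ts"] >= b["ts"] else b


def compare(vc_a, vc_b):
    """Compare two vector clocks (dicts mapping node name to counter)."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither write saw the other: a true conflict


# Hypothetical writes; ts is wall-clock milliseconds at the accepting leader.
alice = {"value": "CEO", "ts": 1000, "vc": {"eu-west": 1}}
bob = {"value": "CTO", "ts": 1050, "vc": {"us-east": 1}}

print(lww(alice, bob)["value"])          # CTO (Alice's write is silently dropped)
print(compare(alice["vc"], bob["vc"]))   # concurrent
```

Note what each check does not do: LWW picks a winner without recording that a conflict occurred, while the vector-clock comparison only detects concurrency and leaves the choice of merged value to something else, typically the application.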
Drill 3: Leaderless Quorum Walkthrough
A Cassandra-like cluster with N=5 replicas, W=3, R=3.
- A write arrives. Show the message fan-out. How many acknowledgments are needed?
- Replica-4 is temporarily down during the write. Where do writes destined for it go?
- After 5 minutes of downtime, Replica-4 returns. What happens to the hinted writes?
- Simulate W=3, R=3: a read queries 3 replicas; 2 return the new value, 1 returns the old. Which value does the client see, and what happens to the stale replica?
- Now change to W=5, R=1. What does the availability picture look like when one replica is down?
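A short sketch can be used to check the paper simulation. It assumes strict quorums (no sloppy quorum or hinted handoff counting toward W) and integer version numbers on each value; the function names and replica labels are hypothetical and do not correspond to any real Cassandra API.

```python
def quorums_overlap(n, w, r):
    """True when every read set must intersect every write set (w + r > n)."""
    return w + r > n


def quorum_read(responses):
    """Pick the newest response and list replicas that need read repair.

    responses maps replica name -> (version, value).
    """
    newest = max(responses.values(), key=lambda vv: vv[0])
    stale = sorted(name for name, vv in responses.items() if vv[0] < newest[0])
    return newest, stale


def operation_available(n, needed, down):
    """True when enough replicas are up to collect `needed` acknowledgments."""
    return n - down >= needed


# The read from the drill: two replicas return the new value, one the old.
responses = {
    "replica-1": (2, "new"),
    "replica-2": (2, "new"),
    "replica-3": (1, "old"),
}
newest, stale = quorum_read(responses)
print(newest, stale)                       # (2, 'new') ['replica-3']

print(quorums_overlap(5, 3, 3))            # True
print(operation_available(5, 3, down=1))   # True:  W=3 writes still succeed
print(operation_available(5, 5, down=1))   # False: W=5 writes are blocked
print(operation_available(5, 1, down=1))   # True:  R=1 reads still succeed
```

Under these assumptions, W=5, R=1 gives cheap reads but makes writes unavailable as soon as a single replica is down, which is the trade-off the last bullet asks you to describe.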
Common Mistake Check
For each statement, explain why it is wrong:
- "Multi-leader is just single-leader that has been doubled, so it's easier to reason about."
- "W + R > N means strong consistency."
- "Failover is a property of the topology."
- "Leaderless replication cannot lose writes because every node has a copy."
- "Ring topology is safe because messages always go the same direction."
Scenario: Pick the Topology
For each product, pick a topology and defend the pick in three sentences:
- An online multiplayer game storing session state across three global datacenters.
- A banking ledger with strict non-negative-balance invariants.
- A global IoT telemetry store accepting millions of writes/sec with relaxed read freshness.
- A collaboration tool like Figma where everyone shares one document that must sync within 100 ms.
- A multi-region e-commerce checkout where inventory decrements must be strongly consistent.
Evidence Check
This lab is complete only if:
- You have hand-drawn diagrams for all three topologies, with at least two failure annotations each.
- You have written out a multi-leader conflict resolution scenario with at least two resolution strategies.
- You have simulated a quorum read with a partially failed replica set.
- You can defend a topology choice for at least three of the five scenarios in the last section.