Module 3: Replication & Partitioning: Case Studies
These case studies make distributed data design concrete: replica lag, read routing, synchronous commit, shard keys, hot partitions, global indexes, quorum behavior, and repair. The point is not to memorize product features. The point is to explain what guarantee the system gives, what it costs, and what breaks when the guarantee is assumed but not designed.
How To Use These Case Studies
- Draw the data path before reading the better approach.
- Name the guarantee: availability, durability, freshness, monotonicity, locality, or throughput.
- Identify the partition key or replica set involved.
- Produce the required artifact.
- Connect the decision to a capstone read/write path.
Case Study 1: Read Replica That Violates Read-Your-Writes
Scenario: A user updates their billing address, gets a success response, then refreshes the account page. The API routes reads to a PostgreSQL hot standby. The old address appears for a few seconds, so support receives "my change disappeared" tickets.
Source anchor: PostgreSQL hot standby documentation says standby data can lag the primary, and nearly simultaneous queries on primary and standby can return different results. PostgreSQL streaming replication is asynchronous by default, so commits can become visible on the standby after a delay. See PostgreSQL Hot Standby and PostgreSQL Log-Shipping Standby Servers.
Module concepts:
- single-leader replication
- asynchronous replication
- replica lag
- read-your-writes
- request routing
- session guarantees
Wrong Approach
"All reads should go to replicas because replicas scale reads."
That treats every read as equally stale-tolerant. A dashboard summary might tolerate seconds of lag. A read immediately after the user's own write usually cannot.
Better Approach
Classify reads by freshness requirement:
Freshness-sensitive:
read after write
checkout confirmation
permission changes
account/security settings
Stale-tolerant:
analytics dashboards
public catalog pages
recommendation widgets
Then route freshness-sensitive reads to the primary for a short window, or use a session token that records the write position and only serves from a replica that has replayed at least that position.
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| all reads to replicas | lower primary read load | stale reads break user expectations |
| primary read after write | read-your-writes for critical flows | higher primary load |
| sticky session window | targeted freshness | more routing state |
| replica replay-position token | precise correctness gate | requires exposing and checking replication progress |
Failure Mode
The system reports successful writes but the UI contradicts the response. Users retry, creating duplicate updates and support noise.
Required Artifact
Draw a timeline:
T0 user writes billing address
T1 primary commits
T2 API returns success
T3 replica has not replayed commit
T4 user refreshes
T5 corrected routing decision
Add the routing rule you would implement.
Project / Capstone Connection
Any capstone with read replicas must label stale-tolerant and freshness-sensitive reads. "Use replicas" is not a complete architecture decision.
Case Study 2: Synchronous Replication Chosen For The Wrong Reason
Scenario: A payments service wants lower chance of data loss during primary failure. The team enables synchronous replication and expects it to also make replica reads immediately fresh everywhere.
Source anchor: PostgreSQL documents that streaming replication is asynchronous by default and synchronous replication can require commit to wait for standby confirmation. The same documentation also distinguishes commit durability from standby query behavior and hot-standby conflicts. See PostgreSQL synchronous replication and PostgreSQL Hot Standby conflicts.
Module concepts:
- synchronous replication
- commit acknowledgement
- durability window
- availability tradeoff
- read freshness
- failover semantics
Wrong Approach
"Synchronous replication means every replica is always current and all reads can go anywhere."
Synchronous commit controls what the primary waits for before acknowledging a commit. It does not automatically mean every standby has applied the transaction and can serve every read with the latest value.
Better Approach
Separate three questions:
Durability:
Can the acknowledged transaction survive primary failure?
Visibility:
Has a chosen standby replayed the commit yet?
Availability:
What happens to writes when the synchronous standby is slow or unavailable?
For payments, synchronous replication may be justified for the write path, but freshness-sensitive reads still need a routing rule. The team also needs a policy for standby failure: block writes, degrade to async, or fail over.
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| asynchronous replication | low write latency and high write availability | possible data loss on primary failure |
| synchronous replication | stronger durability after acknowledgement | write latency depends on standby/network |
| wait for remote apply | better read-after-write on standby | more commit latency |
| degrade to async during incident | preserves availability | weakens durability expectation |
Failure Mode
The team improves durability but still serves stale reads from a lagging standby. Later, during a network issue, writes stall because no one defined the synchronous-standby failure policy.
Required Artifact
Write a replication-mode ADR:
Data class:
Acknowledgement rule:
Standby failure behavior:
Fresh-read routing:
Maximum accepted data-loss window:
Maximum accepted write-latency budget:
Metrics:
Rollback plan:
Project / Capstone Connection
If a capstone handles payments, orders, credentials, or permissions, its replication design must state what "committed" means after failover.
Case Study 3: Monotonic Timestamp Shard Key Creates One Hot Chunk
Scenario: A MongoDB event collection is sharded by created_at because most queries are time-range queries. During peak ingestion, one shard handles almost all writes while other shards are quiet.
Source anchor: MongoDB ranged sharding groups nearby shard-key values into ranges, which can target range queries efficiently. MongoDB also warns that poor shard-key selection can cause uneven distribution and bottlenecks, and that ranged sharding works best with high cardinality, low frequency, and non-monotonically changing keys. Hashed sharding distributes monotonically changing keys more evenly but makes range queries less targeted. See MongoDB Sharding, MongoDB Ranged Sharding, and MongoDB Data Partitioning with Chunks.
Module concepts:
- range partitioning
- hash partitioning
- shard-key cardinality
- hot shards
- targeted vs broadcast queries
- rebalancing
Wrong Approach
"Use the field most queries filter by as the shard key."
That is only half the problem. The shard key must support query routing and distribute write load. A monotonically increasing key can place all new writes into the upper range.
Better Approach
Compare access patterns before choosing:
Write path:
append-only events, high sustained ingest
Read path:
tenant + time range
recent events
occasional global analytics
Candidate keys:
created_at
tenant_id
tenant_id + created_at
hashed event_id
hashed tenant bucket + created_at
A better design often uses a compound key or bucketing strategy that spreads writes while keeping common tenant/time queries targeted enough.
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
range on created_at | efficient global time ranges | newest range becomes hot |
| hashed timestamp/id | better write spread | range queries become scatter/broadcast |
tenant_id, created_at | tenant time queries route well | large tenants can still become hot |
| salted tenant bucket | spreads dominant tenants | reads must query multiple buckets |
Failure Mode
The cluster has many shards but one chunk receives the current write range. Scaling hardware does not help because the partition key funnels traffic.
Required Artifact
Create a shard-key memo:
Top 5 query patterns:
Top 3 write patterns:
Candidate shard key:
Expected write distribution:
Expected read routing:
Hotspot risk:
Rebalancing/resharding plan:
Rejected alternatives:
Project / Capstone Connection
Any event, log, or notification capstone needs a partition-key note that explicitly handles "latest item" hotspots.
Case Study 4: DynamoDB Hot Partition Despite Enough Table Capacity
Scenario: A leaderboard writes score updates under partition key game_id. One popular game drives nearly all traffic. The table has enough total provisioned capacity, but requests for that one key are throttled.
Source anchor: DynamoDB partition-key guidance says applications should distribute activity uniformly across partition keys. DynamoDB partition-level limits can throttle a hot partition even when the table has unused capacity elsewhere. AWS recommends write sharding by adding a random or calculated suffix when a key receives concentrated writes. See DynamoDB partition key design, DynamoDB hot partition mitigation, and DynamoDB write sharding.
Module concepts:
- partition-key design
- hot keys
- adaptive capacity limits
- write sharding
- fan-out reads
- aggregate materialization
Wrong Approach
"Increase table capacity; the table is throttling."
The table may have enough total capacity while one partition key exceeds its partition-level limit. More global capacity does not automatically fix a concentrated key.
Better Approach
Spread the hot write path:
Base key:
game_id = world-cup-final
Sharded write key:
game_id#bucket = world-cup-final#00..world-cup-final#49
Read path:
query N buckets
merge top scores
optionally maintain a materialized top-N view
Use random suffixes for simple write spread, or calculated suffixes when reads need predictable bucket selection.
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
single game_id key | simple query | hot partition under popular games |
| random write shard | spreads writes | reads fan out across buckets |
| calculated bucket | predictable read subset | bucket function must match access pattern |
| materialized top-N view | fast leaderboard reads | async update and correctness policy |
Failure Mode
The system scales for many normal games but fails exactly during the launch, match, sale, or campaign that matters most.
Required Artifact
Build a partition-load model:
Peak writes/sec for hottest key:
Item size:
Buckets needed:
Read fan-out:
Merge strategy:
Staleness accepted for aggregate view:
Alarm threshold:
Project / Capstone Connection
If the capstone uses DynamoDB-style partition keys, include one "celebrity tenant" or "popular item" load test.
Case Study 5: Global Secondary Index That Is Not A Strong Replica
Scenario: An orders table uses order_id as the primary key. Customer support needs to search by customer_email, so the team adds a DynamoDB global secondary index. After order creation, support sometimes cannot find the order immediately by email.
Source anchor: DynamoDB documentation says global secondary indexes are maintained asynchronously and are eventually consistent. It also says strongly consistent reads are supported on tables and local secondary indexes, but not on global secondary indexes. See DynamoDB read consistency, DynamoDB global secondary indexes, and DynamoDB secondary indexes.
Module concepts:
- global secondary index
- asynchronous index propagation
- eventual consistency
- alternate access path
- user-facing freshness
Wrong Approach
"A global secondary index is just another strongly consistent lookup path."
The GSI is a replicated, separately partitioned access path. It can lag behind the base table.
Better Approach
Design the support workflow around index propagation:
Order creation response:
returns order_id
Immediate lookup:
use order_id against base table for strong read if needed
Email search:
query GSI, tolerate short delay
Fallback:
if new order not visible by email, show "still indexing" state or use known order_id
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| GSI by email | efficient alternate query | eventual consistency only |
| duplicate lookup table | explicit workflow control | dual-write or stream processing complexity |
| base-table lookup by order ID | strong read available | caller must know order ID |
| scan table by email | no new index | slow and expensive |
Failure Mode
Support treats missing GSI results as missing orders and creates duplicates or escalates false incidents.
Required Artifact
Write an index consistency contract:
Index name:
Source table:
Access pattern:
Expected propagation delay:
Freshness-sensitive fallback:
User-facing copy/state:
Duplicate prevention rule:
Project / Capstone Connection
Any capstone with alternate lookup paths must distinguish primary-key reads from eventually consistent index reads.
Case Study 6: Quorum Reads That Still Need Repair Discipline
Scenario: A Cassandra cluster uses replication factor 3 and LOCAL_QUORUM for important reads and writes. One replica is down for several hours. The team assumes quorum settings alone permanently repair every missed write.
Source anchor: Cassandra's architecture docs describe a partitioned, replicated, eventually consistent database with tunable consistency. Cassandra read repair can repair replicas during reads, but repair operations are still required because hints are best effort and not guaranteed to cover all missed writes. Cassandra repair compares token ranges and streams differences. See Cassandra architecture overview, Cassandra read repair, Cassandra hints, and Cassandra repair.
Module concepts:
- replication factor
- tunable consistency
- quorum reads/writes
- hinted handoff
- read repair
- anti-entropy repair
Wrong Approach
"If we use quorum, repair is optional."
Quorum can provide useful read/write behavior for acknowledged operations, but it is not a full maintenance strategy. Missed writes, node outages, tombstones, and unrepaired ranges still need operational repair discipline.
Better Approach
Treat consistency as a runtime and operations contract:
Request-time:
choose read/write consistency level
understand what replicas acknowledge
Failure recovery:
hints help temporarily unavailable replicas
read repair fixes inconsistencies touched by reads
scheduled repair covers token ranges systematically
The system needs repair schedules, outage windows, and metrics for inconsistent ranges, not only a consistency level in application code.
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
ONE reads/writes | low latency and high availability | stale reads more likely |
LOCAL_QUORUM | stronger local-datacenter behavior | more replicas must be reachable |
| blocking read repair | monotonic quorum reads | read latency can include repair work |
| scheduled repair | systematic convergence | disk/network I/O and operational planning |
Failure Mode
A node returns after an outage, missed data is only partially repaired, and old values or deleted data reappear later because repair did not run within the required window.
Required Artifact
Create a consistency-and-repair runbook:
Keyspace/table:
Replication factor:
Read consistency:
Write consistency:
Node outage assumption:
Hint window:
Read repair behavior:
Repair schedule:
Metrics:
Incident trigger:
Project / Capstone Connection
If a capstone uses a quorum-style store, the architecture review must include an operations story. Consistency is not only a client setting.
Source Map
| Source | Use it for |
|---|---|
| PostgreSQL Hot Standby | standby reads, replication delay, eventually consistent standby data, query conflicts |
| PostgreSQL Log-Shipping Standby Servers | streaming replication, asynchronous default, synchronous replication |
| MongoDB Sharding | hashed vs ranged sharding and targeted vs broadcast operations |
| MongoDB Ranged Sharding | shard-key traits and range-partition behavior |
| MongoDB Data Partitioning with Chunks | chunks, balancer behavior, jumbo chunk risk |
| DynamoDB partition key design | uniform partition-key activity and partition throughput limits |
| DynamoDB hot partition mitigation | partition-level throttling and hot partitions |
| DynamoDB write sharding | random/calculated suffix write-sharding patterns |
| DynamoDB read consistency | strong vs eventual reads and GSI consistency limits |
| DynamoDB global secondary indexes | asynchronous GSI propagation and capacity tradeoffs |
| Cassandra architecture overview | partitioned replicated storage model and eventual consistency |
| Cassandra read repair | monotonic quorum reads and blocking repair behavior |
| Cassandra hints | hinted handoff role and configuration |
| Cassandra repair | anti-entropy repair, token ranges, repair cadence |
Completion Standard
- At least three case-study artifacts are completed.
- At least one artifact includes a replica-lag or stale-read timeline.
- At least one artifact includes a shard-key or partition-key decision memo.
- At least one artifact includes an operations runbook for repair, failover, or lag.
- At least one case connects replication behavior to Module 4 transactions/consistency.
- At least one case connects partitioning behavior to Module 5 distributed-systems fundamentals.