Skip to main content

Module 3: Replication & Partitioning: Case Studies

These case studies make distributed data design concrete: replica lag, read routing, synchronous commit, shard keys, hot partitions, global indexes, quorum behavior, and repair. The point is not to memorize product features. The point is to explain what guarantee the system gives, what it costs, and what breaks when the guarantee is assumed but not designed.


How To Use These Case Studies

  1. Draw the data path before reading the better approach.
  2. Name the guarantee: availability, durability, freshness, monotonicity, locality, or throughput.
  3. Identify the partition key or replica set involved.
  4. Produce the required artifact.
  5. Connect the decision to a capstone read/write path.

Case Study 1: Read Replica That Violates Read-Your-Writes

Scenario: A user updates their billing address, gets a success response, then refreshes the account page. The API routes reads to a PostgreSQL hot standby. The old address appears for a few seconds, so support receives "my change disappeared" tickets.

Source anchor: PostgreSQL hot standby documentation says standby data can lag the primary, and nearly simultaneous queries on primary and standby can return different results. PostgreSQL streaming replication is asynchronous by default, so commits can become visible on the standby after a delay. See PostgreSQL Hot Standby and PostgreSQL Log-Shipping Standby Servers.

Module concepts:

  • single-leader replication
  • asynchronous replication
  • replica lag
  • read-your-writes
  • request routing
  • session guarantees

Wrong Approach

"All reads should go to replicas because replicas scale reads."

That treats every read as equally stale-tolerant. A dashboard summary might tolerate seconds of lag. A read immediately after the user's own write usually cannot.

Better Approach

Classify reads by freshness requirement:

Freshness-sensitive:
read after write
checkout confirmation
permission changes
account/security settings

Stale-tolerant:
analytics dashboards
public catalog pages
recommendation widgets

Then route freshness-sensitive reads to the primary for a short window, or use a session token that records the write position and only serves from a replica that has replayed at least that position.

Tradeoff Table

ChoiceGainCost
all reads to replicaslower primary read loadstale reads break user expectations
primary read after writeread-your-writes for critical flowshigher primary load
sticky session windowtargeted freshnessmore routing state
replica replay-position tokenprecise correctness gaterequires exposing and checking replication progress

Failure Mode

The system reports successful writes but the UI contradicts the response. Users retry, creating duplicate updates and support noise.

Required Artifact

Draw a timeline:

T0 user writes billing address
T1 primary commits
T2 API returns success
T3 replica has not replayed commit
T4 user refreshes
T5 corrected routing decision

Add the routing rule you would implement.

Project / Capstone Connection

Any capstone with read replicas must label stale-tolerant and freshness-sensitive reads. "Use replicas" is not a complete architecture decision.


Case Study 2: Synchronous Replication Chosen For The Wrong Reason

Scenario: A payments service wants lower chance of data loss during primary failure. The team enables synchronous replication and expects it to also make replica reads immediately fresh everywhere.

Source anchor: PostgreSQL documents that streaming replication is asynchronous by default and synchronous replication can require commit to wait for standby confirmation. The same documentation also distinguishes commit durability from standby query behavior and hot-standby conflicts. See PostgreSQL synchronous replication and PostgreSQL Hot Standby conflicts.

Module concepts:

  • synchronous replication
  • commit acknowledgement
  • durability window
  • availability tradeoff
  • read freshness
  • failover semantics

Wrong Approach

"Synchronous replication means every replica is always current and all reads can go anywhere."

Synchronous commit controls what the primary waits for before acknowledging a commit. It does not automatically mean every standby has applied the transaction and can serve every read with the latest value.

Better Approach

Separate three questions:

Durability:
Can the acknowledged transaction survive primary failure?

Visibility:
Has a chosen standby replayed the commit yet?

Availability:
What happens to writes when the synchronous standby is slow or unavailable?

For payments, synchronous replication may be justified for the write path, but freshness-sensitive reads still need a routing rule. The team also needs a policy for standby failure: block writes, degrade to async, or fail over.

Tradeoff Table

ChoiceGainCost
asynchronous replicationlow write latency and high write availabilitypossible data loss on primary failure
synchronous replicationstronger durability after acknowledgementwrite latency depends on standby/network
wait for remote applybetter read-after-write on standbymore commit latency
degrade to async during incidentpreserves availabilityweakens durability expectation

Failure Mode

The team improves durability but still serves stale reads from a lagging standby. Later, during a network issue, writes stall because no one defined the synchronous-standby failure policy.

Required Artifact

Write a replication-mode ADR:

Data class:
Acknowledgement rule:
Standby failure behavior:
Fresh-read routing:
Maximum accepted data-loss window:
Maximum accepted write-latency budget:
Metrics:
Rollback plan:

Project / Capstone Connection

If a capstone handles payments, orders, credentials, or permissions, its replication design must state what "committed" means after failover.


Case Study 3: Monotonic Timestamp Shard Key Creates One Hot Chunk

Scenario: A MongoDB event collection is sharded by created_at because most queries are time-range queries. During peak ingestion, one shard handles almost all writes while other shards are quiet.

Source anchor: MongoDB ranged sharding groups nearby shard-key values into ranges, which can target range queries efficiently. MongoDB also warns that poor shard-key selection can cause uneven distribution and bottlenecks, and that ranged sharding works best with high cardinality, low frequency, and non-monotonically changing keys. Hashed sharding distributes monotonically changing keys more evenly but makes range queries less targeted. See MongoDB Sharding, MongoDB Ranged Sharding, and MongoDB Data Partitioning with Chunks.

Module concepts:

  • range partitioning
  • hash partitioning
  • shard-key cardinality
  • hot shards
  • targeted vs broadcast queries
  • rebalancing

Wrong Approach

"Use the field most queries filter by as the shard key."

That is only half the problem. The shard key must support query routing and distribute write load. A monotonically increasing key can place all new writes into the upper range.

Better Approach

Compare access patterns before choosing:

Write path:
append-only events, high sustained ingest

Read path:
tenant + time range
recent events
occasional global analytics

Candidate keys:
created_at
tenant_id
tenant_id + created_at
hashed event_id
hashed tenant bucket + created_at

A better design often uses a compound key or bucketing strategy that spreads writes while keeping common tenant/time queries targeted enough.

Tradeoff Table

ChoiceGainCost
range on created_atefficient global time rangesnewest range becomes hot
hashed timestamp/idbetter write spreadrange queries become scatter/broadcast
tenant_id, created_attenant time queries route welllarge tenants can still become hot
salted tenant bucketspreads dominant tenantsreads must query multiple buckets

Failure Mode

The cluster has many shards but one chunk receives the current write range. Scaling hardware does not help because the partition key funnels traffic.

Required Artifact

Create a shard-key memo:

Top 5 query patterns:
Top 3 write patterns:
Candidate shard key:
Expected write distribution:
Expected read routing:
Hotspot risk:
Rebalancing/resharding plan:
Rejected alternatives:

Project / Capstone Connection

Any event, log, or notification capstone needs a partition-key note that explicitly handles "latest item" hotspots.


Case Study 4: DynamoDB Hot Partition Despite Enough Table Capacity

Scenario: A leaderboard writes score updates under partition key game_id. One popular game drives nearly all traffic. The table has enough total provisioned capacity, but requests for that one key are throttled.

Source anchor: DynamoDB partition-key guidance says applications should distribute activity uniformly across partition keys. DynamoDB partition-level limits can throttle a hot partition even when the table has unused capacity elsewhere. AWS recommends write sharding by adding a random or calculated suffix when a key receives concentrated writes. See DynamoDB partition key design, DynamoDB hot partition mitigation, and DynamoDB write sharding.

Module concepts:

  • partition-key design
  • hot keys
  • adaptive capacity limits
  • write sharding
  • fan-out reads
  • aggregate materialization

Wrong Approach

"Increase table capacity; the table is throttling."

The table may have enough total capacity while one partition key exceeds its partition-level limit. More global capacity does not automatically fix a concentrated key.

Better Approach

Spread the hot write path:

Base key:
game_id = world-cup-final

Sharded write key:
game_id#bucket = world-cup-final#00..world-cup-final#49

Read path:
query N buckets
merge top scores
optionally maintain a materialized top-N view

Use random suffixes for simple write spread, or calculated suffixes when reads need predictable bucket selection.

Tradeoff Table

ChoiceGainCost
single game_id keysimple queryhot partition under popular games
random write shardspreads writesreads fan out across buckets
calculated bucketpredictable read subsetbucket function must match access pattern
materialized top-N viewfast leaderboard readsasync update and correctness policy

Failure Mode

The system scales for many normal games but fails exactly during the launch, match, sale, or campaign that matters most.

Required Artifact

Build a partition-load model:

Peak writes/sec for hottest key:
Item size:
Buckets needed:
Read fan-out:
Merge strategy:
Staleness accepted for aggregate view:
Alarm threshold:

Project / Capstone Connection

If the capstone uses DynamoDB-style partition keys, include one "celebrity tenant" or "popular item" load test.


Case Study 5: Global Secondary Index That Is Not A Strong Replica

Scenario: An orders table uses order_id as the primary key. Customer support needs to search by customer_email, so the team adds a DynamoDB global secondary index. After order creation, support sometimes cannot find the order immediately by email.

Source anchor: DynamoDB documentation says global secondary indexes are maintained asynchronously and are eventually consistent. It also says strongly consistent reads are supported on tables and local secondary indexes, but not on global secondary indexes. See DynamoDB read consistency, DynamoDB global secondary indexes, and DynamoDB secondary indexes.

Module concepts:

  • global secondary index
  • asynchronous index propagation
  • eventual consistency
  • alternate access path
  • user-facing freshness

Wrong Approach

"A global secondary index is just another strongly consistent lookup path."

The GSI is a replicated, separately partitioned access path. It can lag behind the base table.

Better Approach

Design the support workflow around index propagation:

Order creation response:
returns order_id

Immediate lookup:
use order_id against base table for strong read if needed

Email search:
query GSI, tolerate short delay

Fallback:
if new order not visible by email, show "still indexing" state or use known order_id

Tradeoff Table

ChoiceGainCost
GSI by emailefficient alternate queryeventual consistency only
duplicate lookup tableexplicit workflow controldual-write or stream processing complexity
base-table lookup by order IDstrong read availablecaller must know order ID
scan table by emailno new indexslow and expensive

Failure Mode

Support treats missing GSI results as missing orders and creates duplicates or escalates false incidents.

Required Artifact

Write an index consistency contract:

Index name:
Source table:
Access pattern:
Expected propagation delay:
Freshness-sensitive fallback:
User-facing copy/state:
Duplicate prevention rule:

Project / Capstone Connection

Any capstone with alternate lookup paths must distinguish primary-key reads from eventually consistent index reads.


Case Study 6: Quorum Reads That Still Need Repair Discipline

Scenario: A Cassandra cluster uses replication factor 3 and LOCAL_QUORUM for important reads and writes. One replica is down for several hours. The team assumes quorum settings alone permanently repair every missed write.

Source anchor: Cassandra's architecture docs describe a partitioned, replicated, eventually consistent database with tunable consistency. Cassandra read repair can repair replicas during reads, but repair operations are still required because hints are best effort and not guaranteed to cover all missed writes. Cassandra repair compares token ranges and streams differences. See Cassandra architecture overview, Cassandra read repair, Cassandra hints, and Cassandra repair.

Module concepts:

  • replication factor
  • tunable consistency
  • quorum reads/writes
  • hinted handoff
  • read repair
  • anti-entropy repair

Wrong Approach

"If we use quorum, repair is optional."

Quorum can provide useful read/write behavior for acknowledged operations, but it is not a full maintenance strategy. Missed writes, node outages, tombstones, and unrepaired ranges still need operational repair discipline.

Better Approach

Treat consistency as a runtime and operations contract:

Request-time:
choose read/write consistency level
understand what replicas acknowledge

Failure recovery:
hints help temporarily unavailable replicas
read repair fixes inconsistencies touched by reads
scheduled repair covers token ranges systematically

The system needs repair schedules, outage windows, and metrics for inconsistent ranges, not only a consistency level in application code.

Tradeoff Table

ChoiceGainCost
ONE reads/writeslow latency and high availabilitystale reads more likely
LOCAL_QUORUMstronger local-datacenter behaviormore replicas must be reachable
blocking read repairmonotonic quorum readsread latency can include repair work
scheduled repairsystematic convergencedisk/network I/O and operational planning

Failure Mode

A node returns after an outage, missed data is only partially repaired, and old values or deleted data reappear later because repair did not run within the required window.

Required Artifact

Create a consistency-and-repair runbook:

Keyspace/table:
Replication factor:
Read consistency:
Write consistency:
Node outage assumption:
Hint window:
Read repair behavior:
Repair schedule:
Metrics:
Incident trigger:

Project / Capstone Connection

If a capstone uses a quorum-style store, the architecture review must include an operations story. Consistency is not only a client setting.


Source Map

SourceUse it for
PostgreSQL Hot Standbystandby reads, replication delay, eventually consistent standby data, query conflicts
PostgreSQL Log-Shipping Standby Serversstreaming replication, asynchronous default, synchronous replication
MongoDB Shardinghashed vs ranged sharding and targeted vs broadcast operations
MongoDB Ranged Shardingshard-key traits and range-partition behavior
MongoDB Data Partitioning with Chunkschunks, balancer behavior, jumbo chunk risk
DynamoDB partition key designuniform partition-key activity and partition throughput limits
DynamoDB hot partition mitigationpartition-level throttling and hot partitions
DynamoDB write shardingrandom/calculated suffix write-sharding patterns
DynamoDB read consistencystrong vs eventual reads and GSI consistency limits
DynamoDB global secondary indexesasynchronous GSI propagation and capacity tradeoffs
Cassandra architecture overviewpartitioned replicated storage model and eventual consistency
Cassandra read repairmonotonic quorum reads and blocking repair behavior
Cassandra hintshinted handoff role and configuration
Cassandra repairanti-entropy repair, token ranges, repair cadence

Completion Standard

  • At least three case-study artifacts are completed.
  • At least one artifact includes a replica-lag or stale-read timeline.
  • At least one artifact includes a shard-key or partition-key decision memo.
  • At least one artifact includes an operations runbook for repair, failover, or lag.
  • At least one case connects replication behavior to Module 4 transactions/consistency.
  • At least one case connects partitioning behavior to Module 5 distributed-systems fundamentals.