Learning Resources

This module is populated from the local chunked books in library/raw/semester-06-databases-distributed/books. Use this page as a source map, not as an instruction to read everything.

Source Stack

Book	Role	How to use it in this module
Designing Data-Intensive Applications (Kleppmann)	Primary teaching source	Default escalation for replication topologies, log formats, partitioning, and failover
Database Internals (Petrov)	Implementation-focused support	Use for consistency models, failure detection, leader election, and tunable consistency at the implementation level
Distributed Systems: Concepts and Design (Coulouris et al.)	Classical distributed-systems framing	Use for group communication, gossip protocols, and replicated-data fundamentals
Database System Concepts (Silberschatz et al.)	Relational-replication view	Use for textbook treatments of partitioning and replication in traditional RDBMS contexts

Resource Map by Cluster

Cluster 1: Why Replicate and Partition

Need	Best local chunk	Why
Replication overview and goals	DDIA: Chapter 5 Replication	Kleppmann's canonical opening on the three replication goals
Partitioning overview	DDIA: Chapter 6 Partitioning	Best narrative introduction to what partitioning buys and costs
Honest CAP framing	DDIA: The cost of linearizability	Treats CAP as a runtime choice, not a static label
Partition-tolerance and majorities	DDIA: The truth is defined by the majority	Strong framing of quorum as the basis of consistency
Tunable consistency and PACELC	Database Internals: Tunable consistency	The most honest treatment of "CAP is a spectrum"
Partitioning in relational systems	Database System Concepts: Data partitioning	Textbook framing that complements DDIA

Cluster 2: Replication Topologies

Need	Best local chunk	Why
Single-leader setup and sync	DDIA: Synchronous versus asynchronous replication	Cleanest presentation of single-leader mechanics
New-follower bootstrap	DDIA: Setting up new followers	Operational flow for bringing up a fresh replica
Multi-leader motivations	DDIA: Use cases for multi-leader replication	When to reach for multi-leader
Multi-leader conflicts	DDIA: Handling write conflicts	The conflict-resolution menu
Multi-leader topologies	DDIA: Multi-leader replication topologies	All-to-all, ring, star tradeoffs
Leaderless writes under failure	DDIA: Writing to the database when a node is down	Quorum and coordinator mechanics
Quorum limitations	DDIA: Limitations of quorum consistency	Why `W+R>N` is necessary but not sufficient
Sloppy quorums and hinted handoff	DDIA: Sloppy quorums and hinted handoff	The availability extension and its cost
Detecting concurrent writes	DDIA: Detecting concurrent writes (part 1)	Version vectors and conflict surfacing
Gossip architecture	Distributed Systems: Gossip architecture (part 1)	Classical case study of eventually-consistent replication
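
The quorum condition named in the "Quorum limitations" row can be checked mechanically. The sketch below is illustrative and not taken from any of the books: the function names `quorums_overlap` and `brute_force_overlap` are hypothetical, and the brute-force version only confirms the arithmetic for small clusters. It shows why `W+R>N` guarantees that every read set shares at least one node with every write set. It says nothing about the cases where overlap is not enough (concurrent writes, sloppy quorums), which is what the DDIA chunk covers.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Return True when any w-node write set must intersect any r-node read set."""
    return w + r > n


def brute_force_overlap(n: int, w: int, r: int) -> bool:
    """Check the same property by enumerating every possible write and read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


if __name__ == "__main__":
    # N=3, W=2, R=2 overlaps; N=3, W=1, R=2 does not.
    print(quorums_overlap(3, 2, 2), quorums_overlap(3, 1, 2))
```

Overlap only guarantees that a read contacts at least one node that saw the latest acknowledged write. Deciding which returned value is newest is a separate problem.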

Cluster 3: Replication Mechanics

Need	Best local chunk	Why
Replication log formats	DDIA: Implementation of replication logs	The authoritative side-by-side
CDC from logical logs	DDIA: Change data capture	Shows why logical replication is the CDC substrate
Replication-lag anomalies	DDIA: Problems with replication lag	Every anomaly named and explained
Monotonic-reads guarantee	DDIA: Monotonic reads	Cleanest explanation of the guarantee and its mechanism
Causality and session guarantees	DDIA: Ordering and causality (part 1)	Connects anomaly vocabulary to happens-before
Session consistency models	Database Internals: Session models	Concrete session guarantees in running systems
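
One mechanism for the monotonic-reads guarantee is to route each user to the same replica on every read, choosing the replica from a hash of the user ID rather than at random. The sketch below assumes a static list of replica names; `replica_for_user` and the replica names are hypothetical and exist only for illustration.

```python
import hashlib


def replica_for_user(user_id: str, replicas: list[str]) -> str:
    """Pick a replica deterministically so one user always reads from the same node."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


if __name__ == "__main__":
    replicas = ["replica-a", "replica-b", "replica-c"]  # hypothetical names
    # The same user maps to the same replica on every call.
    print(replica_for_user("user-42", replicas) == replica_for_user("user-42", replicas))
```

This sketch ignores failure handling: if the chosen replica goes down, the user's reads have to be rerouted, and the guarantee must be re-established some other way.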

Cluster 4: Partitioning Strategies

Need	Best local chunk	Why
Range partitioning	DDIA: Partitioning by key range	When range beats hash
Hash partitioning	DDIA: Partitioning by hash of key	When hash beats range
Hotspot mitigation	DDIA: Skewed workloads and relieving hot spots	Application-level key salting, and why a partitioning scheme alone cannot remove a hot key
Local secondary indexes	DDIA: Partitioning secondary indexes by document	Canonical scatter-gather explanation
Global secondary indexes	DDIA: Partitioning secondary indexes by term	Canonical term-partitioned explanation
Rebalancing strategies	DDIA: Strategies for rebalancing	Fixed, dynamic, proportional schemes compared
Rebalancing automation	DDIA: Operations: automatic or manual rebalancing	The "should we do this automatically" question
Relational-style partitioning	Database System Concepts: Data partitioning	Textbook treatment of data partitioning
Skew handling (textbook)	Database System Concepts: Dealing with skew (part 1)	Classical skew-handling techniques
Database partitioning overview	Database Internals: Database partitioning	Implementation-level framing
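
The range, hash, and hotspot rows above can be compared in a few lines. This is a minimal sketch under stated assumptions: all function names are hypothetical, the boundaries and fan-out are made-up values, and taking the hash modulo a fixed partition count is used only for brevity. How partitions are assigned to nodes and rebalanced is a separate question covered by the rebalancing rows.

```python
import bisect
import hashlib
import random


def range_partition(key: str, boundaries: list[str]) -> int:
    """Key-range partitioning: sorted boundaries split the key space into contiguous ranges."""
    return bisect.bisect_right(boundaries, key)


def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: adjacent keys scatter, so range scans lose locality."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def salted_write_key(hot_key: str, fanout: int, rng=random) -> str:
    """Spread writes for one hot key across `fanout` sub-keys by appending a random suffix."""
    return f"{hot_key}#{rng.randrange(fanout):02d}"


def salted_read_keys(hot_key: str, fanout: int) -> list[str]:
    """Reads pay for the salting: every sub-key must be fetched and combined."""
    return [f"{hot_key}#{i:02d}" for i in range(fanout)]


if __name__ == "__main__":
    boundaries = ["g", "n", "t"]  # four ranges: < g, g..m, n..s, >= t
    print([range_partition(k, boundaries) for k in ["apple", "grape", "zebra"]])
    print(salted_read_keys("celebrity", 4))
```

The salting helpers show the tradeoff from the hotspot row: writes to a hot key spread out, but the application has to track which keys are salted and merge the sub-keys on read.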

Cluster 5: Practical Systems

Need	Best local chunk	Why
Coordination services	DDIA: Membership and coordination services	How ZooKeeper/etcd fit the cluster
Majority-based truth	DDIA: The truth is defined by the majority	The core of safe failover
Linearizable implementations	DDIA: Implementing linearizable systems	Why consensus-based coordinators exist
Process pauses and safety	DDIA: Process pauses (part 1)	Why clock-based leases are unsafe on their own
Failure detection	Database Internals: Chapter 9 Failure detection	How "is the leader dead?" is actually answered
Phi-accrual failure detector	Database Internals: Phi-accrual failure detector	Cassandra's failure detection
Leader election	Database Internals: Chapter 10 Leader election	Protocol-level treatment
Database Internals replication chapter	Database Internals: Chapter 11 Replication and Consistency	Bridge into consistency models
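
The phi-accrual idea from the failure-detection rows is to output a continuous suspicion level instead of a binary alive/dead verdict. The sketch below is a simplification and not the formulation from any of the books: it assumes heartbeat intervals are exponentially distributed, so the probability that a heartbeat is still pending after `t` seconds is `exp(-t / mean)` and phi is `-log10` of that value. The class name, window size, and method names are hypothetical.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Suspicion level grows with the time since the last heartbeat, scaled by past intervals."""

    def __init__(self, window: int = 100):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat arrival at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Return -log10(P(silence lasts this long)) under the exponential assumption."""
        if not self.intervals:
            return 0.0  # not enough history to suspect anything
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))


if __name__ == "__main__":
    detector = PhiAccrualDetector()
    for t in [0.0, 1.0, 2.0, 3.0]:
        detector.heartbeat(t)
    # Suspicion rises the longer the node stays silent.
    print(detector.phi(3.5) < detector.phi(13.0))
```

A caller compares phi against a threshold it chooses. A higher threshold means fewer false suspicions but slower detection, which is the tradeoff the failure-detection chunks discuss.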

External Resources (Validated)

These URLs were validated at the time of writing. Use them for primary-source material beyond the books.

Jepsen Analyses (Kyle Kingsbury)

Real-world correctness tests that expose replication and consistency violations under fault injection. The best training for "what goes wrong under a partition."

Jepsen: Analyses index -- start here.
Jepsen: MongoDB 4.2.6 -- even the strongest read and write concerns failed to preserve snapshot isolation.
Jepsen: MongoDB 3.6.4 -- sharded-cluster safety analysis.
Jepsen: MongoDB 3.4.0-rc3 -- v0 replication protocol loses majority-committed writes.
Aphyr: Jepsen Cassandra original post -- foundational write-up of the Dynamo-style consistency model.

Martin Kleppmann Blog

Please stop calling databases CP or AP (2015) -- the essay behind this module's CAP framing.

Official System Documentation

PostgreSQL: High Availability, Load Balancing, and Replication -- the canonical PG replication reference.
PostgreSQL: Replication configuration parameters -- knobs for sync, lag, slot management.
PostgreSQL: Streaming Replication Protocol -- the wire protocol.
Cassandra: Dynamo architecture -- replication factor, consistent hashing, tokens.
Cassandra Basics -- replication factor and consistency level.
MongoDB: Replication -- replica sets overview.
MongoDB: Replica Set Elections -- Raft-based primary election.

Primary Literature

Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007) -- the paper behind Cassandra, Riak, and most leaderless systems.

Exercise Support Chunks

Use these when the concept pages are understood but you need volume:

Use Rules

If a concept feels shaky, go to DDIA Chapter 5 or 6 for the corresponding section first; they are the module's spine.
If DDIA's narrative is clear but you want an operational picture, open Database Internals.
Open one chunk for one concept gap. Do not read chapter sequences by default.
Every primary concept in this module maps to at least one book chunk above. If you cannot find the mapping, re-read the concept page -- the gap is probably there, not in the source.
Use Jepsen reports only after you have internalized the module vocabulary. They are unforgiving of vague thinking.
