Learning Resources

This module is populated from the local chunked books in library/raw/semester-06-databases-distributed/books. Use this page as a source map, not as an instruction to read everything.

Source Stack

Book	Role	How to use it in this module
Designing Data-Intensive Applications (Kleppmann)	Primary teaching source	Default escalation for every primary concept. Chapters 8 (The Trouble with Distributed Systems) and 9 (Consistency and Consensus) are the core chunks for this module
Distributed Systems Concepts and Design (Coulouris et al.)	Canonical theory source	Formal treatment of time, clocks, failure models, and consensus. Go here for the rigorous textbook view
Database Internals (Petrov)	Implementation-level support	Concrete views of failure detection, gossip, ZAB, Paxos, Multi-Paxos, and Raft; most efficient for "how it is actually built"
Database System Concepts (Silberschatz et al.)	Peripheral	Limited direct coverage of these topics; used lightly

Resource Map by Cluster

Cluster 1: The Inescapable Reality

Need	Best local chunk	Why
Fallacies (origin and modern restatement)	Database Internals: Fallacies of Distributed Computing	Cleanest single-chunk list and commentary
Partial failure intuition	DDIA: Faults and Partial Failures	The single best page on "a distributed system has partial failure"
Networks are unreliable	DDIA: Unreliable Networks	Concrete patterns for how networks really fail
Cloud vs HPC failure culture	DDIA: Cloud Computing and Supercomputing	Where partial failure is a design choice
Timeouts and unbounded delay	DDIA: Timeouts and Unbounded Delays	Why no timeout distinguishes slow from dead
Async vs sync networks	DDIA: Synchronous Versus Asynchronous Networks	Formal contrast; partial synchrony motivation
Process pauses	DDIA: Process Pauses (Part 1), Part 2	GC pauses as a distributed failure mode
Two generals	Database Internals: Two Generals' Problem	Impossibility motivator
System synchrony formalism	Database Internals: System Synchrony	Async vs partial sync vs sync formalized
Coulouris challenges	Coulouris: Challenges (Part 1), Part 2, Part 3	Textbook framing of heterogeneity, failure, scaling

Cluster 2: Time, Clocks, and Ordering

Need	Best local chunk	Why
Why clocks are unreliable	DDIA: Unreliable Clocks	The single best page on this topic
NTP accuracy in practice	DDIA: Clock Synchronization and Accuracy	Concrete numbers and failure modes
Wall-clock pitfalls	DDIA: Relying on Synchronized Clocks (Part 1), Part 2	LWW traps and Spanner's TrueTime
Implementation-level time	Database Internals: Clocks and Time	Monotonic vs wall-clock at the system level
Physical clock synchronization	Coulouris: Synchronizing Physical Clocks (Part 1)	Cristian's algorithm, Berkeley, NTP
Logical time (textbook)	Coulouris: Logical Time and Logical Clocks	Lamport and vector clocks rigorously
Happens-before and causality	DDIA: Ordering and Causality (Part 1), Part 2	Happens-before to causal consistency
Sequence number ordering	DDIA: Sequence Number Ordering (Part 1)	From Lamport clocks to leader-issued sequence numbers
Ordering (implementation)	Database Internals: Ordering	Compact implementation-level summary

Cluster 3: Failure Detection and Membership

Need	Best local chunk	Why
Failure detection overview	Database Internals: Chapter 9 - Failure Detection	Best introduction to the problem
Phi-accrual	Database Internals: Phi-Accrual Failure Detector	The core technique Cassandra and others use
Failure detection summary	Database Internals: Summary (Chapter 9)	Compact wrap-up of detection protocols
Gossip dissemination	Database Internals: Gossip Dissemination	Logarithmic dissemination, SWIM-adjacent design
Hybrid gossip	Database Internals: Hybrid Gossip	Performance tuning real gossip
Anti-entropy primer	Database Internals: Chapter 12 - Anti-entropy and Dissemination	Context for gossip
Gossip (textbook)	Coulouris: Gossip architecture (Part 1)	Bayou-style gossip analysis
Omission faults	Database Internals: Omission Faults	The failure model under gossip/heartbeat
Byzantine (DDIA)	DDIA: Byzantine Faults	When and why BFT applies
System model and reality	DDIA: System Model and Reality	Connecting textbook models to real hardware
PBFT algorithm	Database Internals: PBFT Algorithm	Classical BFT reference

Cluster 4: Consensus

Need	Best local chunk	Why
Why consensus is needed	DDIA: Distributed Transactions and Consensus	The problem statement across multiple use cases
Fault-tolerant consensus	DDIA: Fault-Tolerant Consensus (Part 1), Part 2, Part 3	Kleppmann's modern presentation of the consensus problem
Consensus chapter	Database Internals: Chapter 14 - Consensus	Implementation-level entry point
Paxos	Database Internals: Paxos	Clear basic Paxos
Paxos quorums	Database Internals: Quorums in Paxos	Quorum intersection explained
Multi-Paxos	Database Internals: Multi-Paxos	From single-value to replicated log
Egalitarian Paxos	Database Internals: Egalitarian Paxos	Where the leaderless variant fits
Raft	Database Internals: Raft	Core Raft exposition
Raft leader role	Database Internals: Leader Role in Raft	Operational model of the leader
ZAB	Database Internals: ZAB	ZooKeeper's atomic broadcast protocol
Consensus (textbook)	Coulouris: Consensus and related problems (Part 1), Part 2, Part 3	Formal problem definition, FLP, and Byzantine generals

Cluster 5: Distributed System Patterns

Need	Best local chunk	Why
Leader election introduction	Database Internals: Chapter 10 - Leader Election	Compact introduction to the problem
Bully algorithm	Database Internals: Bully Algorithm	Classical leader-election reference
Majority as truth	DDIA: The Truth Is Defined by the Majority	Why single-leader designs must check quorums
Elections (Coulouris)	Coulouris: Elections (Part 1)	Formal election algorithm treatment
Idempotency context	DDIA: Summary (Chapter 8 Part 2)	Wrap-up of the retry-and-duplicate story
End-to-end argument	DDIA: The End-to-End Argument for Databases (Part 1), Part 2	Why exactly-once must live at the application layer
Coordination avoidance	Database Internals: Coordination Avoidance	When you can skip the coordination dance entirely
Coordination services (DDIA)	DDIA: Membership and Coordination Services	What ZooKeeper/etcd/Consul actually provide
Coordination services (Coulouris)	Coulouris: Data storage and coordination services (Part 1), Part 2	Google Chubby-style treatment

Exercise Support Chunks

Use these when concept pages are understood but fluency is weak:

External Resources (Validated, Read If Pointed Here)

Concept pages in this module link to specific external posts. All were validated in the most recent curation pass.

Lamport: "Time, Clocks, and the Ordering of Events in a Distributed System" (1978) - the foundational paper for logical time. Still the single clearest text on the happens-before relation.
The Raft Consensus Algorithm (raft.github.io) - the canonical index of Raft resources, including the paper, visualization, and implementation links.
Ongaro and Ousterhout: "In Search of an Understandable Consensus Algorithm" (USENIX ATC 2014) - the Raft paper.
Diego Ongaro's Raft PhD dissertation (Stanford, 2014) - deeper treatment including log compaction, membership changes, and operational concerns.
Lamport: "Paxos Made Simple" (2001) - Lamport's own restatement of Paxos; still terse but the canonical source.
Chandra, Griesemer, and Redstone: "Paxos Made Live" (Google, 2007) - production experience running Paxos (Chubby).
Peter Bailis and Kyle Kingsbury: "The Network is Reliable" (ACM Queue, 2014) - empirical case studies of real-world partition incidents. A concrete counterpoint to the assumption named in the title.
Jepsen - independent distributed-system correctness analyses. The single best external source for "what goes wrong in practice."
Martin Kleppmann: "How to do distributed locking" (2016) - the canonical Redlock critique and the fencing-token argument.
Ongaro: Raft visualization - interactive simulator. Worth one hour if elections feel abstract.
Werner Vogels: "Eventually Consistent" (ACM Queue, 2008) - classical framing of the consistency trade-off.
Peter Deutsch: The Eight Fallacies of Distributed Computing - the original list, with commentary.

Use Rules

For every primary concept, the local book chunk is the escalation. Reach for DDIA (chapters 8 and 9) first; Coulouris for the formal treatment; Database Internals for the implementation view.
Open one chunk per gap. Do not drift into whole chapters.
External links are targeted: when a concept page says "external," it means the chunk does not cover that angle well enough.
