
Learning Resources

This module is populated from the local chunked books in library/raw/semester-06-databases-distributed/books. Use this page as a source map, not as an instruction to read everything.

Source Stack

Book | Role | How to use it in this module
Designing Data-Intensive Applications (Kleppmann) | Primary teaching source | Default escalation for every primary concept. Chapters 8 (The Trouble with Distributed Systems) and 9 (Consistency and Consensus) are the core chunks for this module
Distributed Systems Concepts and Design (Coulouris et al.) | Canonical theory source | Formal treatment of time, clocks, failure models, and consensus. Go here for the rigorous textbook view
Database Internals (Petrov) | Implementation-level support | Concrete views of failure detection, gossip, ZAB, Paxos, Multi-Paxos, and Raft; most efficient for "how it is actually built"
Database System Concepts (Silberschatz et al.) | Peripheral | Limited direct coverage of these topics; used lightly

Resource Map by Cluster

Cluster 1: The Inescapable Reality

Need | Best local chunk | Why
Fallacies (origin and modern restatement) | Database Internals: Fallacies of Distributed Computing | Cleanest single-chunk list and commentary
Partial failure intuition | DDIA: Faults and Partial Failures | The single best page on "a distributed system has partial failure"
Networks are unreliable | DDIA: Unreliable Networks | Concrete patterns for how networks really fail
Cloud vs HPC failure culture | DDIA: Cloud Computing and Supercomputing | Where partial failure is a design choice
Timeouts and unbounded delay | DDIA: Timeouts and Unbounded Delays | Why no timeout distinguishes slow from dead
Async vs sync networks | DDIA: Synchronous Versus Asynchronous Networks | Formal contrast; partial synchrony motivation
Process pauses | DDIA: Process Pauses (Part 1), Part 2 | GC pauses as a distributed failure mode
Two generals | Database Internals: Two Generals' Problem | Impossibility motivator
System synchrony formalism | Database Internals: System Synchrony | Async vs partial sync vs sync formalized
Coulouris challenges | Coulouris: Challenges (Part 1), Part 2, Part 3 | Textbook framing of heterogeneity, failure, scaling

Cluster 2: Time, Clocks, and Ordering

Need | Best local chunk | Why
Why clocks are unreliable | DDIA: Unreliable Clocks | The single best page on this topic
NTP accuracy in practice | DDIA: Clock Synchronization and Accuracy | Concrete numbers and failure modes
Wall-clock pitfalls | DDIA: Relying on Synchronized Clocks (Part 1), Part 2 | LWW traps and Spanner's TrueTime
Implementation-level time | Database Internals: Clocks and Time | Monotonic vs wall-clock at the system level
Physical clock synchronization | Coulouris: Synchronizing Physical Clocks (Part 1) | Cristian's algorithm, Berkeley, NTP
Logical time (textbook) | Coulouris: Logical Time and Logical Clocks | Lamport and vector clocks rigorously
Happens-before and causality | DDIA: Ordering and Causality (Part 1), Part 2 | Happens-before to causal consistency
Sequence number ordering | DDIA: Sequence Number Ordering (Part 1) | From Lamport clocks to leader-issued sequence numbers
Ordering (implementation) | Database Internals: Ordering | Compact implementation-level summary

Cluster 3: Failure Detection and Membership

Need | Best local chunk | Why
Failure detection overview | Database Internals: Chapter 9 - Failure Detection | Best introduction to the problem
Phi-accrual | Database Internals: Phi-Accrual Failure Detector | The core technique Cassandra and others use
Failure detection summary | Database Internals: Summary (Chapter 9) | Compact wrap-up of detection protocols
Gossip dissemination | Database Internals: Gossip Dissemination | Logarithmic dissemination, SWIM-adjacent design
Hybrid gossip | Database Internals: Hybrid Gossip | Performance tuning real gossip
Anti-entropy primer | Database Internals: Chapter 12 - Anti-entropy and Dissemination | Context for gossip
Gossip (textbook) | Coulouris: Gossip architecture (Part 1) | Bayou-style gossip analysis
Omission faults | Database Internals: Omission Faults | The failure model under gossip/heartbeat
Byzantine (DDIA) | DDIA: Byzantine Faults | When and why BFT applies
System model and reality | DDIA: System Model and Reality | Connecting textbook models to real hardware
PBFT algorithm | Database Internals: PBFT Algorithm | Classical BFT reference

Cluster 4: Consensus

Need | Best local chunk | Why
Why consensus is needed | DDIA: Distributed Transactions and Consensus | The problem statement across multiple use cases
Fault-tolerant consensus | DDIA: Fault-Tolerant Consensus (Part 1), Part 2, Part 3 | Kleppmann's modern presentation of the consensus problem
Consensus chapter | Database Internals: Chapter 14 - Consensus | Implementation-level entry point
Paxos | Database Internals: Paxos | Clear basic Paxos
Paxos quorums | Database Internals: Quorums in Paxos | Quorum intersection explained
Multi-Paxos | Database Internals: Multi-Paxos | From single-value to replicated log
Egalitarian Paxos | Database Internals: Egalitarian Paxos | Where the leaderless variant fits
Raft | Database Internals: Raft | Core Raft exposition
Raft leader role | Database Internals: Leader Role in Raft | Operational model of the leader
ZAB | Database Internals: ZAB | ZooKeeper's atomic broadcast protocol
Consensus (textbook) | Coulouris: Consensus and related problems (Part 1), Part 2, Part 3 | Formal problem definition, FLP, and Byzantine generals

Cluster 5: Distributed System Patterns

Need | Best local chunk | Why
Leader election introduction | Database Internals: Chapter 10 - Leader Election | Compact introduction to the problem
Bully algorithm | Database Internals: Bully Algorithm | Classical leader-election reference
Majority as truth | DDIA: The Truth Is Defined by the Majority | Why single-leader designs must check quorums
Elections (Coulouris) | Coulouris: Elections (Part 1) | Formal election algorithm treatment
Idempotency context | DDIA: Summary (Chapter 8 Part 2) | Wrap-up of the retry-and-duplicate story
End-to-end argument | DDIA: The End-to-End Argument for Databases (Part 1), Part 2 | Why exactly-once must live at the application layer
Coordination avoidance | Database Internals: Coordination Avoidance | When you can skip the coordination dance entirely
Coordination services (DDIA) | DDIA: Membership and Coordination Services | What ZooKeeper/etcd/Consul actually provide
Coordination services (Coulouris) | Coulouris: Data storage and coordination services (Part 1), Part 2 | Google Chubby-style treatment

Exercise Support Chunks

Use these when concept pages are understood but fluency is weak:

External Resources (Validated, Read If Pointed Here)

The module links to specific external posts from concept pages. All were validated as of the most recent curation pass.

Use Rules

  • For every primary concept, the local book chunk is the escalation. Reach for DDIA (chapters 8 and 9) first; Coulouris for the formal treatment; Database Internals for the implementation view.
  • Open one chunk per gap. Do not drift into whole chapters.
  • External links are targeted: when a concept page says "external," it means the chunk does not cover that angle well enough.