Skip to main content

Why Consensus Is Needed: Leader Election, Atomic Commit, Replicated State Machines

What This Concept Is

Consensus is the problem of getting a set of nodes to agree on a single value, despite some nodes failing. Formal requirements:

  • Agreement: no two correct processes decide different values.
  • Integrity: if a value is decided, it was proposed by some process.
  • Termination: every correct process eventually decides.
  • Validity: the decided value satisfies some input condition (varies).

Three canonical problems that reduce to consensus (i.e., solving one gives you a solution to the others):

  1. Leader election. A cluster needs exactly one leader at a time. "Who is the leader?" is a consensus on a process id.
  2. Atomic commit (2PC, 3PC, Paxos Commit). Multiple participants must all commit or all abort a transaction. "Commit or abort?" is a consensus on one bit.
  3. Replicated state machine (RSM). A log of commands is replicated across nodes so all produce the same state. The ith log entry is a consensus on the ith command. This is what Paxos and Raft give you when run repeatedly.

The FLP impossibility result (Fischer, Lynch, Paterson 1985) says: in a fully asynchronous system with even one process that may crash, no deterministic algorithm can solve consensus. The proof is roughly that an adversarial scheduler can always delay one critical message to keep the system oscillating between potential decisions.

Real systems escape FLP by assuming partial synchrony - the network eventually stabilizes enough (bounded message delay) for some period. Under this assumption, Paxos and Raft are safe always and live whenever the network is stable.

Why It Matters Here

The whole of Cluster 4 and 5 is the response to this reduction. Once you see that leader election, atomic commit, and replicated state machines all reduce to consensus, you realize you only need to study one algorithm well - you get the other problems for free, and every "simple" distributed coordination problem you will encounter (distributed lock, config store, queue leader, shard owner) is the same problem in disguise.

Concrete Example

A service has 5 replicas behind a load balancer. Only one replica should accept writes; the others must follow. The election of which replica is the write leader is a consensus question:

  • All 5 need to agree on exactly one identity.
  • They need to agree despite any 2 being down or partitioned.
  • They need to agree even though each has incomplete information about the others.

Without consensus: two replicas might each think they are leader (split-brain) and accept conflicting writes. With consensus (e.g., Raft elect): exactly one replica gets a majority vote in a term and is the unambiguous leader for that term. When the network reconvenes, the losing side discovers it is no longer leader and steps down.

Common Confusion / Misconception

"We can just use a shared database row to pick the leader." You have not eliminated consensus; you have outsourced it to the database, which must itself use consensus internally (or accept the split-brain risk during its own failover). The classic error is treating a single-node "coordinator" as the consensus answer, then hitting an outage when that node partitions.

A second misconception: "FLP says consensus is impossible; therefore production systems don't really do it." FLP says deterministic consensus is impossible in the fully asynchronous model. Production systems either (a) use randomization (unlikely to stall forever), (b) assume eventual synchrony (the usual Paxos/Raft case), or (c) add external oracles (physical clocks for Spanner). The impossibility is a statement about what you cannot avoid buying, not a proof you cannot ship.

A third: "Consensus and distributed transactions are the same." They are closely related - atomic commit reduces to consensus - but consensus usually deals with a single value across a fixed set of participants, while distributed transactions involve multiple databases coordinating across read and write sets. Paxos Commit (Gray & Lamport) makes the connection explicit.

How To Use It

When you encounter a problem that looks like "we need everyone to agree on X":

  1. Ask: is X a single value that the set of participants must all see as the same value?
  2. If yes, it is consensus. You should not roll your own. Use Raft (via etcd, ZooKeeper, Consul, or a library like hashicorp/raft).
  3. If you are attempting to avoid consensus with "eventual consistency" or "use a timestamp," explicitly write down the scenarios where this is wrong.
  4. For atomic commit across heterogeneous stores: understand that 2PC blocks on coordinator failure (Module 4, Concept 10). Paxos Commit and Raft-backed commit variants exist.

Check Yourself

  1. State the three safety properties of consensus.
  2. How does leader election reduce to consensus?
  3. What does FLP say, precisely, and what assumption do real systems make to escape it?
  4. Why is "use a shared database row" not an escape from the consensus problem?

Mini Drill or Application

Pick a distributed coordination problem from your experience or reading (sharded cache ownership, distributed cron, resource leader in Kubernetes). Explicitly frame it as a consensus problem: what is the value, who are the participants, what does "decide" mean?

Read This Only If Stuck