Paxos: The Canonical Single-Decree Algorithm
What This Concept Is
Paxos (Lamport, 1989; published 1998 as The Part-Time Parliament) is the canonical fault-tolerant consensus algorithm. Single-decree Paxos (sometimes called Basic Paxos) gets a set of nodes to agree on one value despite up to f crash failures in a cluster of 2f+1. Multi-Paxos extends it to a sequence of values - the replicated state machine - by amortizing the first phase across many decrees under a stable leader.
Three roles (a node can play several):
- Proposer: wants a value to be chosen; initiates the protocol.
- Acceptor: votes on proposals; a majority of acceptors is a quorum.
- Learner: learns the chosen value.
Basic Paxos is two phases:
Phase 1: Prepare
- Proposer picks a unique, monotonically increasing proposal number
n. - Proposer sends
Prepare(n)to a majority of acceptors. - Acceptor, on receiving
Prepare(n): ifn> anyn'it has promised, respond withPromise(n, (n_accepted, v_accepted))and never accept proposals numbered belown. If it has already accepted a valuev_acceptedat proposaln_accepted, it reports that tuple.
Phase 2: Accept
- Proposer gathers
Promisereplies from a majority. If any reported an accepted value, the proposer must propose the value from the highestn_acceptedamong the replies. (If none reported a value, proposer is free to propose its own.) - Proposer sends
Accept(n, v)to a majority. - Acceptor, on receiving
Accept(n, v): if it has not promised a highern', accept and record(n, v); replyAccepted(n, v). - Once a majority has accepted
(n, v),vis chosen. Learners can be told.
Safety argument (sketch): any two quorums intersect in at least one acceptor, so a value once chosen is "seen" by the next proposer's Promise replies and is preserved. Liveness requires a stable proposer (in Multi-Paxos, a stable leader) - otherwise dueling proposers can indefinitely increment n.
Why It Matters Here
Paxos is the canonical algorithm every distributed-systems engineer is expected to know conceptually. It underlies or inspired:
- Google Chubby (lock service, Multi-Paxos),
- Google Spanner (Paxos per Paxos group),
- Microsoft Azure Cosmos DB,
- the original ZooKeeper ZAB protocol,
- and a long line of research variants (Fast Paxos, Cheap Paxos, Egalitarian Paxos, Flexible Paxos).
You will likely never implement Paxos yourself, but you must be able to read the protocol and argue why it is safe - because any bug report or design review that mentions consensus assumes you can.
Concrete Example
Three acceptors A, B, C. Two proposers P1, P2 try to get different values chosen.
P2 wanted to propose "Y" but discovers via B's Promise that "X" was already (possibly) chosen. P2 must propagate "X". This is how safety is preserved despite P2 not having talked to A.
Multi-Paxos and the Leader Optimization
Running basic Paxos per decree costs two round-trips. In practice, systems elect a stable leader (itself via Paxos) and skip Phase 1 for subsequent decrees. The leader assigns decree numbers in order and only runs Phase 2 (one round-trip). If the leader crashes, the next leader runs Phase 1 once over its log tail to recover, then resumes fast path.
Common Confusion / Misconception
"Paxos is impossible to understand." It has a reputation, mostly because Lamport's original papers are dense and the terminology overlaps across variants. Once you internalize "proposal number monotonically increases, acceptor records the highest promise and the highest accept, majority intersects majority," the safety argument collapses to a few lines. The implementation difficulty is a separate issue.
A second misconception: "Paxos is slow." Single-decree Paxos requires two round-trips (Prepare + Accept). Multi-Paxos under stable leadership is one round-trip per decree - comparable to 2PC. The slowness reputation comes from its complexity to implement and from dueling-proposer livelock scenarios, not from a large round-trip count.
A third: "Paxos is obsolete; everyone uses Raft." Raft is Multi-Paxos with strong opinions (explicit leader, contiguous log, configurable membership changes). Google, Microsoft, and many systems still use Paxos-family algorithms. Raft is easier to teach, not more correct.
How To Use It
You will rarely implement Paxos. You will often read about systems that use it. When you do:
- Identify the quorum size (
f+1in2f+1). - Identify the proposer (is there a stable leader? How is it chosen?).
- Identify where the log tail is reconciled on leader change (this is where most bugs live).
- Identify the learner protocol (how do non-participants catch up?).
Check Yourself
- Why must proposal numbers be unique and monotonically increasing?
- What happens if a new proposer's Phase 1 finds a previously accepted value?
- Why does Multi-Paxos skip Phase 1 under a stable leader?
- What quorum size is needed to tolerate 2 crash failures?
Mini Drill or Application
Redraw the diagram above, but swap the order: P2 runs Prepare before P1. Work out what gets chosen. Then add a third proposer P3 with an even higher n but only a minority of Promise replies. Explain what must happen.