
Single-Leader Replication

What This Concept Is

Single-leader replication (also called primary-replica, master-slave, or active-passive) is the default replication topology for almost every relational database in production. It has four parts:

  • One leader accepts all writes.
  • Several followers receive the leader's write log and apply it in order.
  • Clients direct writes to the leader and reads to any node (often the leader for strong reads, or any follower for scale).
  • A replication log is the protocol between leader and follower (statement, logical, or physical -- covered in Concept 07).

The invariant is that the leader's log is the single authoritative sequence of state changes. Followers lag the leader by an amount that varies from microseconds (sync, same rack) to seconds (async, cross-region).
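
The invariant can be illustrated with a toy in-memory model. This is only a sketch, not how any real database implements replication: the `Leader` and `Follower` classes, the `catch_up` method, and the use of a list index as the log position are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    """Owns the single authoritative log of state changes."""
    log: list = field(default_factory=list)   # ordered (key, value) entries
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value
        return len(self.log)                   # log position of this write


@dataclass
class Follower:
    """Applies the leader's log strictly in order, possibly lagging."""
    applied: int = 0                           # number of log entries applied
    data: dict = field(default_factory=dict)

    def catch_up(self, leader, max_entries):
        # Apply at most max_entries of the entries this follower has not seen.
        for key, value in leader.log[self.applied:self.applied + max_entries]:
            self.data[key] = value
            self.applied += 1

    def lag(self, leader):
        # Lag measured in log entries rather than time.
        return len(leader.log) - self.applied


leader = Leader()
follower = Follower()
leader.write("x", 1)
leader.write("x", 2)
leader.write("y", 7)
follower.catch_up(leader, max_entries=2)
print(follower.data, "lag:", follower.lag(leader))   # {'x': 2} lag: 1
```

Because the follower only ever replays a prefix of the leader's log, its state is always one the leader passed through earlier; it can be behind, but never divergent.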

            writes            reads
   Client ---------> Leader <--------- Client
                     |  |  |
         repl log    |  |  |   repl log
                     v  v  v
                     F1 F2 F3   (followers)
                     ^  ^  ^
                     |  |  |
         read scaling / geo-local reads

Why It Matters Here

Single-leader is the baseline every other topology is compared against. Multi-leader is "what if we had two of these?" Leaderless is "what if no node is special?" To compare, you must first understand what single-leader guarantees and where it fails:

  • It avoids write conflicts by funneling all writes through one node. Concurrency control is one machine's problem.
  • It sacrifices availability for writes: if the leader dies, writes stop until failover finishes.
  • It sacrifices latency for strong reads: a write made in New York must replicate to Sydney before a Sydney reader can see it on a local follower; until then, a strong read has to travel to the leader.

Concrete Example

A typical web-app PostgreSQL setup in us-east-1:

  • db-primary on one machine, two streaming replicas (db-replica-1, db-replica-2).
  • Writes go to db-primary via pgBouncer.
  • Read replicas serve dashboards and reports.
  • WAL is streamed asynchronously; typical lag is 10-50 ms.

When db-primary crashes, db-replica-1 is promoted and the load balancer routes new connections to it. Any writes from the last 10-50 ms that had not yet reached the replica are lost unless the replica was synchronous.

Failover flow:

    T=0s    primary crashes (hardware or kernel panic)
    T=1s    health check fails, alert fires
    T=3s    failover controller picks highest-LSN replica
    T=5s    chosen replica promoted: read-only -> read-write
    T=6s    DNS / proxy points writes at the new leader
    T=6s+   application resumes writing (possibly losing 1-2 seconds of un-replicated writes)
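
The "pick highest-LSN replica" step can be sketched as below. This is a simplified illustration, not the logic of any particular failover tool: the `Replica` class, the `pick_new_leader` and `lost_writes` helpers, and the LSN values are hypothetical, and LSNs are modeled as plain integers rather than PostgreSQL's real WAL positions.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    replayed_lsn: int      # last log position this replica has applied
    healthy: bool = True


def pick_new_leader(replicas):
    """Return the healthy replica with the highest replayed LSN."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica to promote")
    return max(candidates, key=lambda r: r.replayed_lsn)


def lost_writes(old_leader_lsn, new_leader):
    """Log positions the old leader reached but the new leader never saw."""
    return max(0, old_leader_lsn - new_leader.replayed_lsn)


# Hypothetical positions at the moment the primary crashed.
replicas = [
    Replica("db-replica-1", replayed_lsn=1_000_040),
    Replica("db-replica-2", replayed_lsn=1_000_010),
]
chosen = pick_new_leader(replicas)
print(chosen.name, lost_writes(1_000_050, chosen))   # db-replica-1 10
```

Choosing the most advanced replica minimizes the lost tail of the log, but with asynchronous replication it cannot shrink it to zero: whatever the old leader wrote past the chosen replica's position is gone.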

Common Confusion / Misconception

"Single-leader gives me strong consistency." It does, but only if reads go to the leader. Reads from a follower can be seconds stale (replication lag), which creates application-level bugs (read-your-writes anomalies -- Concept 09).

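The stale-read case above can be reproduced in a few lines. This is a toy sketch with assumed names (`Node`, `write`, `replicate`, `read`); replication lag is modeled as a queue of writes that have not yet been shipped to the follower.

```python
class Node:
    def __init__(self):
        self.data = {}


leader, follower = Node(), Node()
pending = []                       # writes not yet shipped to the follower


def write(key, value):
    leader.data[key] = value       # the leader applies the write immediately
    pending.append((key, value))   # the follower will see it later


def replicate():
    # Ship every pending write to the follower, in order.
    while pending:
        key, value = pending.pop(0)
        follower.data[key] = value


def read(key, from_leader):
    node = leader if from_leader else follower
    return node.data.get(key)


write("x", 5)
stale = read("x", from_leader=False)      # follower has not applied the write
fresh = read("x", from_leader=True)       # leader always has it
replicate()
caught_up = read("x", from_leader=False)  # follower has now caught up
print(stale, fresh, caught_up)            # None 5 5
```

The same client sees its own write vanish and reappear depending only on which node served the read, which is exactly the anomaly that routing reads to the leader avoids.
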
"Followers are hot backups." They are live read-serving nodes, not backups. A backup is a frozen dataset you can restore from. A follower replays everything the leader does, including DROP TABLE. A runaway leader destroys all followers with it.

"Failover is automatic." Only if you built (or bought) the tooling. Raw PostgreSQL ships without automatic failover; repmgr, Patroni, or AWS RDS provide it. Without it, someone has to recognize the failure, pick a replica, and run pg_promote.

How To Use It

Pick single-leader replication when:

  1. Writes are dominated by one region (or tolerate writing to that region).
  2. You value read scaling and failover more than global low-latency writes.
  3. You can tolerate asynchronous replication (acceptable data loss on failover) or pay the latency for synchronous.
  4. The application does not need writes during leader-election windows (typically 5-30 s).
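
To judge whether point 4 is acceptable, it can help to turn the leader-election window into a yearly write-availability figure. The sketch below is back-of-the-envelope arithmetic only: the `write_availability` helper and the figure of four failovers per year are assumptions for illustration, and it ignores every other source of downtime.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def write_availability(failovers_per_year, failover_seconds):
    """Fraction of the year during which the leader can accept writes."""
    downtime = failovers_per_year * failover_seconds
    return 1 - downtime / SECONDS_PER_YEAR


# Assumed: four failovers a year, at both ends of the 5-30 s window.
for window in (5, 30):
    availability = write_availability(4, window)
    print(f"{window:>2} s window -> {availability:.6%} write availability")
```

If the resulting write downtime is more than the application can absorb, that is a signal to shorten the failover window or to consider a topology that does not stop writes during leader election.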

Avoid single-leader when active writers are spread across continents and they cannot tolerate cross-continent write latency. That is the multi-leader use case.

Check Yourself

  1. Name two things single-leader replication gives you for free and two things it costs you.
  2. What exactly fails during a leader failover window, and what still works?
  3. Why is "promote a follower" not the same as "restore a backup"?
  4. A follower is 200 ms behind the leader. A client writes X=5 to the leader and then reads from a follower. What can they observe?

Mini Drill or Application

For each setup, diagnose what fails and why:

  1. A single-leader, single-async-replica Postgres in one AZ. The AZ loses power. What is the data-loss window?
  2. Four followers, all async, with reads balanced across them round-robin. A user posts a comment and refreshes. Sometimes the comment appears, sometimes not. Explain.
  3. The leader is overloaded to 100% CPU. Followers appear healthy. Why do reads suddenly return errors?
  4. A follower has been disconnected for 2 hours. What happens when it reconnects?

Read This Only If Stuck