Skip to main content

Coordination Services: ZooKeeper, etcd, Consul

What This Concept Is

A coordination service is a small, consensus-backed data store whose job is to be the shared dependency that lets many unrelated services agree on coordination primitives. The three you will see:

  • ZooKeeper (Apache, 2008). Hierarchical namespace ("znodes") with ephemeral and sequential flavors. Uses ZAB (ZooKeeper Atomic Broadcast), a Paxos-family protocol. Powers Kafka, HBase, HDFS NameNode HA, and a generation of Hadoop-era systems.
  • etcd (CoreOS / CNCF, 2013). Flat key-value store with directory-like prefixes. Uses Raft. Powers Kubernetes, CoreDNS, Calico. The default "where is the cluster state?" for the modern container ecosystem.
  • Consul (HashiCorp, 2014). KV store with additional built-in service discovery, health checks, and DNS interface. Uses Raft for KV and Serf/SWIM for membership. Powers HashiCorp Nomad and many service meshes.

All three give you roughly the same primitives, with different ergonomics:

  • strongly consistent key-value store (reads after writes are linearizable on the leader);
  • ephemeral/session-bound keys (deleted when the client disconnects);
  • watches/notifications on key changes;
  • primitives for distributed locks, leader election, barriers, and queues, built from the above.

Why It Matters Here

You should almost never implement consensus yourself. Use a coordination service. The reason: a correct implementation requires careful attention to durability, fsync discipline, membership changes, snapshotting, compaction, leader lease semantics, linearizability of reads - each of which has caused real, severe bugs in naive rollouts. A production Raft implementation takes person-years; the existing coordination services have had a decade of battle-testing each.

This concept ties the module to real-world operations. When you see "leader election," "distributed lock," "config registry," "service discovery," or "feature flag rollout coordination," the answer is almost always "use a coordination service," not "write Paxos from scratch."

Concrete Example: Leader Election in etcd

Pseudo-code for an etcd-backed leader election in Go:

cli, _ := clientv3.New(clientv3.Config{Endpoints: endpoints})
session, _ := concurrency.NewSession(cli, concurrency.WithTTL(10))
election := concurrency.NewElection(session, "/my-service/leader")

// Campaign blocks until this node is leader
ctx := context.Background()
if err := election.Campaign(ctx, myNodeID); err != nil { /* ... */ }

// Observe the leader's term (a monotonic rev used as fencing)
leaderRev := election.Rev()

// Now I am leader. Every write to shared storage includes leaderRev.
for work := range jobs {
// Call storage.Write(work, fencingToken=leaderRev)
}

// When done or dying, resign cleanly
election.Resign(ctx)
session.Close()

Key points:

  • The session is an etcd lease; if this process dies or disconnects, etcd expires the session and the election object becomes eligible for someone else.
  • Campaign blocks; when it returns, you are leader.
  • Rev() is the etcd revision number when the leadership was acquired - this is your fencing token (Concept 13).
  • Every write to downstream storage carries that token, and storage rejects stale ones.

This is roughly 15 lines and is correct if storage respects the token. Writing Raft for this would be thousands of lines and a maintenance liability.

ZooKeeper, etcd, Consul - How to Pick

NeedBest fitWhy
Kubernetes ecosystem, container-nativeetcdIt is already there; Kubernetes API server writes to it
Kafka, Hadoop, or legacy JVM stackZooKeeperExisting integrations; hierarchical namespace useful for Kafka
Service discovery + KV + health checks out of the boxConsulBatteries included; multi-datacenter support is strong
Lightweight single-cluster coordination, minimal depsetcdSmallest operational footprint; simple API
Cross-datacenter with federationConsulDesigned for WAN federation from day one

All three solve the consensus problem. The choice is about ecosystem, operational footprint, and auxiliary features.

Common Confusion / Misconception

"A coordination service is a database." It is a tiny database, by design. Typical sizes: under 8 GB of data, thousands to tens of thousands of keys, writes up to a few thousand per second. If your data volume is larger, you are misusing the tool - the consensus path becomes the bottleneck.

A second misconception: "Reads from a coordination service are always linearizable." In etcd, reads go through the Raft log by default (linearizable), but you can opt into "serializable" reads from a local follower (stale). ZooKeeper's local reads are likewise sequentially consistent, not linearizable. Know which mode your client is using.

A third: "Watches are reliable." Watches are best-effort notifications with at-least-once semantics. After a disconnect, you must re-read the key to resynchronize. Treat watches as hints, not as a primary data channel.

How To Use It

When you have a coordination problem:

  1. Name the primitive you need: leader election, lock, barrier, config registry, service discovery, feature-flag rollout.
  2. Pick the coordination service that is already in your ecosystem (Kubernetes -> etcd; Kafka -> ZooKeeper; Consul-using shop -> Consul).
  3. Use the library abstractions (concurrency.NewElection in etcd; Curator's LeaderSelector in ZooKeeper; Consul's KV + semaphore helpers). Do not re-derive the primitive from raw CAS operations.
  4. Apply fencing on every downstream write. The coordination service gives you a monotonic revision/epoch; use it.
  5. Keep the data small. The coordination service is not your database.

Check Yourself

  1. Why should you not implement Raft yourself for a distributed-lock feature?
  2. What does an etcd session lease give you that a plain key does not?
  3. What is the right data-size budget for a coordination service?
  4. Why are watches not a substitute for re-reading state on reconnect?

Mini Drill or Application

Pick a real coordination problem from your experience (or imagine one: "exactly one worker runs this migration at a time"). Sketch it with etcd primitives: which keys, what TTL, how is the fencing token used, what happens on disconnect.

Read This Only If Stuck