Request Routing: Gossip, Service Discovery, Coordinators
What This Concept Is
Once data is partitioned across many nodes, a client that wants to read or write key K has to find out which node currently owns K. The industry has converged on three approaches:
- Client-side routing: the client holds a partition map and computes the target node directly. Low latency, zero extra hops; requires clients to refresh their map when topology changes.
- Coordinator / routing tier: clients talk to any coordinator node (or a dedicated proxy like Vitess or ProxySQL). The coordinator consults the partition map and forwards to the right node. One extra hop; topology changes only update the coordinators.
- Gossip-based self-routing: every node knows the topology; the client connects to any node, which forwards if needed. Used by Cassandra and other leaderless systems. No dedicated routing tier needed.
The partition map itself has to be maintained somewhere with strong consistency (you cannot have two views of "who owns partition 17"). The usual answer is a coordination service:
- ZooKeeper / etcd / Consul: small strongly-consistent KV stores built on a consensus algorithm (ZAB or Raft). Hold the partition map, service discovery data, and leader-election state. Nodes watch the KV for changes and react.
Client-side: Coordinator: Gossip:
Client ---- computes hash Client Client
| consults map | |
v v v
node-X Coordinator ---> correct any node
(has map) node \
+--> forwards
Why It Matters Here
Without routing, partitioning is invisible to the application and undiagnosed to operators. Routing is where the abstraction is implemented and where it leaks.
Routing choices are the difference between:
- A smooth rolling deploy vs. a wave of connection errors when the partition map lags.
- "Client upgrade required" when topology changes (client-side) vs. transparent topology evolution (coordinator).
- Traffic that degrades gracefully when a node dies (gossip) vs. a cluster that stalls waiting for its routing service.
Concrete Example
Three real systems, three routing approaches:
- Redis Cluster: client-side routing. The client library caches a slot-to-node map. On
MOVEDorASKresponses (which indicate a stale map), the client refreshes from any node. Zero-hop in steady state. - Vitess (MySQL sharding): coordinator-based. The application talks to
vtgate, which holds theVSchema(shard map).vtgateroutes queries, merges fan-out results, and hides the underlying shards. - Cassandra: gossip-based self-routing. The client connects to any contact point; that node becomes the coordinator for this request. It knows the token ring and forwards reads/writes to the replicas that own the partition key.
Backing these up:
- MongoDB uses a coordinator (
mongos) with a config-server replica set (a mini consensus cluster) holding the shard map. - Kafka uses ZooKeeper (historically) or KRaft (newer) to track broker/partition ownership; producers and consumers consult this metadata.
Common Confusion / Misconception
"Client-side routing is fastest, so it's always best." Client libraries become a distributed dependency: every language binding must agree on the hashing rule, handle topology changes, and recover from stale maps. A coordinator tier centralizes those problems at the cost of one extra hop.
"Gossip eventually converges, so it is fine for routing." Gossip can be seconds out of date. For a node that just died, requests will be routed to it until gossip notices. Systems tolerating this failure window (AP systems) can use gossip; systems needing strict correctness cannot.
"ZooKeeper is the solution." ZooKeeper is a strongly-consistent small KV store. Putting the partition map there is fine; putting every write's routing decision there is a scalability anti-pattern. Keep ZK for metadata, not hot-path queries.
How To Use It
When reading an unfamiliar distributed database, ask:
- Who holds the partition map? (client, coordinator, or every node via gossip?)
- What is the propagation delay of topology changes?
- What happens when a client has a stale map -- error, redirect, or silent misroute?
- How is the map itself made consistent? (ZooKeeper/etcd/Raft service)
- What happens if the coordination service becomes unavailable -- can existing clients still route?
Config source of truth: ZooKeeper / etcd / Raft group
|
partition map (read + watch for changes)
|
+---------------------+---------------------+
v v v
client-side coordinator gossip nodes
Check Yourself
- What is the core responsibility a coordinator takes on, and what does it cost?
- Why does client-side routing require a propagation mechanism for topology changes?
- Why is ZooKeeper better suited for "store the partition map" than for "route every read"?
- When gossip reports a node as alive but it is actually hung, what happens to requests routed there?
Mini Drill or Application
Pick a routing scheme for each system and name the metadata store:
- A new internal platform storing 20 TB of key-value pairs across 50 nodes.
- A public multi-tenant search API serving thousands of apps.
- A Cassandra-like telemetry store with strict p99 read latency requirements.
- A sharded MySQL application migrating from a single master.
- An event-streaming platform with millions of partitions.