The CAP Intuition: Partition-Tolerance Is Mandatory
What This Concept Is
CAP (Consistency, Availability, Partition-tolerance) is often misquoted as "pick two of three." That phrasing is misleading. The honest statement:
- Partitions happen. In any distributed system, the network can drop or delay messages. You cannot opt out.
- During a partition, each side of the split faces a choice: either refuse to answer (giving up Availability on that side) or answer from possibly-stale data (giving up Consistency, specifically linearizability).
- Between partitions (steady state), this tradeoff does not apply -- both guarantees can be offered at the same time.
CAP is therefore a choice you make under failure, not a permanent property of the system. A better framing is PACELC: during a Partition, pick A or C; Else (normal operation), pick Latency or Consistency.
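A PACELC label is just a pair of choices, one per branch. The sketch below encodes it as a small value type; the class and field names (OnPartition, Otherwise, Pacelc) are illustrative assumptions, not part of any library.

```python
from dataclasses import dataclass
from enum import Enum


class OnPartition(Enum):
    AVAILABILITY = "A"   # keep answering, possibly from stale data
    CONSISTENCY = "C"    # refuse to answer rather than risk staleness


class Otherwise(Enum):
    LATENCY = "L"        # answer fast, tolerate replication lag
    CONSISTENCY = "C"    # wait for replicas, pay the round trip


@dataclass(frozen=True)
class Pacelc:
    """Hypothetical record of the two PACELC choices for one system."""
    on_partition: OnPartition
    otherwise: Otherwise

    def label(self) -> str:
        # Conventional shorthand, e.g. "PC/EL" or "PA/EL".
        return f"P{self.on_partition.value}/E{self.otherwise.value}"


print(Pacelc(OnPartition.CONSISTENCY, Otherwise.LATENCY).label())  # PC/EL
```

Keeping the two fields separate makes it hard to write down a partition-time choice while forgetting the steady-state one.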
Why It Matters Here
Every replication topology in the next cluster lands somewhere on the CAP spectrum:
- Single-leader with synchronous replication leans CP: reads and writes on the minority side fail during a partition.
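- Leaderless (Dynamo-style) with R + W > N leans AP when sloppy quorums are allowed: every side accepts writes during a partition, and the replicas reconcile afterwards via vector clocks or last-write-wins. With strict quorums, a side that cannot reach W replicas rejects writes instead.
- Multi-leader leans AP by default: all leaders accept writes and conflicts are resolved later.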
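The quorum condition R + W > N can be checked directly. The following is a minimal sketch, assuming strict quorums over N replicas numbered 0..N-1; the function names are made up for illustration. It confirms by brute force that every read set overlaps every write set, and shows how many reachable replicas a side needs to keep accepting writes.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """The arithmetic condition: read and write quorums must intersect."""
    return r + w > n


def every_read_sees_a_write(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every R-subset share a node with every W-subset?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def side_can_write_strict(reachable: int, w: int) -> bool:
    """Under strict quorums, a side of the partition needs W reachable replicas."""
    return reachable >= w


# N=3, R=2, W=2: quorums overlap, so a read always touches the latest write.
print(quorums_overlap(3, 2, 2), every_read_sees_a_write(3, 2, 2))  # True True
# N=3, R=1, W=1: a read can miss the only replica that took the write.
print(quorums_overlap(3, 1, 1), every_read_sees_a_write(3, 1, 1))  # False False
# A side that can reach only 1 of 3 replicas cannot meet W=2.
print(side_can_write_strict(1, 2))  # False
```

The last check is the partition question in miniature: whether the isolated side keeps writing depends on whether the system insists on the strict quorum or relaxes it.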
Knowing CAP as a runtime choice, not a static label, lets you read a design and predict what it does on the minority side of a partition -- the very scenario where replicated systems go wrong.
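The reconcile-afterwards step that AP topologies depend on can be sketched in a few lines. This is a simplified illustration, assuming a vector clock is a plain dict from node name to counter and a last-write-wins version is a (timestamp, value) tuple; real systems attach more metadata.

```python
def compare(a: dict, b: dict) -> str:
    """Order two vector clocks: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Neither dominates: both sides wrote independently during the partition.
    return "concurrent"


def last_write_wins(v1: tuple, v2: tuple) -> tuple:
    """Keep the version with the larger timestamp; the other write is discarded."""
    return max(v1, v2, key=lambda version: version[0])


# Each side advanced its own counter while partitioned: a genuine conflict.
print(compare({"east": 2, "west": 1}, {"east": 1, "west": 2}))  # concurrent
# One version strictly precedes the other: no conflict, just take the newer one.
print(compare({"east": 1}, {"east": 2}))  # before
print(last_write_wins((10, "balance=120"), (12, "balance=50")))  # (12, 'balance=50')
```

Vector clocks only detect the conflict; something else still has to merge the concurrent versions. Last-write-wins resolves it automatically, at the cost of silently dropping one of the writes.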
Concrete Example
Two datacenters, DC-East and DC-West, run a banking application on a single-leader database with one replica in each datacenter, replicated synchronously.
Steady state:
- Both DCs answer reads. Writes commit in both. Everything works.
Network partition between DCs (fiber cut):
- DC-East is the leader; DC-West is the follower.
- CP choice (strong consistency): DC-West refuses writes and refuses reads that require fresh data. Clients in DC-West get errors. Availability on that side is zero.
- AP choice (high availability): DC-West promotes itself and accepts writes. Now there are two leaders (split-brain). When the network heals, the database has two divergent histories of the same account balances. Reconciliation is a human problem.
There is no third door. CAP forces you to pick one of these two during the partition.
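The two doors can be replayed as a toy simulation. This is a sketch under stated assumptions, not a model of any real database: one account starting at 100, a hypothetical apply_delta function, and a leader in DC-East that keeps accepting writes while partitioned (the example above does not say what DC-East does, so that part is an assumption).

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """The datacenter refuses the request instead of risking divergence."""


@dataclass
class DataCenter:
    name: str
    is_leader: bool
    balance: int = 100


def apply_delta(dc, peer, amount, partitioned, policy):
    """Apply a balance change arriving at `dc`; policy is 'CP' or 'AP'."""
    if not partitioned:
        # Steady state: synchronous replication commits in both datacenters.
        dc.balance += amount
        peer.balance += amount
        return
    if dc.is_leader:
        # Assumption: the existing leader keeps serving its own side.
        dc.balance += amount
        return
    if policy == "CP":
        raise Unavailable(f"{dc.name} cannot reach the leader")
    # AP: the follower promotes itself and writes locally (split-brain).
    dc.is_leader = True
    dc.balance += amount


def run(policy):
    east = DataCenter("DC-East", is_leader=True)
    west = DataCenter("DC-West", is_leader=False)
    apply_delta(east, west, 50, partitioned=False, policy=policy)   # both see 150
    apply_delta(east, west, -30, partitioned=True, policy=policy)   # East only
    refused = False
    try:
        apply_delta(west, east, -100, partitioned=True, policy=policy)
    except Unavailable:
        refused = True
    return east.balance, west.balance, refused


print(run("CP"))  # (120, 150, True)
print(run("AP"))  # (120, 50, False)
```

Under CP, DC-West holds a stale 150 but never acts on it. Under AP, both sides answered and the account now has two histories (120 and 50) that someone has to reconcile.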
Common Confusion / Misconception
"CAP says you can have 2 of 3." True as a slogan, but all three are out of reach only while a partition is in progress. The slogan leads designers to write "we picked CP" on an architecture diagram as if it were a permanent tradeoff, and then they are surprised when the system is slow in production with no partition in sight -- because latency versus consistency (the Else branch of PACELC) was the real day-to-day tradeoff all along.
"Eventual consistency means AP." Eventual consistency is a strategy for reconciling after a partition. AP is the CAP choice made during the partition. A leaderless AP system always needs an eventual-consistency story; an AP system is not necessarily leaderless.
"CAP is obsolete." It is not obsolete, but it is misunderstood. Modern systems like Spanner advertise CP; DynamoDB advertises AP (with tunable knobs). Both are correct about CAP and both describe real differences in behavior under failure.
How To Use It
When evaluating any replicated system, perform the partition thought experiment:
- Assume a two-way partition splits your cluster into a majority side and a minority side.
- Ask: can the minority side still read? Still write?
- If reads succeed, are they allowed to be stale?
- If writes succeed on both sides, how are divergent versions merged when the partition heals?
Any system that cannot answer these four questions is not CAP-analyzed.
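One way to keep the thought experiment honest is to record the answers and flag the gaps. The sketch below is a hypothetical checklist type (PartitionAnswers and unanswered are illustrative names); None means the question has not been answered yet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionAnswers:
    """Answers to the partition thought experiment for one system."""
    minority_can_read: Optional[bool] = None
    minority_can_write: Optional[bool] = None
    stale_reads_allowed: Optional[bool] = None   # only relevant if reads succeed
    merge_strategy: Optional[str] = None         # only relevant if writes succeed


def unanswered(a: PartitionAnswers) -> List[str]:
    """Return the questions that still need an answer before the analysis is done."""
    missing = [
        name
        for name in ("minority_can_read", "minority_can_write")
        if getattr(a, name) is None
    ]
    if a.minority_can_read and a.stale_reads_allowed is None:
        missing.append("stale_reads_allowed")
    if a.minority_can_write and a.merge_strategy is None:
        missing.append("merge_strategy")
    return missing


design = PartitionAnswers(minority_can_read=True, minority_can_write=False)
print(unanswered(design))  # ['stale_reads_allowed']
```

The follow-up questions are conditional on purpose: a design whose minority side rejects writes has nothing to merge, so it is not penalized for lacking a merge strategy.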
Check Yourself
- Why is partition-tolerance not really a choice in practice?
- Under a two-datacenter synchronous-replication setup, what does the follower side do during a partition, and what exactly does "doing" cover there (reads, writes, or both)?
- Give a system that is CP during partition and latency-optimized otherwise (PACELC: PC/EL).
- Why is it misleading to describe a system as "CAP: AP" without saying what happens in steady state?
Mini Drill or Application
For each system, identify its CAP choice during a partition and its PACELC steady-state choice:
- Apache Cassandra with QUORUM reads and writes on a 3-node cluster.
- Google Spanner.
- Amazon DynamoDB with eventually_consistent reads.
- A PostgreSQL primary with a single synchronous replica in the same datacenter.
- MongoDB with writeConcern: majority and readConcern: linearizable.