Skip to main content

The Eight Fallacies of Distributed Computing

What This Concept Is

In the mid-1990s, Peter Deutsch and James Gosling (at Sun Microsystems) enumerated assumptions that new distributed-systems engineers make and that are always wrong in practice. They called them fallacies because believing any of them quietly produces real bugs. The list is eight:

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn't change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

Each fallacy corresponds to a class of code that "works on localhost" and fails in production. The list is not a checklist of threats. It is a list of implicit assumptions that bleed into APIs, timeouts, error handling, serialization, and deployment.

Why It Matters Here

Every later concept in this module exists because one or more of these fallacies is false. Logical clocks exist because latency is not zero. Consensus exists because the network is not reliable. Fencing tokens exist because topology (who is the leader, right now) changes. Idempotency keys exist because "transport cost is zero" and "the network is reliable" are both false: you will retry and you will sometimes deliver twice.

If you cannot name the fallacy a design is violating, you cannot predict how it will fail.

Concrete Example

Two services, A and B. A sends B an RPC: "charge the card for $100." A waits 200ms, sees no reply, and decides to retry. The concrete problems, one per fallacy:

  • Reliable: the request might have been dropped, or the reply might have been dropped. A does not know which.
  • Latency zero: 200ms was a guess. Maybe B's reply is in flight right now, and the retry produces a double charge.
  • Bandwidth: at Black Friday, the link fills up and adds 2s of queueing; your timeouts now fire constantly even though nothing is broken.
  • Secure: the retried request is a replay; without an idempotency key and authentication, a MITM can repeat it.
  • Topology: B was failed over to a new IP; your connection pool still points at the dead one until it refreshes.
  • One administrator: B's team deployed a new serialization format at 3am; your client panics on unknown fields.
  • Transport cost zero: payload went from 2KB to 200KB after a refactor; p99 latency doubled; retries pile up.
  • Homogeneous: B's cluster is half on Linux and half on Windows; a string comparison silently differs in case-folding.

Each fallacy is a different failure surface. They compound.

Common Confusion / Misconception

"The fallacies are old - modern networks are better." Modern networks are faster and more redundant, but none of them have made the fallacies false. Cloud datacenters still have partial network failures (a "grey failure" in Microsoft's vocabulary); Kubernetes pods still move; serializers still change; a malicious or misconfigured tenant still shares your fabric. The fallacies describe a class of assumption, not a quality level.

How To Use It

When you review a design or write a new one:

  1. For each RPC, ask which fallacy a naive implementation would rely on.
  2. For every timeout literal (timeout = 500ms), write next to it which fallacy it is guessing about.
  3. For every "just retry on failure," ask what happens if the side effect already occurred - that is fallacy #1 made visible.
  4. For every "the topology is stable," ask what changes during a deploy, a rolling restart, a failover, or a DNS TTL expiry.

Check Yourself

  1. Which two fallacies most directly motivate idempotency keys on write APIs?
  2. Why is "the network is homogeneous" still a practical fallacy in a single cloud region?
  3. Name one fallacy your most recent production service has quietly assumed.

Mini Drill or Application

Pick any production service you've used or built. Write one sentence per fallacy: either "this service handles it by X" or "this service implicitly assumes it." Be honest about the latter.

Read This Only If Stuck