Physical vs Logical Clocks, and the Limits of NTP
What This Concept Is
Distributed systems use three distinct notions of time, and conflating them is one of the most reliable bug sources in the discipline.
- Wall-clock time (in Java, System.currentTimeMillis(); in POSIX, gettimeofday() or clock_gettime() with CLOCK_REALTIME) answers "what time is it?" It can jump forward or backward when NTP steps it, when an operator resets it, or when a host starts up with a wrong hardware clock, and it can misbehave around leap seconds. (DST changes how a timestamp is displayed, not the underlying epoch value.) Never use it for ordering events.
- Monotonic time (in Java, System.nanoTime(); in POSIX, clock_gettime() with CLOCK_MONOTONIC) answers "how much time has passed since some arbitrary point?" It is not tied to any calendar, and it never goes backward within a single process on a single machine. Use it for measuring intervals (timeouts, latencies).
- Logical time (Lamport clocks, vector clocks) answers "what is the causal order of events?" It is a counter, not a duration. Use it to order events across nodes.
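The first two clocks can be seen side by side in a short Python sketch. It assumes Python's time.monotonic() as the analogue of System.nanoTime() and time.time() as the analogue of System.currentTimeMillis(); the helper names measure_interval and wall_clock_now are made up for illustration.

```python
import time


def measure_interval(work):
    """Return the elapsed seconds for work(), measured on the monotonic clock."""
    start = time.monotonic()  # arbitrary origin; not affected by clock adjustments
    work()
    return time.monotonic() - start


def wall_clock_now():
    """Return seconds since the Unix epoch; this value can jump if the clock is adjusted."""
    return time.time()


elapsed = measure_interval(lambda: sum(range(100_000)))
print(f"work took {elapsed:.6f}s (monotonic)")
print(f"wall clock reads {wall_clock_now():.3f} (epoch seconds)")
```

Only the difference between two monotonic readings is meaningful; the wall-clock reading is the one that maps to a calendar.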
NTP (Network Time Protocol) synchronizes wall-clock time across nodes by exchanging timestamped packets with time servers, estimating round-trip latency, and correcting the local clock's offset and drift. Under ordinary cloud conditions, NTP typically keeps nodes within roughly 10-100ms of each other. Under adverse conditions (a firewall dropping NTP traffic, a saturated link, or a misbehaving time server), clocks can be off by seconds. Google's Spanner gets tighter bounds through TrueTime, and only by using GPS and atomic clocks in every datacenter.
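The round-trip estimate can be sketched with the standard four-timestamp calculation used by NTP. The function name and the millisecond values below are illustrative assumptions, not output from a real NTP client. The sketch also shows why a congested or asymmetric path hurts: the offset estimate assumes both directions take equally long.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one request/response.

    t0: request sent      (client clock)
    t1: request received  (server clock)
    t2: response sent     (server clock)
    t3: response received (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # estimated server clock minus client clock
    delay = (t3 - t0) - (t2 - t1)         # time on the wire, excluding server processing
    return offset, delay


# Illustrative numbers in ms: the server clock is really 250 ms ahead of the client.
# Symmetric path: 40 ms each way, 10 ms of server processing.
symmetric = ntp_offset_and_delay(1000, 1290, 1300, 1090)

# Asymmetric path: 70 ms out, 10 ms back, same true offset.
asymmetric = ntp_offset_and_delay(1000, 1320, 1330, 1090)

print(symmetric)   # offset recovers the true 250 ms
print(asymmetric)  # offset is wrong by half the path asymmetry
```

In the asymmetric case the estimate is 30ms off, which is half of the 60ms difference between the two directions. The error of a single exchange is bounded by half the measured delay.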
Why It Matters Here
All three time abstractions will recur in the module:
- Wall-clock time is why last-write-wins by timestamp is incorrect under clock skew.
- Monotonic time is what your heartbeat and phi-accrual detectors actually measure (Cluster 3).
- Logical time is how Lamport and vector clocks (next two concepts) sidestep physical clocks entirely.
You cannot reason about ordering with wall-clock time in a distributed system. You must promote to logical time or prove a bound on skew (Spanner-style).
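As a preview of what "promoting to logical time" looks like, here is a minimal sketch of a Lamport-style counter. The class and method names are assumptions for illustration; the next concept covers the real rules in detail.

```python
class LamportClock:
    """Minimal logical clock: a counter that only ever moves forward."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        """Advance for a local event or a send, and return the new timestamp."""
        self.counter += 1
        return self.counter

    def receive(self, remote_stamp):
        """Merge a timestamp carried on an incoming message, then advance."""
        self.counter = max(self.counter, remote_stamp) + 1
        return self.counter


node_a = LamportClock()
node_b = LamportClock()

sent = node_a.tick()             # A sends a message stamped with its counter
received = node_b.receive(sent)  # B's counter moves past the stamp it received

print(sent, received)
```

No physical clock is read anywhere, so skew between the two machines cannot reorder the send and the receive.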
Concrete Example
Two nodes append events to a shared log. Each tags its event with System.currentTimeMillis(). Node A has a clock 300ms ahead of true; Node B has a clock 200ms behind true.
At real time T, A writes event a tagged T + 300ms. At real time T + 50ms, B writes event b tagged T - 150ms. A reader that merges by timestamp sees b before a, even though b really happened after a. The timestamps claim b came 450ms earlier when it actually came 50ms later, so the ordering is off by 500ms, which is exactly the skew between the two clocks.
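The arithmetic above can be replayed in a few lines. This is a simulation under the example's stated skews; the Event type, the stamp helper, and the value chosen for T are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    real_ms: int   # when the event truly happened
    stamp_ms: int  # what the node's skewed wall clock recorded


def stamp(name, real_ms, skew_ms):
    """Tag an event with a wall-clock reading that is off by skew_ms."""
    return Event(name, real_ms, real_ms + skew_ms)


T = 1_000_000                    # arbitrary real time, in ms
a = stamp("a", T, +300)          # Node A runs 300 ms ahead
b = stamp("b", T + 50, -200)     # Node B runs 200 ms behind

by_stamp = [e.name for e in sorted([a, b], key=lambda e: e.stamp_ms)]
by_real = [e.name for e in sorted([a, b], key=lambda e: e.real_ms)]

print("merged by timestamp:", by_stamp)
print("true order:         ", by_real)
print("timestamp gap (ms): ", a.stamp_ms - b.stamp_ms)
```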
This is not a hypothetical. Cassandra's last-write-wins conflict resolution is timestamp-based, and Jepsen has documented lost updates under clock skew.
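To see how timestamp-based last-write-wins turns skew into a lost update, consider a toy register. This is not Cassandra's implementation; the class name, the tie-handling rule (ties keep the existing value), and the values are assumptions for illustration.

```python
class LwwRegister:
    """Toy last-write-wins register: the highest timestamp wins."""

    def __init__(self):
        self.value = None
        self.stamp_ms = None

    def write(self, value, stamp_ms):
        """Apply the write only if its timestamp is newer. Return True if applied."""
        if self.stamp_ms is None or stamp_ms > self.stamp_ms:
            self.value = value
            self.stamp_ms = stamp_ms
            return True
        return False  # assumption: older or equal timestamps are silently dropped


T = 1_000_000
register = LwwRegister()

first_applied = register.write("from-A", T + 300)   # real time T, clock 300 ms ahead
second_applied = register.write("from-B", T - 150)  # real time T + 50, clock 200 ms behind

print(first_applied, second_applied, register.value)
```

B's write is the later one in real time, yet it is discarded without any error, and the register keeps A's value.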
Common Confusion / Misconception
"Our servers run NTP, so their clocks agree." NTP reduces drift; it does not eliminate it. Typical steady-state skew is tens of milliseconds. During an NTP restart, a container cold start, or a network path change, skew can briefly reach seconds. Worse, NTP can step the clock backward when the offset is large, so an interval computed by subtracting two wall-clock readings can come out negative.
A second misconception: "We can use System.nanoTime() across machines." You cannot. Monotonic time is meaningful only within one process on one machine, because each one counts from its own arbitrary origin. Comparing nanotimes across machines is meaningless: a reading is useful only as one end of a duration, not as a point in time.
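Both misconceptions can be reproduced with a simulated host that keeps a wall clock and a monotonic clock. The SimulatedHost class and all numbers are hypothetical; no real clock is read.

```python
class SimulatedHost:
    """Hypothetical host with a steppable wall clock and a monotonic clock."""

    def __init__(self, wall_ms, mono_ms):
        self.wall_ms = wall_ms  # epoch-style reading
        self.mono_ms = mono_ms  # counts from an arbitrary per-host origin

    def run_for(self, ms):
        """Real time passes: both clocks advance."""
        self.wall_ms += ms
        self.mono_ms += ms

    def ntp_step(self, offset_ms):
        """NTP steps the wall clock; the monotonic clock is untouched."""
        self.wall_ms += offset_ms


# Misconception 1: a backward step makes a wall-clock interval negative.
x = SimulatedHost(wall_ms=10_000, mono_ms=86_400_000)
wall_start, mono_start = x.wall_ms, x.mono_ms
x.run_for(200)
x.ntp_step(-1_500)
wall_elapsed = x.wall_ms - wall_start
mono_elapsed = x.mono_ms - mono_start

# Misconception 2: monotonic readings from different hosts share no origin.
y = SimulatedHost(wall_ms=10_000, mono_ms=5_000)  # y has a much smaller origin
y.run_for(10_000)                                 # y's event happens later in real time
later_event_looks_earlier = y.mono_ms < x.mono_ms

print(wall_elapsed, mono_elapsed, later_event_looks_earlier)
```

The wall-clock interval comes out negative while the monotonic interval is the true 200ms, and the later event on the second host carries the smaller monotonic reading.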
A third: "Millisecond timestamps are fine for logging." For human debugging, usually yes. For ordering events where correctness depends on the order, never.
How To Use It
Use this decision rule:
- Measuring a local interval (timeout, latency, throughput)? Use monotonic time.
- Recording an absolute moment for a human (log timestamp, billing)? Use wall-clock time, kept in sync by NTP, and tolerate some error.
- Ordering events across processes for correctness? Use logical time. Never wall-clock.
- Needing a wall-clock-derived ordering with a hard bound (e.g., external consistency in a global DB)? You are in Spanner territory; you need TrueTime-class infrastructure or you are pretending.
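The last rule can be made concrete with uncertainty intervals. The sketch below is not the TrueTime API. It only illustrates the underlying idea: if every clock is known to be within some bound of true time, two timestamps can be ordered only when their intervals do not overlap. The bound EPSILON_MS and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest_ms: int
    latest_ms: int


def uncertainty_interval(stamp_ms, epsilon_ms):
    """True time lies somewhere within epsilon_ms of the recorded stamp."""
    return Interval(stamp_ms - epsilon_ms, stamp_ms + epsilon_ms)


def definitely_before(x, y):
    """True only if x must have happened before y under the assumed bound."""
    return x.latest_ms < y.earliest_ms


T = 1_000_000
EPSILON_MS = 300  # assumed proven bound on any clock's error

a = uncertainty_interval(T + 300, EPSILON_MS)    # written at real time T
b = uncertainty_interval(T - 150, EPSILON_MS)    # written at real time T + 50
c = uncertainty_interval(T + 1_000, EPSILON_MS)  # written at real time T + 1200

print(definitely_before(a, b), definitely_before(b, a))  # overlapping: order unknown
print(definitely_before(a, c))                           # far enough apart to order
```

Without a proven bound there is no trustworthy value for EPSILON_MS, which is why the decision rule sends correctness-critical ordering to logical time instead.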
Check Yourself
- Give one class of bug caused by using wall-clock time to order events.
- Why does NTP not make wall-clock timestamps safe for ordering?
- What is the right clock to use for a timeout?
- Why is nanoTime() not comparable across two machines?
Mini Drill or Application
Pick one class in your own codebase that reads System.currentTimeMillis() or time.time(). Classify its use: local interval, human-facing moment, or ordering decision. If it is the third, note whether the code is secretly wrong.