Concurrency, Consistency, and Transaction Boundaries

What This Concept Is

Every real system has moments where two operations race to touch the same data. This concept is about naming those moments explicitly and deciding, for each one:

Concurrency: how do concurrent writes interact? Last-writer-wins? Optimistic concurrency with a version? Pessimistic lock?
Consistency: what does a subsequent read see? Strong (every read sees the latest committed write), session-level (read-your-write), bounded staleness, or eventual?
Transaction boundary: which set of writes must succeed or fail together? Where is the atomicity line?

These three decisions are the correctness spine of the design. Any box that touches mutable state needs explicit answers.

The trade-off comes from CAP-style reality: strong consistency across replicas costs availability or latency; strong ordering across partitions costs throughput; strong atomicity across services costs coupling.

A more precise model: Jepsen's consistency-model hierarchy. Strict serializable -> serializable -> snapshot isolation -> read committed -> read uncommitted on the transactional side; linearizable -> sequential -> causal -> eventual on the replication side. Each step down the hierarchy buys something (lower latency, higher availability under partition, cheaper implementation) and gives up something (new anomalies you now have to reason about). Candidates who say "we need consistency" without picking a level on this ladder are inviting the reviewer to pick one for them -- and the reviewer will tend to pick the strictest level, which is usually one the application cannot afford.

Why It Matters Here

If you leave these decisions implicit, you are choosing "last writer wins, eventual, no transaction" by default -- which is correct for some workloads and disastrous for others.

Money / identity / auth / inventory: require strong consistency and transactional writes.
Feeds / counters / search indexes / notifications: tolerate eventual consistency.
Leaderboards / analytics aggregates: tolerate bounded staleness and even approximate counts.

The Cluster 4 failure walk will ask "what happens when two users race on the same resource?" If you have not decided, the answer is "I do not know".

Concrete Example: Seat Booking

Workload: a seat booking service for a concert.

Functional: a user reserves seat S for show X. Only one user may hold a given seat at a given time.
Non-functional: under load, 10 K users press "buy" at 10:00:00.000 for a 10-seat show.
Constraint: cannot sell the same seat twice; this is the only inviolable rule.

Concurrency and transaction decisions:

Concurrency model: optimistic concurrency on the seat row with a version column.
- UPDATE seats SET held_by = :user, version = version + 1 WHERE seat_id = :s AND version = :v;
- If the UPDATE affects zero rows, the version moved between this user's read and write, so someone else won; return "seat taken".
Transaction boundary: the seat hold, the reservation record, and the charge hold live in one database transaction if the charge hold is a local record; if it is a call to an external payment provider, the three become one distributed saga with compensations.
Consistency: strict read-your-write on the seat inventory; the user's next request must see the hold.
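
The seat-hold decisions above can be sketched in Python with the standard-library sqlite3 module. This is a minimal illustration under assumptions, not the booking service's actual code: the reservations table, the make_db, read_seat and hold_seat helpers, and the extra held_by IS NULL guard are hypothetical additions; only the seats columns and the versioned UPDATE come from the text.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one free seat (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seats (seat_id TEXT PRIMARY KEY, held_by TEXT, "
        "version INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("CREATE TABLE reservations (seat_id TEXT, user_id TEXT)")
    conn.execute("INSERT INTO seats (seat_id) VALUES ('S1')")
    conn.commit()
    return conn


def read_seat(conn, seat_id):
    """Return (held_by, version) as the caller currently sees it."""
    return conn.execute(
        "SELECT held_by, version FROM seats WHERE seat_id = ?", (seat_id,)
    ).fetchone()


def hold_seat(conn, seat_id, user, seen_version) -> bool:
    """Try to hold the seat; True only if nobody changed the row since the read."""
    with conn:  # one transaction: commit on success, roll back on exception
        cur = conn.execute(
            "UPDATE seats SET held_by = ?, version = version + 1 "
            "WHERE seat_id = ? AND version = ? AND held_by IS NULL",
            (user, seat_id, seen_version),
        )
        if cur.rowcount == 0:
            return False  # zero rows affected: someone else won the race
        # The reservation record commits atomically with the seat hold.
        conn.execute(
            "INSERT INTO reservations (seat_id, user_id) VALUES (?, ?)",
            (seat_id, user),
        )
    return True
```

If two callers both read version 0 and then call hold_seat, the first UPDATE matches and bumps the version; the second matches zero rows and reports "seat taken" without writing a reservation.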

Alternative approach: a distributed lock on seat:X:S via Redis with a short TTL, protecting a two-step DB write. Acceptable if the lock service is itself highly available and you are comfortable with the failure modes of lock leases, such as a holder that stalls past its TTL and then writes after the lock has passed to someone else.
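
One lease failure mode can be shown without a real lock service. The sketch below is an in-memory simulation, not Redis: LeaseLock and GuardedStore are hypothetical names, time is passed in explicitly, and the fencing token (a number that grows with every grant and is checked by the store) is a commonly used mitigation that the text above does not itself prescribe.

```python
class LeaseLock:
    """In-memory stand-in for a lock service that grants leases with a TTL."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0
        self.token = 0  # fencing token: increases on every successful grant

    def acquire(self, who: str, now: float):
        """Return a fencing token if the lock is free or expired, else None."""
        if self.holder is not None and now < self.expires_at:
            return None
        self.token += 1
        self.holder = who
        self.expires_at = now + self.ttl
        return self.token


class GuardedStore:
    """Storage that rejects writes carrying an older token than one it has seen."""

    def __init__(self):
        self.held_by = None
        self.highest_token = 0

    def write(self, who: str, token: int) -> bool:
        if token < self.highest_token:
            return False  # stale lease: the lock has since passed to someone else
        self.highest_token = token
        self.held_by = who
        return True
```

In this simulation, a client that acquired the lock at time 0, stalled, and tries to write after another client acquired it at time 6 is rejected, because its token is older than the one the store has already accepted.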

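Compare to a feed write workload:

Alice publishes two posts within 1 ms of each other.
Concurrency: there is no conflict to resolve; the posts are independent rows with distinct IDs, so both are kept. Last-writer-wins is fine for later edits to a single post.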
Transaction boundary: inserting the post is one row, one table, one transaction. No cross-service transaction needed.
Consistency: eventual to followers is fine; Alice must see her own post immediately (read-your-write on her own timeline only).
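
One way to give Alice read-your-write while followers stay eventual is to route each read by the session's last write. The sketch below is an assumption-laden illustration: Store, Session and Timeline are hypothetical names, and a single in-memory primary plus one lagging replica stand in for a real replicated database.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    posts: list = field(default_factory=list)
    applied_seq: int = 0  # highest write sequence number applied on this node


@dataclass
class Session:
    last_write_seq: int = 0  # sequence number of this user's latest write


class Timeline:
    def __init__(self) -> None:
        self.primary = Store()
        self.replica = Store()
        self._backlog = []  # writes not yet shipped to the replica

    def post(self, session: Session, text: str) -> None:
        seq = self.primary.applied_seq + 1
        self.primary.posts.append(text)
        self.primary.applied_seq = seq
        self._backlog.append((seq, text))
        session.last_write_seq = seq  # remember what this session must see

    def replicate(self) -> None:
        """Ship the backlog to the replica (runs asynchronously in a real system)."""
        for seq, text in self._backlog:
            self.replica.posts.append(text)
            self.replica.applied_seq = seq
        self._backlog.clear()

    def read(self, session: Session) -> list:
        # Use the replica only if it has caught up to this session's own writes.
        if self.replica.applied_seq >= session.last_write_seq:
            return list(self.replica.posts)
        return list(self.primary.posts)
```

A session that has just written is served from the primary until the replica catches up; a follower's session, which has written nothing, reads the replica and may briefly miss the post.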

Compare to a counter (post like count):

10 K users click "like" simultaneously.
Concurrency: use an atomic counter (Redis INCR, Cassandra counter column, or append-only event log with async aggregation).
Transaction boundary: the like event is a single append; the counter view is eventually consistent.
Consistency: eventual; a displayed count that lags by a few seconds is acceptable.
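
The "append-only event log with async aggregation" option can be sketched as follows. LikeLog and its methods are hypothetical names for illustration; a Python list stands in for the durable log, and aggregate() is called by hand where a real system would run it in the background.

```python
from collections import Counter


class LikeLog:
    def __init__(self) -> None:
        self._events = []        # append-only: one post_id per like event
        self._offset = 0         # how far the aggregator has consumed
        self.counts = Counter()  # eventually consistent view of like counts

    def like(self, post_id: str) -> None:
        """The write path is a single append; it never touches the counter."""
        self._events.append(post_id)

    def aggregate(self) -> None:
        """Fold events the aggregator has not yet seen into the counter view."""
        new_events = self._events[self._offset:]
        self.counts.update(new_events)
        self._offset += len(new_events)
```

Between calls to aggregate() the view is stale by exactly the unconsumed events, which is the lag the consistency line accepts.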

A single system has all three patterns at once. Part of the deep-dive is naming which pattern applies to which table.

Concrete Example 2: E-Commerce Checkout Saga

Checkout spans four services: Cart, Inventory, Payment, Order. No 2PC across them -- instead, a saga with compensating actions:

Step 1  Cart freeze           compensate: unfreeze cart
Step 2  Inventory decrement   compensate: re-increment inventory
Step 3  Payment authorize     compensate: void authorization
Step 4  Order create          compensate: cancel order (if Payment capture fails later)

Consistency contract per step:

Inventory decrement uses optimistic concurrency with a version column, under a local (per-service) transaction. Two shoppers racing for the last unit: one succeeds; the other's saga compensates.
Payment authorize is an external call to a payment provider; the local transaction records "authorize attempted with id X", and the provider's response drives the next step. Idempotency key required -- retries must not double-charge.
Order create is the durability point for the user-visible record. After this, the saga has succeeded; any downstream failure (email confirmation, analytics) is handled with eventual consistency.
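
The race on the last unit can be simulated in memory. This is a sketch under assumptions: VersionedStock, decrement_with_retry and run_race are hypothetical names, a threading.Lock stands in for the atomicity the database gives a single-row conditional UPDATE, and the retry limit and jittered backoff values are arbitrary.

```python
import random
import threading
import time


class VersionedStock:
    def __init__(self, qty: int):
        self._lock = threading.Lock()  # stands in for row-level atomicity
        self.qty = qty
        self.version = 0

    def read(self):
        with self._lock:
            return self.qty, self.version

    def compare_and_set(self, new_qty: int, expected_version: int) -> bool:
        """Write only if the version is unchanged since the caller's read."""
        with self._lock:
            if self.version != expected_version:
                return False
            self.qty = new_qty
            self.version += 1
            return True


def decrement_with_retry(stock, max_attempts=20, base_delay=0.001) -> bool:
    for attempt in range(max_attempts):
        qty, version = stock.read()
        if qty <= 0:
            return False  # sold out: the caller's saga would compensate
        if stock.compare_and_set(qty - 1, version):
            return True
        # Lost the race: back off with jitter before re-reading.
        time.sleep(random.uniform(0, base_delay * 2 ** min(attempt, 5)))
    return False


def run_race(initial_qty: int, shoppers: int):
    """Start one thread per shopper; return (successful decrements, stock left)."""
    stock = VersionedStock(initial_qty)
    results = []
    results_lock = threading.Lock()

    def shopper():
        won = decrement_with_retry(stock)
        with results_lock:
            results.append(won)

    threads = [threading.Thread(target=shopper) for _ in range(shoppers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), stock.qty
```

With more shoppers than units, the number of successful decrements equals the initial stock and the quantity never goes negative; the remaining shoppers see "sold out".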

What this illustrates that the seat-booking example does not: the transaction boundary crosses services. There is no "one database transaction" option. Compensating actions are the substitute for rollback, and they must be designed as first-class functionality -- not as afterthought error-handling code.
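
A minimal saga runner makes the "compensations instead of rollback" idea concrete. The sketch below is illustrative only: run_saga and demo_checkout are hypothetical names, the four services are replaced by functions over an in-memory dict, and a production saga would also persist its progress and retry compensations.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are illustrative stand-ins.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    trace: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            trace.append(f"fail:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate()  # real compensations must be idempotent and retried
                trace.append(f"undo:{done_name}")
            return False, trace
        completed.append(step)
        trace.append(f"do:{name}")
    return True, trace


def demo_checkout(payment_ok: bool):
    """Wire the four checkout steps from the text against in-memory state."""
    state = {"cart_frozen": False, "stock": 1, "authorized": False, "order": False}

    def decrement() -> None:
        if state["stock"] < 1:
            raise RuntimeError("out of stock")
        state["stock"] -= 1

    def reincrement() -> None:
        state["stock"] += 1

    def authorize() -> None:
        if not payment_ok:
            raise RuntimeError("payment declined")
        state["authorized"] = True

    steps: List[Step] = [
        ("cart_freeze", lambda: state.update(cart_frozen=True),
         lambda: state.update(cart_frozen=False)),
        ("inventory_decrement", decrement, reincrement),
        ("payment_authorize", authorize, lambda: state.update(authorized=False)),
        ("order_create", lambda: state.update(order=True),
         lambda: state.update(order=False)),
    ]
    ok, trace = run_saga(steps)
    return ok, trace, state
```

Calling demo_checkout(payment_ok=False) fails at the payment step and then undoes the inventory decrement and the cart freeze, in that order, leaving the state as it started.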

Consistency ladder for this checkout: strict (inventory, payment -- supply and money) -> strong local transactional (order create) -> eventual (confirmation email, analytics). The consistency level falls off with distance from money.

Common Confusion / Misconceptions

"Strong consistency is always safer." It is safer for correctness, costlier for latency and availability. Cross-region strong consistency means synchronous writes across continents; P99s rise, and a regional partition blocks writes for every replica that cannot reach a quorum (or the leader), which in the worst case is all of them.

"Transactions are a DB feature." Transactions are a promise. They can live in one DB, across DBs (2PC, rarely a good idea), or across services (sagas, event-driven compensations). The question is the atomicity contract, not the technology.

"Optimistic concurrency is only for low-contention." Optimistic concurrency can hold up under heavy contention if retries are quick and bounded and conflict detection is cheap, or if losers do not retry at all (as in seat booking, where a lost race simply means "seat taken"). It becomes a problem when every loser retries against the same hot row and the retry storm itself saturates the system.

"Eventually consistent means wrong." No. It means replicas converge to the same value once writes stop. The definition itself gives no time bound; in practice you measure the replication lag or engineer a bound (bounded staleness). The question is whether the application and the user can tolerate that lag.

"Exactly-once is a real thing." End-to-end exactly-once does not exist; at best, you have effectively-once via idempotency keys and at-least-once delivery plus de-duplication on the consumer. Say this out loud; do not promise what you cannot deliver.

"Serializable == linearizable." Different concepts. Serializable is about transactions (they appear to run in some serial order). Linearizable is about single operations on a single object (they appear instantaneous and ordered by real time). A DB can be one without the other; Spanner claims both.

How To Use It

Per mutable table, write a three-line contract:

Atomicity: what set of operations is atomic? Name the boundary.
Concurrency control: how are conflicting writes resolved? (LWW, OCC, PCC, CRDT, append-only log.)
Consistency: what can a subsequent read observe? (read-your-write, monotonic, bounded-staleness, eventual.)
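
The three-line contract can also be kept as a small data structure so that every table is forced to answer all three questions. TableContract and the two allowed-value sets are hypothetical names for this sketch; the vocabularies are taken from the lists above, and the example values from the seat-booking section.

```python
from dataclasses import dataclass

ALLOWED_CONCURRENCY = {"LWW", "OCC", "PCC", "CRDT", "append-only log"}
ALLOWED_CONSISTENCY = {"read-your-write", "monotonic", "bounded-staleness", "eventual"}


@dataclass(frozen=True)
class TableContract:
    table: str
    atomicity: str    # free text: name the boundary
    concurrency: str  # one of ALLOWED_CONCURRENCY
    consistency: str  # one of ALLOWED_CONSISTENCY

    def __post_init__(self) -> None:
        # Reject vague answers early instead of discovering them in review.
        if self.concurrency not in ALLOWED_CONCURRENCY:
            raise ValueError(f"unknown concurrency control: {self.concurrency}")
        if self.consistency not in ALLOWED_CONSISTENCY:
            raise ValueError(f"unknown consistency level: {self.consistency}")

    def render(self) -> str:
        return "\n".join([
            self.table,
            f"  Atomicity: {self.atomicity}",
            f"  Concurrency control: {self.concurrency}",
            f"  Consistency: {self.consistency}",
        ])


seats = TableContract(
    table="seats",
    atomicity="seat hold + reservation record + charge hold",
    concurrency="OCC",
    consistency="read-your-write",
)
```

Printing seats.render() gives the table name followed by the three contract lines, ready to copy next to the table on the diagram.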

On the diagram, mark the critical regions. Any arrow that crosses a strong-consistency line is a latency and availability cost you are paying on purpose.

Transfer / Where This Shows Up Later

Cluster 4 concept 11 (failure walk) tests what happens to these contracts during partition and replica failure.
Cluster 5 concept 14 (trade-offs) makes the consistency choice defensible as "chose X over Y because Z, accepting cost W".
S8M2 (microservices) and S8M3 (data patterns) rely on sagas, outbox pattern, and idempotency keys as everyday tools.
S9 (cloud) maps this to DynamoDB conditional writes, Spanner serializable transactions, Aurora Global Database, and Cosmos DB consistency levels.
S10 capstone/interviews: "what happens if two writes race?" is the most common correctness probe in senior interviews. Having a per-table contract on the board is how you answer it in one sentence instead of improvising.

Check Yourself

Name one workload where last-writer-wins is obviously correct, and one where it is obviously wrong.
Why is "read-your-write" a meaningful guarantee even in an eventually consistent system?
What is the difference between a saga and a distributed transaction, and why do we usually prefer the saga?
Where on the Jepsen consistency ladder would you put "bounded staleness within 5 seconds"? Defend.

Mini Drill or Application

For each workload, produce the three-line contract per mutable table, in under ten minutes total:

Seat booking for events (above).
Inventory decrement for an e-commerce checkout.
Unread-count per user in a chat system.
Profile-photo update propagating to a social feed.

For each, explicitly name the alternative you rejected and why.

Read This Only If Stuck

System Design Primer: CAP theorem -- short, correct framing for partition behaviour.
System Design Primer: Consistency patterns -- strict, eventual, weak; read/write guarantees.
System Design Primer: Availability patterns -- consistency/availability trade-off under fail-over.
System Design Primer: Database RDBMS replication -- master-slave, master-master, and what each guarantees.
System Design Primer: Asynchronism -- the standard escape hatch for strict consistency.
Fundamentals: Analyzing trade-offs -- language for "we accept eventual consistency because…".
Jepsen -- Consistency Models reference -- the canonical map of what each consistency level formally guarantees and which anomalies it forbids.
Martin Fowler -- Catalog of Patterns of Distributed Systems -- Version Vector, Two-Phase Commit, Lease, Idempotent Receiver, Replicated Log.
Amazon Builders' Library -- Timeouts, retries, and backoff with jitter -- why idempotency keys are mandatory for safe retry under at-least-once delivery.
