Idempotency, Deduplication, and the Exactly-Once Illusion
What This Concept Is
Idempotency: applying an operation twice has the same observable effect as applying it once. set_user_email(u, 'a@b.com') is idempotent. charge_card($10) is not -- two applications charge $20.
Deduplication: detecting that a given message is a duplicate of one already processed, and skipping it.
"Exactly-once": a loaded phrase. Strictly, exactly-once delivery is impossible in an asynchronous distributed system (there is always a failure window between "we processed it" and "we told the broker we processed it"). What is achievable is effectively exactly-once processing, through at-least-once delivery + idempotent consumers + deduplication.
The takeaway that matters most:
Regardless of what your broker claims, treat every message as potentially redelivered. Idempotency is the consumer's responsibility, and it is not optional.
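To make the definition of idempotency above concrete, here is a minimal sketch. The Account class is a hypothetical in-memory stand-in, not part of any real system; it only contrasts an operation that sets a value with one that accumulates.

```python
# Hypothetical in-memory account, used only to illustrate the definition.
class Account:
    def __init__(self):
        self.email = None
        self.charged_cents = 0

    def set_email(self, email):
        # Idempotent: the final state depends only on the argument.
        self.email = email

    def charge(self, cents):
        # Not idempotent: every call adds to the running total.
        self.charged_cents += cents


account = Account()

# Apply each operation twice, as a redelivered message would.
for _ in range(2):
    account.set_email("a@b.com")
    account.charge(1000)

print(account.email)          # a@b.com
print(account.charged_cents)  # 2000
```

The email ends up the same whether the message arrives once or twice; the charge does not.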
Why It Matters Here
Every concept in this module generates duplicates:
- outbox relay (Concept 06) retries on crash -> duplicate events published
- Kafka at-least-once (Concept 09) delivers again on consumer crash -> duplicate events consumed
- saga retries (Concept 11) re-send commands on timeout -> duplicate operations attempted
- network retries anywhere -> duplicate HTTP calls
If your consumer is not idempotent, duplicates silently corrupt data: double billing, double shipping, triple welcome emails. The fix is at the consumer, not the broker.
Techniques
1. Naturally idempotent operations
Some operations are idempotent by design:
- SET operations (updates to a specific value): set_status(order, 'shipped')
- DB INSERT ... ON CONFLICT DO NOTHING keyed by a unique event ID
- PUT on a keyed resource
- state-machine transitions that only advance in one direction
Prefer this style when you can.
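A minimal sketch of the INSERT ... ON CONFLICT DO NOTHING style, using Python's built-in sqlite3 module as a stand-in for the real database. The shipments table and the record_shipment helper are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (event_id TEXT PRIMARY KEY, order_id TEXT)")


def record_shipment(event_id, order_id):
    # The primary key on event_id turns a repeated insert into a no-op.
    with conn:
        cur = conn.execute(
            "INSERT INTO shipments (event_id, order_id) VALUES (?, ?) "
            "ON CONFLICT (event_id) DO NOTHING",
            (event_id, order_id),
        )
    return cur.rowcount == 1  # True only when a row was actually inserted


first = record_shipment("evt-1", "order-42")
second = record_shipment("evt-1", "order-42")  # simulated redelivery
row_count = conn.execute("SELECT COUNT(*) FROM shipments").fetchone()[0]

print(first, second, row_count)  # True False 1
```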
2. Deduplication by event ID
Store every handled event_id in a dedup table (or cache). On receipt:
def handle(event):
    if dedup.seen(event.event_id):
        return  # already processed
    with db.transaction():
        apply_effects(event)
        dedup.mark_seen(event.event_id)
Two things must be true for this to work:
- apply_effects and dedup.mark_seen live in one transaction (otherwise you reintroduce a dual-write bug inside the consumer)
- the dedup store is durable and has a retention window at least as long as the broker's max redelivery window
Storage options: a DB table processed_events(event_id PK, processed_at), a Redis set with TTL, or an embedded store like RocksDB in a stream processor. Scale the retention to the possible delay; a TTL of "max outbox lag + max consumer outage + margin" is a good rule of thumb.
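The TTL rule of thumb can be sketched as plain arithmetic plus a tiny expiring store. The three durations below are invented numbers, and TtlDedupStore is a hypothetical in-memory stand-in for something like a Redis set with TTL. It is not durable, so treat it as an illustration of the expiry logic only.

```python
import time

# Illustrative numbers only; measure your own system.
MAX_OUTBOX_LAG_S = 5 * 60
MAX_CONSUMER_OUTAGE_S = 4 * 60 * 60
MARGIN_S = 60 * 60
DEDUP_TTL_S = MAX_OUTBOX_LAG_S + MAX_CONSUMER_OUTAGE_S + MARGIN_S  # 18300 s


class TtlDedupStore:
    """In-memory stand-in for a set with per-entry TTL (not durable)."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires_at = {}

    def seen(self, event_id):
        expiry = self._expires_at.get(event_id)
        if expiry is None:
            return False
        if expiry <= self._clock():
            # Entry has aged out: a redelivery this late is no longer detected.
            del self._expires_at[event_id]
            return False
        return True

    def mark_seen(self, event_id):
        self._expires_at[event_id] = self._clock() + self._ttl_s


# Drive the store with a fake clock so the example runs instantly.
now = [0.0]
store = TtlDedupStore(DEDUP_TTL_S, clock=lambda: now[0])
store.mark_seen("evt-1")

now[0] = DEDUP_TTL_S - 1
within_window = store.seen("evt-1")  # True: duplicate is caught

now[0] = DEDUP_TTL_S + 1
after_window = store.seen("evt-1")   # False: duplicate would slip through
```

The last two lines show why a TTL shorter than the redelivery horizon is unsafe: once the entry expires, the same event looks new again.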
3. Idempotency keys on external APIs
When you are producing a side effect (charging a card, calling a third-party API), pass an idempotency key. Stripe, Shopify, Plaid, and most serious APIs accept one:
POST /charges
Idempotency-Key: saga_9f2a_step_charge_v1
{ "amount": 4299, "currency": "usd" }
The upstream API guarantees that the same key produces the same outcome, for as long as it retains the key. Generate the key deterministically from saga_id + step_name so that a retry produces the same key.
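A minimal sketch of deterministic key generation. The helper names idempotency_key and build_charge_request are hypothetical, and the request is only built as a dictionary, never sent; the key format mirrors the example header above.

```python
def idempotency_key(saga_id, step_name, version=1):
    # Only stable inputs: no timestamps, random values, or attempt counters.
    return f"saga_{saga_id}_step_{step_name}_v{version}"


def build_charge_request(saga_id, amount_cents, currency):
    # Describes the HTTP call; sending it is left to whatever client you use.
    return {
        "method": "POST",
        "path": "/charges",
        "headers": {"Idempotency-Key": idempotency_key(saga_id, "charge")},
        "json": {"amount": amount_cents, "currency": currency},
    }


first_attempt = build_charge_request("9f2a", 4299, "usd")
retry = build_charge_request("9f2a", 4299, "usd")

print(first_attempt["headers"]["Idempotency-Key"])  # saga_9f2a_step_charge_v1
print(first_attempt == retry)                       # True
```

Bumping the version argument is one way to force a deliberately new attempt, for example after the request body changes.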
4. Conditional updates (optimistic concurrency)
For mutable state, use WHERE version = ? or conditional writes (DynamoDB ConditionExpression, ETags). Duplicate writes become no-ops because the version has already advanced.
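A minimal sketch of the WHERE version = ? approach, again with sqlite3 as a stand-in database. The orders table and update_status helper are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT, version INTEGER)"
)
conn.execute("INSERT INTO orders VALUES ('order-42', 'pending', 1)")
conn.commit()


def update_status(order_id, new_status, expected_version):
    # The write only lands if nobody has advanced the version since we read it.
    with conn:
        cur = conn.execute(
            "UPDATE orders SET status = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_status, order_id, expected_version),
        )
    return cur.rowcount == 1


applied = update_status("order-42", "shipped", expected_version=1)
duplicate = update_status("order-42", "shipped", expected_version=1)  # replay
final = conn.execute(
    "SELECT status, version FROM orders WHERE id = 'order-42'"
).fetchone()

print(applied, duplicate, final)  # True False ('shipped', 2)
```

The replayed write matches zero rows because the version is already 2, so it changes nothing.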
5. Ordering + monotonic state
For state that can only move forward (order pending -> shipped -> delivered), the consumer ignores any event that would move it backward. Combined with per-partition ordering, this is often enough.
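A minimal sketch of the forward-only rule. The STATUS_RANK table and apply_status helper are hypothetical names; the statuses come from the example above.

```python
STATUS_RANK = {"pending": 0, "shipped": 1, "delivered": 2}


def apply_status(current, incoming):
    """Return the new status, ignoring events that would not move forward."""
    if STATUS_RANK[incoming] <= STATUS_RANK[current]:
        return current  # duplicate or stale event
    return incoming


status = "pending"
# Includes a duplicate 'shipped' and a late 'shipped' after 'delivered'.
for event_status in ["shipped", "shipped", "delivered", "shipped"]:
    status = apply_status(status, event_status)

print(status)  # delivered
```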
A Worked Idempotency Pattern
Consumer handling PaymentCaptured to update a customer ledger:
def on_payment_captured(event):
    with db.transaction():
        # 1. dedup by event_id
        inserted = db.execute("""
            INSERT INTO processed_events (event_id, processed_at)
            VALUES (%s, now())
            ON CONFLICT (event_id) DO NOTHING
            RETURNING event_id
        """, event.event_id)
        if inserted is None:
            return  # no row returned: duplicate, skip
        # 2. apply the real effect (made idempotent by the dedup insert above)
        db.execute("""
            INSERT INTO ledger (order_id, amount_cents, entry_type)
            VALUES (%s, %s, 'credit')
        """, event.order_id, event.amount_cents)
Both the dedup insert and the ledger insert commit together. A second delivery finds the event_id already present and returns.
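For a version you can run locally, here is a sketch of the same handler on sqlite3. It is an adaptation, not the original: it detects the duplicate through cursor.rowcount instead of RETURNING, and the PaymentCaptured tuple and the CHECK constraint are assumptions added so the rollback path can be exercised.

```python
import sqlite3
from collections import namedtuple

PaymentCaptured = namedtuple("PaymentCaptured", "event_id order_id amount_cents")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY, processed_at TEXT);
    CREATE TABLE ledger (
        order_id TEXT,
        amount_cents INTEGER CHECK (amount_cents > 0),
        entry_type TEXT
    );
""")


def on_payment_captured(event):
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO processed_events (event_id, processed_at) "
            "VALUES (?, datetime('now')) ON CONFLICT (event_id) DO NOTHING",
            (event.event_id,),
        )
        if cur.rowcount == 0:
            return False  # duplicate, skip
        conn.execute(
            "INSERT INTO ledger (order_id, amount_cents, entry_type) "
            "VALUES (?, ?, 'credit')",
            (event.order_id, event.amount_cents),
        )
        return True


event = PaymentCaptured("evt-1", "order-42", 4299)
first = on_payment_captured(event)
second = on_payment_captured(event)  # simulated redelivery
credits = conn.execute("SELECT COUNT(*), SUM(amount_cents) FROM ledger").fetchone()

# A failing effect must also undo the dedup marker, or the event is lost.
try:
    on_payment_captured(PaymentCaptured("evt-2", "order-43", -1))
except sqlite3.IntegrityError:
    pass
marker_count = conn.execute(
    "SELECT COUNT(*) FROM processed_events WHERE event_id = 'evt-2'"
).fetchone()[0]

print(first, second, credits, marker_count)  # True False (1, 4299) 0
```

The final check is the point of sharing one transaction: when the ledger insert fails, the dedup row disappears with it, so a later redelivery of evt-2 is still processed.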
Why "Exactly-Once" Is an Illusion
Producers, brokers, and consumers form an asynchronous chain. At each hop:
- producer writes to broker -> broker might ack after writing or before; producer might retry after crash
- broker delivers to consumer -> consumer might crash after processing but before committing offset
- consumer commits offset -> delivery attempt recorded in the broker, not in the consumer's downstream system
Even when Kafka supports "exactly-once semantics" (EOS) within Kafka (Kafka->Kafka with transactions), the moment you touch an external system -- a database, an HTTP API, an email -- you are back to at-least-once, because there is no atomic commit across Kafka and that external system. The solution is not stronger broker guarantees; it is idempotent consumers.
This is why the honest story is:
At-least-once delivery + idempotent processing = effectively exactly-once.
Anyone selling you anything stronger is usually selling you "exactly-once within a narrow boundary."
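The failure window can be simulated in a few lines. Everything here is a toy: Broker, run_consumer, and the two ledger classes are hypothetical, and the in-memory seen set stands in for a durable dedup store that would share a transaction with the effect.

```python
class Broker:
    """Toy at-least-once broker: redelivers from the last committed offset."""

    def __init__(self, messages):
        self.messages = messages
        self.committed = 0

    def poll(self):
        return list(enumerate(self.messages))[self.committed:]

    def commit(self, offset):
        self.committed = offset + 1


def run_consumer(broker, handle, crash_after_processing=None):
    for offset, message in broker.poll():
        handle(message)
        if offset == crash_after_processing:
            return  # simulated crash: effect applied, offset never committed
        broker.commit(offset)


class NaiveLedger:
    def __init__(self):
        self.total = 0

    def handle(self, msg):
        self.total += msg["amount"]


class IdempotentLedger:
    def __init__(self):
        self.total = 0
        self.seen = set()

    def handle(self, msg):
        if msg["event_id"] in self.seen:
            return  # redelivered message, already applied
        self.seen.add(msg["event_id"])
        self.total += msg["amount"]


def replay(ledger):
    broker = Broker([
        {"event_id": "e1", "amount": 10},
        {"event_id": "e2", "amount": 5},
    ])
    run_consumer(broker, ledger.handle, crash_after_processing=1)
    run_consumer(broker, ledger.handle)  # restart: e2 is delivered again
    return ledger.total


naive_total = replay(NaiveLedger())
idempotent_total = replay(IdempotentLedger())
print(naive_total, idempotent_total)  # 20 15
```

The broker behaves identically in both runs; only the consumer decides whether the redelivery is visible.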
Common Confusion / Misconception
"Kafka's EOS means I do not need idempotency." Only if every downstream effect is also inside the same Kafka transaction (typical Kafka Streams topology). The moment you write to a DB, call an API, or send an email, you need idempotency.
"We retry forever, so eventually it works." Retries without idempotency cause multiple commits, not eventual success. Retry of a non-idempotent operation is a bug amplifier.
"We only see duplicates under failure, which is rare." You see duplicates whenever offsets commit late, consumers rebalance, the outbox relay restarts, or a producer retries after a transient broker blip. These happen daily at scale.
"Dedup cache is fine, TTL of 5 minutes." Too short. Set it to the longest redelivery horizon: broker retention for consumer-side dedup, plus worst-case outage. One hour is a reasonable floor for most systems.
"Ignore duplicates in code review; downstream will sort it out." Downstream is you, tomorrow, debugging double charges.
How To Use It
Per-consumer checklist:
- Is the effect naturally idempotent? If yes, great; document it.
- If not, pick a dedup key. Usually event_id. Ensure producers always include it.
- Where is the dedup store? DB table, Redis set, stream-processor state store.
- Is dedup in the same transaction as the effect? Must be.
- What is the dedup TTL? At least as long as worst-case redelivery.
- What happens to duplicates across operation types? (A ReleaseStock retried after success should be harmless; add a status check.)
- For external APIs, is an idempotency key available? Use it; generate deterministically from saga_id + step_name.
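The status check mentioned for ReleaseStock could look like the sketch below. The reservation and warehouse dictionaries and their field names are assumptions invented for the example.

```python
def release_stock(reservation, warehouse):
    # Status check: only a reservation that is still 'reserved' can be released.
    if reservation["status"] != "reserved":
        return False  # retried or out-of-order command, nothing to do
    reservation["status"] = "released"
    warehouse["available"] += reservation["quantity"]
    return True


warehouse = {"available": 7}
reservation = {"status": "reserved", "quantity": 3}

first = release_stock(reservation, warehouse)
retry = release_stock(reservation, warehouse)  # ReleaseStock retried after success

print(first, retry, warehouse["available"])  # True False 10
```

In a real consumer the status change and the stock update would need to commit in one transaction, for the same reason as the dedup insert.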
Check Yourself
- Why does "at-least-once + idempotency" give the same observable behavior as "exactly-once"?
- What must be true about the dedup store and the effect's write for idempotency to hold?
- Name three naturally idempotent operations and one that fundamentally is not.
- What is the right TTL for a dedup cache, and why?
Mini Drill or Application
Take a non-idempotent consumer you can name (sends a welcome email, charges a card, creates a record). In 20 minutes:
- Pick a dedup key.
- Choose a dedup store with justified TTL.
- Write pseudocode for the handler with dedup and effect in one transaction.
- Identify any downstream call that also needs its own idempotency key (e.g., the email provider's API).
Transfer to Adjacent Domains
- Outbox (Concept 06). Outbox relay retries cause the duplicates that idempotency absorbs; the two patterns are engineered together, not independently.
- Payments / billing integrations. Stripe, Adyen, Shopify all expose Idempotency-Key headers because they've learned the same lesson: duplicates are the norm, and the fix is on the caller side. Any financial integration should derive keys deterministically from saga step identity.
- HTTP API design (S8M4). REST's PUT is idempotent by contract; POST is not. API designers build idempotency into their resource verbs -- the event-bus equivalent is idempotency in consumer handlers. Same principle, different layer.
- Operational postmortems. "Double billing" and "duplicate email" postmortems trace to exactly one root cause 95% of the time: a consumer that assumed exactly-once delivery. Make idempotency a mandatory PR-review item for new consumers.
- Testing. Idempotency is fuzz-testable: deliver every event twice in integration tests; the resulting state must match the single-delivery state. If your tests don't exercise this, they don't exercise the production path.
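The double-delivery test from the last bullet can be sketched as a small harness. The deliver and is_idempotent helpers and both counter classes are hypothetical; a real test would build your actual consumer against a test database.

```python
def deliver(events, make_consumer, times):
    """Feed every event to a fresh consumer `times` times; return its state."""
    consumer = make_consumer()
    for event in events:
        for _ in range(times):
            consumer.handle(event)
    return consumer.state()


def is_idempotent(events, make_consumer):
    # Double delivery must leave the same state as single delivery.
    return deliver(events, make_consumer, 1) == deliver(events, make_consumer, 2)


class NaiveCounter:
    def __init__(self):
        self.count = 0

    def handle(self, event):
        self.count += 1

    def state(self):
        return self.count


class DedupingCounter(NaiveCounter):
    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle(self, event):
        if event["event_id"] in self.seen:
            return
        self.seen.add(event["event_id"])
        super().handle(event)


events = [{"event_id": "e1"}, {"event_id": "e2"}]
naive_ok = is_idempotent(events, NaiveCounter)
dedup_ok = is_idempotent(events, DedupingCounter)
print(naive_ok, dedup_ok)  # False True
```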
Read This Only If Stuck
- Richards & Ford: Preventing Data Loss -- durability and delivery guarantees in async systems
- Richards & Ford: Event-Driven Architecture Style -- broker topology where redelivery is the default
- Richards & Ford: Data Collisions -- concurrent-update patterns that interact with consumer idempotency
- System Design Primer: Consistency patterns -- eventual-consistency setting that "effectively exactly-once" lives inside
- System Design Primer: Asynchronism -- the async substrate that produces duplicates by design
- Confluent: Exactly-once Semantics Are Possible -- Here's How Kafka Does It -- the honest explanation, including its boundaries
- Microservices.io: Idempotent consumer -- canonical pattern treatment
- AWS SQS: Visibility timeout and duplicates -- concrete guidance on duplicates in SQS
- AWS SQS: FIFO queues -- exactly-once processing semantics -- where the AWS platform offers its narrower "exactly-once" flavor and its limits
- Stripe: Idempotent requests (API reference) -- the canonical Idempotency-Key API contract, transferable to any external side effect
- Enterprise Integration Patterns: Idempotent Receiver -- the 2003 pattern name that this concept is the modern treatment of