
Event Sourcing: When It's Right, When It's Overkill

What This Concept Is

Event sourcing (ES) is a persistence style: instead of storing the current state of an aggregate, you store the ordered sequence of events that produced that state. Current state is derived by replaying events.

A write looks like:

append(events) to event-log-for-<aggregate-id>

A load looks like:

events = event-log-for-<aggregate-id>
state = fold(events, initial)

The log is the authoritative store. Everything else -- relational projections, search indices, snapshots -- is derived.
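As a minimal sketch of the append/load shape above, the following uses an in-memory dict as a stand-in for the event log. The event shapes, the "in_transit" status, and the names apply, fold, append, and INITIAL are illustrative assumptions, not part of any particular library.

```python
from functools import reduce

# Assumed initial state for a shipment-like aggregate (illustrative only).
INITIAL = {"status": "booked", "scan_count": 0}

def apply(state, event):
    """Pure transition: return the next state for one event."""
    if event["type"] == "ScanRecorded":
        return {**state, "status": "in_transit", "scan_count": state["scan_count"] + 1}
    if event["type"] == "DeliveryConfirmed":
        return {**state, "status": "delivered"}
    return state  # unknown event types leave state unchanged

def fold(events, initial):
    """Current state is the reduction of the ordered event stream."""
    return reduce(apply, events, initial)

log = {}  # aggregate id -> ordered list of events (stand-in for the event log)

def append(aggregate_id, events):
    """A write only ever appends; nothing is updated in place."""
    log.setdefault(aggregate_id, []).extend(events)

append("shp-1", [{"type": "ScanRecorded"}, {"type": "ScanRecorded"}])
append("shp-1", [{"type": "DeliveryConfirmed"}])

state = fold(log["shp-1"], INITIAL)
print(state)  # {'status': 'delivered', 'scan_count': 2}
```

Because apply is a pure function, the same fold can rebuild state for any prefix of the stream, which is what makes temporal queries and rebuilds possible.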

Shape of an event-sourced aggregate

Aggregate state is the reduction of its event stream.
Commands produce new events.
Events, once appended, are immutable.
Versioning is per stream; concurrency is optimistic on "expected_version".
Snapshots are an optimization (not required).
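The optimistic check on "expected_version" can be sketched with an in-memory store. InMemoryEventStore and ConcurrencyError are hypothetical names for illustration, and the stream version is assumed to equal the number of events appended so far.

```python
class ConcurrencyError(Exception):
    """Raised when the stream has moved past the version the writer loaded."""

class InMemoryEventStore:
    def __init__(self):
        self._streams = {}  # stream id -> ordered list of events

    def read_stream(self, stream_id):
        return list(self._streams.get(stream_id, []))

    def append(self, stream_id, events, expected_version):
        stream = self._streams.setdefault(stream_id, [])
        current_version = len(stream)  # assumption: version == event count
        if current_version != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {current_version}"
            )
        stream.extend(events)
        return len(stream)  # new stream version

store = InMemoryEventStore()
store.append("shp-1", [{"type": "ScanRecorded"}], expected_version=0)

# Two writers both loaded the stream at version 1.
store.append("shp-1", [{"type": "ScanRecorded"}], expected_version=1)  # first writer wins
try:
    store.append("shp-1", [{"type": "DeliveryConfirmed"}], expected_version=1)
    conflict = False
except ConcurrencyError:
    conflict = True  # second writer must reload the stream and retry
```

The losing writer reloads the aggregate, re-runs its command against the newer state, and appends again with the new version.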

What event sourcing actually gives you

  • Complete history. Every state change is audit-logged by construction.
  • Temporal queries. "What did this look like at 14:30 last Thursday?" -- fold events up to that timestamp.
  • Rebuilds. A bug in a projection? Replay events into a fixed projection.
  • Debuggability. Reproduce production bugs by replaying the exact event stream.
  • Behavioral insights. The event log is itself domain-meaningful data for BI.

What event sourcing actually costs you

  • Schema evolution. Old events are forever -- you must still be able to read v1 payloads years or decades later, usually via upcasters that translate old payloads to the current shape at read time.
  • Query side must be CQRS. You cannot SELECT * FROM shipments WHERE status = 'DELIVERED' from the event log directly; you need a projection.
  • Snapshot management if aggregates have long histories.
  • Operational maturity -- replay tooling, idempotent projections, stream storage.
  • Cognitive load -- many engineers have never worked this way.
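To make the snapshot cost in the list above concrete, here is a minimal sketch of loading from a snapshot plus the events after it. The (version, state) snapshot shape and the load and apply names are assumptions; a real store would query for events with version greater than N rather than slice a list.

```python
def apply(state, event):
    """Pure transition for a toy aggregate that only counts scans."""
    if event["type"] == "ScanRecorded":
        return {**state, "scan_count": state["scan_count"] + 1}
    return state

def load(events, snapshot=None):
    """events: full ordered stream, where an event's version is its index + 1.
    snapshot: (version, state) captured earlier, or None to fold from scratch."""
    if snapshot is None:
        version, state = 0, {"scan_count": 0}
    else:
        version, state = snapshot
    for event in events[version:]:  # only events recorded after the snapshot
        state = apply(state, event)
        version += 1
    return version, state

stream = [{"type": "ScanRecorded"} for _ in range(1200)]

snapshot = load(stream[:1000])   # taken when the stream was at version 1000
full = load(stream)              # folds all 1200 events
fast = load(stream, snapshot)    # folds only the last 200 events
```

Both paths must produce the same state; the snapshot only shortens the fold, which is why it stays an optimization and never becomes the source of truth.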

ES is marked primary here because it is a fundamental concept every architect should understand and be able to decide on, not because every system needs it.

Why It Matters Here

Domain events (concept 11) are a perfect fit for event sourcing: the same events that cross context boundaries become the write log. CQRS (concept 13) describes the read side you will need. Aggregates (concept 10) become fold functions.

Knowing when not to event-source is as important as knowing when to. Misapplied ES is one of the more expensive mistakes in this module's topic area.

Concrete Example

Case: Parcel Shipping -- Tracking context uses event sourcing; Pricing does not

Why Tracking is a great fit

Tracking is fundamentally an event-driven subdomain: dozens of scan events per parcel, arriving out of order, from multiple external sources, needing to be deduplicated and reconciled. Audit and "what did we know when?" are common support questions.

The write store is simply an append-only journey_events table keyed by shipment_id:

CREATE TABLE journey_events (
    shipment_id  TEXT,
    version      BIGINT,
    event_id     UUID UNIQUE,
    event_type   TEXT,
    payload_json JSONB,
    occurred_at  TIMESTAMPTZ,
    recorded_at  TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (shipment_id, version)
);

Load / handle a command:

import uuid

# CarrierScan and _hydrate_scan are assumed to be defined elsewhere in the module.

class ShipmentJourneyES:
    def __init__(self, shipment_id):
        self.shipment_id = shipment_id
        self.version = 0        # version of the last event applied from the store
        self.status = "booked"
        self.scans = []
        self._pending = []      # new events not yet appended to the store

    @classmethod
    def load(cls, event_store, shipment_id):
        # Rebuild state by folding the stream in version order.
        inst = cls(shipment_id)
        for row in event_store.read_stream(shipment_id):
            inst._apply(row.event_type, row.payload_json)
            inst.version = row.version
        return inst

    def record_scan(self, scan: CarrierScan):
        if any(s.id == scan.id for s in self.scans):
            return  # idempotent on scan id
        event = {
            "event_id": str(uuid.uuid4()),
            "event_type": "ScanRecorded",
            "payload": {"scan_id": scan.id, "code": scan.code, "at": scan.at.isoformat(), "city": scan.city},
            "occurred_at": scan.at.isoformat(),
        }
        self._pending.append(event)
        self._apply(event["event_type"], event["payload"])  # update state in-memory

    def _apply(self, event_type, payload):
        # The only place state changes, used by both load() and command handlers.
        if event_type == "ScanRecorded":
            self.scans.append(_hydrate_scan(payload))
            self._maybe_update_status(payload)
        elif event_type == "DeliveryConfirmed":
            self.status = "delivered"
        # ...

Save:

class EventStore:
    def append(self, shipment_id, events, expected_version):
        with self._conn.tx():
            # Optimistic concurrency: fail if the stream has moved past expected_version.
            self._assert_version(shipment_id, expected_version)
            for i, e in enumerate(events):
                # Each event takes the next version in the stream,
                # e.g. :ver = expected_version + i + 1.
                self._conn.execute("""
                    INSERT INTO journey_events(shipment_id, version, event_id, event_type, payload_json, occurred_at)
                    VALUES (:sid, :ver, :eid, :et, :pl, :oa)
                """, {...})
            # optionally publish to outbox / bus

Projections (concept 13) build customer_tracking_view and ops_dashboard_view off this stream.
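A projection of this kind can be sketched as a small handler that keeps a per-stream checkpoint so that redelivered events are ignored. CustomerTrackingView, its fields, the "in_transit" status, and the rebuild helper are hypothetical; the sketch also assumes events for one shipment are delivered in version order.

```python
class CustomerTrackingView:
    """Read model: latest status per shipment, plus the last version applied."""

    def __init__(self):
        self.status = {}      # shipment_id -> status shown to the customer
        self.checkpoint = {}  # shipment_id -> last applied stream version

    def handle(self, event):
        sid, version = event["shipment_id"], event["version"]
        if version <= self.checkpoint.get(sid, 0):
            return  # already applied: redelivery is a no-op
        if event["event_type"] == "ScanRecorded":
            self.status[sid] = "in_transit"
        elif event["event_type"] == "DeliveryConfirmed":
            self.status[sid] = "delivered"
        self.checkpoint[sid] = version

def rebuild(events):
    """Replay the whole stream into a fresh view, e.g. after fixing a projection bug."""
    view = CustomerTrackingView()
    for event in events:
        view.handle(event)
    return view

events = [
    {"shipment_id": "shp-1", "version": 1, "event_type": "ScanRecorded"},
    {"shipment_id": "shp-1", "version": 2, "event_type": "DeliveryConfirmed"},
]

live = CustomerTrackingView()
for event in events + [events[0]]:  # the first event is delivered twice
    live.handle(event)

replayed = rebuild(events)
```

The duplicate delivery does not move the shipment back to "in_transit", and a full replay converges to the same view as live processing.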

Temporal replay is trivial:

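from datetime import datetime

def journey_at(event_store, shipment_id, t: datetime):
    state = ShipmentJourneyES(shipment_id)
    for row in event_store.read_stream(shipment_id):
        # Scans arrive out of order, so occurred_at is not monotonic in
        # version order: skip later events instead of stopping at the first one.
        if row.occurred_at > t:
            continue
        state._apply(row.event_type, row.payload_json)
    return state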

Why Pricing is a bad fit for event sourcing

Pricing holds rate snapshots and contract rules. Characteristics:

  • Writes are bursty -- new rate table published once a week
  • Queries are precomputed; each rate snapshot is read many times, rarely modified
  • Audit is handled by immutable rate-snapshot rows with an effective_from/to
  • There is no meaningful "event stream" -- "a new rate table was published" is one event per week

Forcing event sourcing on Pricing would be pure overhead. A classic relational model with versioned snapshot rows is a better fit.
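For contrast, the versioned-snapshot model for Pricing can be sketched in a few lines. RateSnapshot, its fields, the sample prices, and rate_on are illustrative assumptions; effective_to is treated as exclusive, with None meaning "still in effect".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)  # frozen: a published snapshot row is never modified
class RateSnapshot:
    version: int
    effective_from: date
    effective_to: Optional[date]  # exclusive end; None means still in effect
    price_per_kg: float

# Made-up sample data for illustration.
SNAPSHOTS = [
    RateSnapshot(1, date(2024, 1, 1), date(2024, 1, 8), 2.50),
    RateSnapshot(2, date(2024, 1, 8), None, 2.75),
]

def rate_on(snapshots, day):
    """Return the snapshot in effect on the given day, or None if there is none."""
    for snap in snapshots:
        starts = snap.effective_from <= day
        not_ended = snap.effective_to is None or day < snap.effective_to
        if starts and not_ended:
            return snap
    return None

print(rate_on(SNAPSHOTS, date(2024, 1, 5)).version)  # 1
print(rate_on(SNAPSHOTS, date(2024, 1, 8)).version)  # 2
```

The audit question "which rate applied on that day?" is answered by a range lookup over immutable rows, with no fold and no projection.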

Decision table

Characteristic | Favor ES | Avoid ES
---------------------------------------------------------------------------
High-frequency state changes per aggregate | ✅ |
Strong audit or temporal-query requirement | ✅ |
Out-of-order or late-arriving facts | ✅ |
Event log is itself valuable to the business (BI, analytics, compliance) | ✅ |
Mostly CRUD, few events | | ❌
Ad-hoc read queries that dominate writes | | ❌ (projection cost high)
Team lacks ES operational experience | | ❌
Tight deadline with no room for schema evolution thinking | | ❌

If two or three ✅ rows apply and no ❌ row blocks the decision, ES is worth considering; otherwise, stick with a classical state-based write store and add CQRS if needed.

Event schema evolution -- the hidden cost

Year 1: ScanRecorded v1 { scan_id, code, at, city }

Year 2: you want iso_country. You do not alter v1 -- it is already in the log. You:

  • add ScanRecorded v2 { scan_id, code, at, city, iso_country }
  • the aggregate's _apply handles both v1 and v2 -- for v1, iso_country = None
  • new events emitted are v2
  • old v1 events remain v1 in the log forever; they are read as v2 via an upcaster

Multiply this by 20 event types over 5 years and you see the discipline cost.
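A minimal sketch of such an upcaster for ScanRecorded follows. The schema_version field and the function name are assumptions for illustration (the journey_events table above has no such column), and a missing schema_version is treated as v1.

```python
def upcast_scan_recorded(event):
    """Return a v2-shaped copy of a v1 ScanRecorded event; other events pass through."""
    if event["event_type"] != "ScanRecorded":
        return event
    version = event.get("schema_version", 1)  # assumption: missing field means v1
    if version >= 2:
        return event
    # v1 had no iso_country; the stored event is not mutated, a new dict is built.
    payload = {**event["payload"], "iso_country": None}
    return {**event, "schema_version": 2, "payload": payload}

v1_event = {
    "event_type": "ScanRecorded",
    "payload": {"scan_id": "s1", "code": "ARR", "at": "2024-01-05T14:30:00+00:00", "city": "Lyon"},
}

v2_event = upcast_scan_recorded(v1_event)
print(v2_event["payload"]["iso_country"])  # None
```

Running the upcaster at read time means the aggregate's _apply only ever sees the newest shape, while the log keeps the original v1 payload untouched.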

Common Confusion / Misconception

"Event sourcing = event-driven architecture." Event-driven is about communication; event sourcing is about persistence. You can do either without the other. They pair well but are independent decisions.

"I have an audit log so I already do event sourcing." An audit log is a side effect. In ES, the log is the source of truth. If deleting the audit log would still leave the system intact, you are not event-sourced.

"ES gives me exactly-once processing." It doesn't. Projections still need to be idempotent; consumers still need dedupe. ES just makes replay possible.

"I can delete events." In GDPR-land you sometimes must -- this is resolved via crypto-shredding (encrypt PII per subject, delete the key), not by mutating the log.

"ES is only for high-scale systems." Scale is orthogonal. A low-traffic system with strong audit requirements can be a perfect ES fit.

"ES aggregates can be loaded cheaply forever." No -- for very long streams you need snapshots: snapshot_at_version_N stored, then load snapshot + events after N.

"If we use Kafka, we're event-sourced." Kafka as message broker is one thing; Kafka as the write store of an aggregate is another. Many teams think they are event-sourced because they have Kafka; they are merely event-driven.

How To Use It

Decide per aggregate / context:

  1. Walk through the decision table above.
  2. If ES is warranted, choose an event store (EventStoreDB, Marten on Postgres, custom journey_events table).
  3. Design events to be business-meaningful facts, not method signatures. "ScanRecorded," not "UpdateScanCommandReceived."
  4. Plan snapshotting strategy up front (e.g., snapshot every 500 events or every 24h).
  5. Treat every projection as idempotent and replayable.
  6. Version events from day one; write an upcaster before you need one.
  7. Build a replay tool; test it quarterly.
  8. For contexts where ES is not right, stick with classical state storage. Revisit when the decision-table answers change.

Check Yourself

  1. Give one example of a bounded context that is well suited to ES and explain why.
  2. What problem do snapshots solve in an event-sourced system?
  3. Why can't you change a published event's schema in place?

Mini Drill or Application

For the Shipping context of the Parcel Shipping case (bookings, labels, cancellations -- not Tracking):

  1. Apply the decision table and argue whether ES is appropriate.
  2. If yes, list the event types and sketch the fold function for Shipment.
  3. If no, describe an alternative that still captures the temporal/audit properties you need.
  4. In either case, write down one cost you would pay in year 3 for the decision.

Read This Only If Stuck