Module 3: Event-Driven Architecture

Primary texts: Fundamentals of Software Architecture (Richards & Ford, Ch. 14) and Designing Event-Driven Systems (Stopford)
Selective support: System Design Primer (asynchronism, communication), Microservices Patterns (saga, outbox, CQRS), DDIA stream-processing chapters from Semester 6

This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to become fluent at reasoning about a system where events -- not requests -- are the unit of truth and coordination.


Scope of This Module

This module is not "how to run Kafka." It is where communication flips from synchronous requests to asynchronous facts, and where coordination stops being a remote procedure call and starts being a conversation between services about things that already happened.

What it covers in depth:

  • events as immutable facts about the past, distinguished from commands and queries
  • the mental shift from CRUD tables to event streams as the system of record
  • publish-subscribe vs point-to-point queues and when each makes sense
  • event notification vs event-carried state transfer and the consistency tradeoff
  • the outbox pattern and why database-plus-broker dual writes fail without it
  • queue-based brokers (JMS, AMQP, SQS) vs log-based brokers (Kafka, Pulsar)
  • consumer groups, partitions, ordering guarantees, and what "exactly once" really buys you
  • choreography vs orchestration for multi-service workflows
  • sagas with compensating transactions for long-running business processes
  • idempotency and deduplication as the consumer-side price of at-least-once delivery
  • event sourcing: the append-only log as the canonical record of state
  • projections and read models built by folding the log
  • CQRS and when separating the write and read sides actually helps

What it deliberately does not try to finish here:

  • operating a Kafka cluster in production (SRE-level tuning, multi-region replication)
  • every enterprise integration pattern (see Hohpe & Woolf for the full catalog)
  • stream-processing frameworks (Flink, Kafka Streams) at the implementation level
  • advanced event-modeling workshops (full domain storming)

This is a reasoning module, not a syntax module. If you can publish to Kafka but cannot explain why a command disguised as an event will rot your system, you are not done.
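As a minimal sketch of the distinction this paragraph points at, the snippet below models one event and one command as frozen dataclasses. The names `OrderPlaced` and `ChargeCard` and all field names are hypothetical, chosen only for illustration. The event is named in the past tense and cannot be changed after the fact; the command is an imperative request that a handler may still refuse.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass(frozen=True)
class OrderPlaced:
    """Event: a past-tense fact. It tells nobody what to do next."""
    order_id: str
    customer_id: str
    total_cents: int
    occurred_at: datetime


@dataclass(frozen=True)
class ChargeCard:
    """Command: an imperative request aimed at one handler, which may say no."""
    order_id: str
    amount_cents: int


placed = OrderPlaced("o-1", "c-9", 4200, datetime(2024, 1, 1, tzinfo=timezone.utc))

# A fact about the past cannot be edited; frozen dataclasses enforce that here.
try:
    placed.total_cents = 0
    mutated = True
except FrozenInstanceError:
    mutated = False
```

A "disguised command" would be something like an event named `ChargeCardRequested` that exactly one service is expected to obey: it has the shape of an event but the coupling of a command.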


Before You Start

Answer these closed-book before starting the main path:

  1. What is the difference between "the user clicked Buy" and "charge the user's card"?
  2. Why does at-least-once delivery force consumers to be idempotent?
  3. If two services need the same data, what are two fundamentally different ways to keep them in sync?
  4. Why is writing to a database and then publishing to a broker in the same function a bug waiting to happen?
  5. What does a saga give you that a 2PC transaction across services does not?

Diagnostic Interpretation

4-5 solid answers

  • You are ready for the full path.

2-3 solid answers

  • Continue, but expect extra time in Clusters 2 and 4 (outbox and saga).

0-1 solid answers

  • Revisit Semester 7 Module 2 (Cluster 3: event-driven topologies) and Semester 6 Module 3 (replication and log-based thinking) before continuing.

What This Module Is For

Event-driven architecture is the communication style that actually scales across team boundaries, not just machines. It shows up any time the real question is:

  • multiple systems care about the same thing happening -- who tells them?
  • a workflow crosses service boundaries and no one service owns it end-to-end
  • the write path and the read path have wildly different performance needs
  • you need an audit trail that is not a bolted-on log
  • your transactions are longer-running than any single request can hold open

This module builds the reasoning you need for:

  • designing microservice boundaries that do not degenerate into distributed monoliths
  • choosing a messaging substrate (queue vs log) for the real workload
  • writing services that survive duplicates, reordering, and replays
  • explaining eventual consistency to product managers without hand-waving
  • deciding when event sourcing and CQRS earn their complexity -- and when they do not

You are learning to design for facts, not for calls.


Concept Map


How To Use This Module

Work in order. Later clusters only make sense if the earlier vocabulary is stable.

Cluster 1: Events as a Mental Model

Order | Concept | Type | Focus
1 | An Event Is an Immutable Fact About the Past | PRIMARY | The one-sentence definition, past tense, named by what happened
2 | Events vs Commands vs Requests | PRIMARY | Three message intents and why mixing them rots systems
3 | The Shift from CRUD to Events | PRIMARY | Why "last write wins" is a modeling choice, not a law

Cluster mastery check: Can you rename three "update" operations in a CRUD system as past-tense events without losing information?

Cluster 2: Messaging Patterns

Order | Concept | Type | Focus
4 | Publish-Subscribe vs Point-to-Point Queues | PRIMARY | Broadcast vs work distribution; one consumer vs many
5 | Event Notification vs Event-Carried State Transfer | PRIMARY | Thin events that trigger lookups vs fat events that carry payload
6 | The Outbox Pattern: Atomically Publishing Events | PRIMARY | Killing the dual-write bug with a transactional outbox

Cluster mastery check: Can you pick pub-sub vs queue for a given scenario and defend it, and can you explain why publishing inside a DB transaction is a different failure mode than publishing after commit?
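The outbox idea named in this cluster can be sketched with an in-memory SQLite database. This is an illustration under stated assumptions, not a production relay: the table and column names, `place_order`, `relay_once`, and the `publish` callback are all hypothetical, and a real relay would run continuously (or be replaced by change data capture). The point it shows is that the business row and the event row commit in one local transaction, so there is no window in which one exists without the other.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
    CREATE TABLE outbox (
        event_id   TEXT PRIMARY KEY,   -- also usable as the consumer's dedup key
        event_type TEXT NOT NULL,
        payload    TEXT NOT NULL,
        published  INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(order_id: str, total_cents: int) -> None:
    # One local transaction: both rows commit, or neither does.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
            (
                f"order-placed:{order_id}",
                "OrderPlaced",
                json.dumps({"order_id": order_id, "total_cents": total_cents}),
            ),
        )


def relay_once(publish) -> int:
    """Poll unpublished rows, hand each to the broker, then mark it published."""
    rows = conn.execute(
        "SELECT event_id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, event_type, payload in rows:
        # If publish raises, the row stays unpublished and is retried next poll.
        publish(event_id, event_type, json.loads(payload))
        with conn:
            conn.execute(
                "UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,)
            )
    return len(rows)


sent = []  # stand-in for a real broker client
place_order("o-1", 4200)
relay_once(lambda event_id, event_type, payload: sent.append((event_id, event_type, payload)))
```

If the relay crashes after `publish` succeeds but before the row is marked, the event is sent again on the next poll. The outbox therefore gives at-least-once publication, which is why the `event_id` travels with the event as a dedup key.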

Cluster 3: Brokers and Log-Based Systems

Order | Concept | Type | Focus
7 | Queue Semantics: JMS, AMQP, SQS | PRIMARY | Classical brokers, delivery acknowledgements, and DLQs
8 | Log-Based Brokers: Kafka's Design and Retention | PRIMARY | Immutable partitioned log, offsets, retention as a design tool
9 | Consumer Groups, Partitions, Ordering Guarantees | PRIMARY | Per-partition ordering, rebalancing, and key-based routing

Cluster mastery check: Can you draw a Kafka topic with three partitions and two consumers in a group, and explain exactly which consumer gets which messages and why?
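The three-partition, two-consumer picture can be simulated in a few lines. This is a simplified model, not Kafka's actual implementation: `crc32` stands in for whatever hash a real client uses, and the round-robin `assign` function is only one of several possible assignment strategies. All function and consumer names are hypothetical.

```python
import zlib

NUM_PARTITIONS = 3


def partition_for(key: str) -> int:
    # A stable hash, so the same key always lands on the same partition.
    # (crc32 is an illustrative stand-in, not the hash a Kafka client uses.)
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    # Round-robin: each partition is owned by exactly one consumer in the group.
    owners: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owners[consumers[i % len(consumers)]].append(partition)
    return owners


def owner_of(key: str, assignment: dict[str, list[int]]) -> str:
    # The consumer that will see every message carrying this key.
    partition = partition_for(key)
    return next(c for c, owned in assignment.items() if partition in owned)


assignment = assign(list(range(NUM_PARTITIONS)), ["consumer-a", "consumer-b"])
# assignment == {"consumer-a": [0, 2], "consumer-b": [1]}
```

Because a key maps to one partition and a partition has one owner within the group, all messages for a given key are read by a single consumer in partition order. With three partitions and two consumers, one consumer necessarily carries two partitions.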

Cluster 4: Distributed Workflow with Events

Order | Concept | Type | Focus
10 | Choreography vs Orchestration | PRIMARY | Who drives the workflow: each service, or a central coordinator?
11 | Sagas: Long-Running Transactions Across Services | PRIMARY | Compensating transactions instead of distributed commit
12 | Idempotency, Deduplication, and the Exactly-Once Illusion | PRIMARY | Why "at-least-once + idempotent" is the only honest story

Cluster mastery check: Can you walk a checkout saga through both a happy path and a failed-payment compensation, and can you explain why the inventory service must be idempotent even if the broker promises "exactly once"?
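A failed-payment compensation can be traced with a small in-process sketch. It assumes an orchestration-style saga in which one coordinator runs the steps; the step names, the `run_saga` helper, and the `inventory` dictionary are hypothetical stand-ins for real service calls. When a step fails, the steps that already completed are undone in reverse order.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    completed: list[Step] = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"{name}:done")
            completed.append((name, action, compensate))
        except Exception:
            log.append(f"{name}:failed")
            # Undo the steps that already succeeded, most recent first.
            for done_name, _, undo in reversed(completed):
                undo()
                log.append(f"{done_name}:compensated")
            return False
    return True


inventory = {"reserved": 0}


def reserve() -> None:
    inventory["reserved"] += 1


def release() -> None:
    inventory["reserved"] -= 1


def decline_payment() -> None:
    raise RuntimeError("card declined")


log: list[str] = []
ok = run_saga(
    [
        ("reserve_inventory", reserve, release),
        ("charge_payment", decline_payment, lambda: None),
        ("ship_order", lambda: None, lambda: None),
    ],
    log,
)
# log == ["reserve_inventory:done", "charge_payment:failed",
#         "reserve_inventory:compensated"]
```

A compensation is a new business action (release the reservation), not a database rollback. In a real system each action and compensation is a message to another service, so both must tolerate being delivered more than once.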

Cluster 5: Event Sourcing and CQRS

Order | Concept | Type | Focus
13 | Event Sourcing: The Event Log Is the System of Record | PRIMARY | State as a fold over an append-only log
14 | Projections and Read Models | PRIMARY | Views derived from the log, rebuildable on demand
15 | CQRS: When to Separate Reads and Writes | SUPPORTING | Splitting the write model from the read model, and when not to

Cluster mastery check: Can you name three situations where event sourcing is the wrong answer, and one where CQRS without event sourcing is still a valid design?
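"State as a fold over an append-only log" can be shown in a few lines. The sketch below assumes a toy account domain; the event types, field names, and the `apply_event` and `rebuild` helpers are hypothetical. The read model (current balances) is never stored as the source of truth: it is derived by folding the events, and it can be thrown away and rebuilt at any time.

```python
from functools import reduce

# The append-only log: the only thing treated as the system of record here.
events = [
    {"type": "AccountOpened", "account_id": "a-1"},
    {"type": "MoneyDeposited", "account_id": "a-1", "amount_cents": 5000},
    {"type": "MoneyWithdrawn", "account_id": "a-1", "amount_cents": 1200},
]


def apply_event(balances: dict[str, int], event: dict) -> dict[str, int]:
    # Pure function: previous state + one event -> next state.
    new = dict(balances)
    account = event["account_id"]
    if event["type"] == "AccountOpened":
        new[account] = 0
    elif event["type"] == "MoneyDeposited":
        new[account] += event["amount_cents"]
    elif event["type"] == "MoneyWithdrawn":
        new[account] -= event["amount_cents"]
    return new


def rebuild(log: list[dict]) -> dict[str, int]:
    # A projection is a left fold over the log, starting from empty state.
    return reduce(apply_event, log, {})


balances = rebuild(events)  # {"a-1": 3800}
```

Folding a prefix of the log, for example `rebuild(events[:2])`, gives the state as of that point in history, which is one reason the log doubles as an audit trail.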

Then work these practice pages:

Order | Practice path | Focus
1 | Event Modeling Lab | Naming events, avoiding disguised commands, storming a domain
2 | Messaging Patterns Workshop | Pub-sub vs queue, notification vs state transfer, outbox sketch
3 | Saga and Idempotency Clinic | Compensations, retries, dedup keys, ordering
4 | Event-Driven Katas | Repeatable drills for event design, outbox, choreography, CQRS

Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.


Learning Objectives

By the end of this module you should be able to:

  1. Write an event name and payload that is an immutable past fact and resist the disguised-command trap.
  2. Choose pub-sub vs point-to-point queueing for a concrete scenario and justify it in one paragraph.
  3. Explain the dual-write bug and implement the outbox pattern to eliminate it.
  4. Compare queue-based and log-based brokers by delivery, retention, and replay semantics.
  5. Draw partitions and consumer groups on a whiteboard and predict who gets which messages.
  6. Choose choreography or orchestration for a given workflow and defend the tradeoff.
  7. Design a three-step saga with correct compensating transactions for the failure paths.
  8. Make a consumer idempotent using a dedup key, and explain why "exactly once" is a systems-level property not a broker feature.
  9. Decide whether event sourcing, CQRS, or neither is justified for a given bounded context.
  10. Rebuild a read model from an event log and explain why that rebuildability is the point.
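Objective 8 (an idempotent consumer with a dedup key) can be sketched as follows. The `InventoryConsumer` class, the `event_id` format, and the in-memory `seen` set are illustrative assumptions; a real consumer would keep the dedup record in durable storage and commit it atomically with the state change.

```python
class InventoryConsumer:
    def __init__(self) -> None:
        self.stock = {"sku-1": 10}
        # Dedup store. In-memory only for this sketch; it must be durable in practice.
        self.seen: set[str] = set()

    def handle(self, event_id: str, sku: str, quantity: int) -> bool:
        """Apply the event once; return False if it was a duplicate delivery."""
        if event_id in self.seen:
            return False  # already processed: skip the side effect
        self.stock[sku] -= quantity
        self.seen.add(event_id)
        return True


consumer = InventoryConsumer()
first = consumer.handle("order-placed:o-1", "sku-1", 3)
second = consumer.handle("order-placed:o-1", "sku-1", 3)  # redelivery of the same event
# consumer.stock["sku-1"] == 7, not 4
```

The broker is free to deliver the message twice; the effect is applied once because the consumer, not the broker, enforces it. That is the sense in which "exactly once" is a property of the whole system.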

Outputs

  • one event catalog with at least 20 events named in past tense, each with payload fields and a producing service
  • one architecture sketch comparing queue and log-based brokers for a shared scenario
  • one outbox implementation sketch (schema, polling query, idempotency key flow)
  • one full saga diagram with happy path, one failure path, and all compensations
  • one CQRS vs single-model memo defending a real decision
  • one idempotency design for a real consumer, including the dedup store and key
  • one mistake log naming at least 10 errors such as command-shaped event, published before commit, missing dedup, choreography without observability, premature event sourcing

Completion Standard

You have completed Module 3 when all of these are true:

  • you can name an event in past tense without sneaking in a command
  • you can explain why publishing after a DB commit and publishing from an outbox are fundamentally different
  • you can pick queue vs log-based broker with a reason, not a brand preference
  • you can walk a saga through its happy path and at least one failure with correct compensations
  • you can tell the difference between event sourcing and "using events to communicate"
  • you can explain CQRS to someone who thinks it means "two databases"

If you can configure a broker but cannot say what an event is, the module is not complete.


Reading Policy

  • Concept pages are the main path.
  • Local book chunks are selective reinforcement, not a second syllabus.
  • "Read only if stuck" means try the concept page, self-check, and drill first.
  • "Optional deep dive" means additional nuance, not required progression.
  • External links to Fowler, microservices.io, and Kafka docs are used surgically where a local chunk is not enough.

Suggested Weekly Flow

Day | Work
1 | Concepts 1-3 and the event-naming drill from Practice 1
2 | Concepts 4-6 and the outbox sketch from Practice 2
3 | Concepts 7-9 and the partition-and-consumer-group diagram kata
4 | Concepts 10-12 and one full saga walkthrough from Practice 3
5 | Concepts 13-15 and the CQRS-yes-or-no scenarios from Practice 4
6 | Quiz, interleaved review, and mistake-log cleanup
7 | Buffer and Feynman note on event-driven thinking

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.


Rich Learning Pages

Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread