Module 3: Event-Driven Architecture
Primary texts: Fundamentals of Software Architecture (Richards & Ford, Ch. 14) and Designing Event-Driven Systems (Stopford)
Selective support: System Design Primer (asynchronism, communication), Microservices Patterns (saga, outbox, CQRS), DDIA stream-processing chapters from Semester 6
This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to become fluent at reasoning about a system where events -- not requests -- are the unit of truth and coordination.
Scope of This Module
This module is not "how to run Kafka." It is where communication flips from synchronous requests to asynchronous facts, and where coordination stops being a remote procedure call and starts being a conversation between services about things that already happened.
What it covers in depth:
- events as immutable facts about the past, distinguished from commands and queries
- the mental shift from CRUD tables to event streams as the system of record
- publish-subscribe vs point-to-point queues and when each makes sense
- event notification vs event-carried state transfer and the consistency tradeoff
- the outbox pattern and why database-plus-broker dual writes fail without it
- queue-based brokers (JMS, AMQP, SQS) vs log-based brokers (Kafka, Pulsar)
- consumer groups, partitions, ordering guarantees, and what "exactly once" really buys you
- choreography vs orchestration for multi-service workflows
- sagas with compensating transactions for long-running business processes
- idempotency and deduplication as the consumer-side price of at-least-once delivery
- event sourcing: the append-only log as the canonical record of state
- projections and read models built by folding the log
- CQRS and when separating the write and read sides actually helps
What it deliberately does not try to finish here:
- operating a Kafka cluster in production (SRE-level tuning, multi-region replication)
- every enterprise integration pattern (see Hohpe & Woolf for the full catalog)
- stream-processing frameworks (Flink, Kafka Streams) at the implementation level
- advanced event-modeling workshops (full domain storming)
This is a reasoning module, not a syntax module. If you can publish to Kafka but cannot explain why a command disguised as an event will rot your system, you are not done.
Before You Start
Answer these closed-book before starting the main path:
- What is the difference between "the user clicked Buy" and "charge the user's card"?
- Why does at-least-once delivery force consumers to be idempotent?
- If two services need the same data, what are two fundamentally different ways to keep them in sync?
- Why is writing to a database and then publishing to a broker in the same function a bug waiting to happen?
- What does a saga give you that a two-phase commit (2PC) across services does not?
Diagnostic Interpretation
4-5 solid answers
- You are ready for the full path.
2-3 solid answers
- Continue, but expect extra time in Clusters 2 and 4 (outbox and saga).
0-1 solid answers
- Revisit Semester 7 Module 2 (Cluster 3: event-driven topologies) and Semester 6 Module 3 (replication and log-based thinking) before continuing.
What This Module Is For
Event-driven architecture is the communication style that actually scales across team boundaries, not just machines. It shows up any time the real question is:
- multiple systems care about the same thing happening -- who tells them?
- a workflow crosses service boundaries and no one service owns it end-to-end
- the write path and the read path have wildly different performance needs
- you need an audit trail that is not a bolted-on log
- your transactions are longer-running than any single request can hold open
This module builds the reasoning you need for:
- designing microservice boundaries that do not degenerate into distributed monoliths
- choosing a messaging substrate (queue vs log) for the real workload
- writing services that survive duplicates, reordering, and replays
- explaining eventual consistency to product managers without hand-waving
- deciding when event sourcing and CQRS earn their complexity -- and when they do not
You are learning to design for facts, not for calls.
Concept Map
How To Use This Module
Work in order. Later clusters only make sense if the earlier vocabulary is stable.
Cluster 1: Events as a Mental Model
| Order | Concept | Type | Focus |
|---|---|---|---|
| 1 | An Event Is an Immutable Fact About the Past | PRIMARY | The one-sentence definition, past tense, named by what happened |
| 2 | Events vs Commands vs Requests | PRIMARY | Three message intents and why mixing them rots systems |
| 3 | The Shift from CRUD to Events | PRIMARY | Why "last write wins" is a modeling choice, not a law |
Cluster mastery check: Can you rename three "update" operations in a CRUD system as past-tense events without losing information?
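One way to practice the rename in this mastery check is to write the event next to the CRUD write it describes. The sketch below is illustrative only: `CustomerAddressChanged`, its fields, and `change_address` are hypothetical names, and the row is a plain dict standing in for a database record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustomerAddressChanged:
    """Past-tense fact: it already happened, so nobody can reject it."""
    customer_id: str
    old_address: str
    new_address: str
    reason: str            # context that a bare UPDATE would discard
    occurred_at: datetime


def change_address(row: dict, new_address: str, reason: str) -> CustomerAddressChanged:
    """Apply the change to a CRUD-style row and return the fact describing it."""
    event = CustomerAddressChanged(
        customer_id=row["id"],
        old_address=row["address"],
        new_address=new_address,
        reason=reason,
        occurred_at=datetime.now(timezone.utc),
    )
    row["address"] = new_address  # the CRUD write: the old value is overwritten
    return event
```

After the call, the row only knows the new address, while the event still carries the old value, the reason, and the time. A command-shaped name such as `UpdateCustomerAddress` would describe a request that can still fail, which is exactly the naming trap this cluster targets.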
Cluster 2: Messaging Patterns
| Order | Concept | Type | Focus |
|---|---|---|---|
| 4 | Publish-Subscribe vs Point-to-Point Queues | PRIMARY | Broadcast vs work distribution; one consumer vs many |
| 5 | Event Notification vs Event-Carried State Transfer | PRIMARY | Thin events that trigger lookups vs fat events that carry payload |
| 6 | The Outbox Pattern: Atomically Publishing Events | PRIMARY | Killing the dual-write bug with a transactional outbox |
Cluster mastery check: Can you pick pub-sub vs queue for a given scenario and defend it, and can you explain why publishing inside a DB transaction is a different failure mode than publishing after commit?
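The outbox idea from Concept 6 can be sketched with nothing more than SQLite. This is a minimal illustration under stated assumptions, not a production relay: the `orders` and `outbox` schemas, `place_order`, `relay_once`, and the `publish` callback are all hypothetical, and a real relay would also need batching, retries, and locking across multiple relay processes.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
    CREATE TABLE outbox (
        seq          INTEGER PRIMARY KEY AUTOINCREMENT,
        event_id     TEXT UNIQUE NOT NULL,
        event_type   TEXT NOT NULL,
        payload      TEXT NOT NULL,
        published_at TEXT
    );
""")


def place_order(order_id: str, total_cents: int) -> None:
    """Write the business row and the event row in ONE local transaction."""
    with conn:  # commits on success, rolls back if either INSERT raises
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
            (
                str(uuid.uuid4()),
                "OrderPlaced",
                json.dumps({"order_id": order_id, "total_cents": total_cents}),
            ),
        )


def relay_once(publish) -> int:
    """Poll unpublished rows in insertion order, publish each, then mark it."""
    rows = conn.execute(
        "SELECT seq, event_id, event_type, payload FROM outbox "
        "WHERE published_at IS NULL ORDER BY seq"
    ).fetchall()
    for seq, event_id, event_type, payload in rows:
        publish(event_id, event_type, json.loads(payload))
        # A crash between publish() and this UPDATE means the row is sent
        # again on the next poll: delivery is at-least-once, so consumers
        # are expected to deduplicate on event_id.
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = datetime('now') WHERE seq = ?",
                (seq,),
            )
    return len(rows)
```

The design choice to notice is that `place_order` never talks to the broker. The only thing that can fail halfway is the relay, and its failure mode is a duplicate rather than a lost or phantom event.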
Cluster 3: Brokers and Log-Based Systems
| Order | Concept | Type | Focus |
|---|---|---|---|
| 7 | Queue Semantics: JMS, AMQP, SQS | PRIMARY | Classical brokers, delivery acknowledgements, and DLQs |
| 8 | Log-Based Brokers: Kafka's Design and Retention | PRIMARY | Immutable partitioned log, offsets, retention as a design tool |
| 9 | Consumer Groups, Partitions, Ordering Guarantees | PRIMARY | Per-partition ordering, rebalancing, and key-based routing |
Cluster mastery check: Can you draw a Kafka topic with three partitions and two consumers in a group, and explain exactly which consumer gets which messages and why?
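The whiteboard exercise in this mastery check can be checked against a small model. The sketch below makes two simplifying assumptions: `partition_for` uses CRC32 as a stand-in hash (Kafka's own default partitioner uses a different hash function), and `assign_ranges` is a hypothetical range-style assignment for a single topic, without rebalancing.

```python
import zlib

NUM_PARTITIONS = 3


def partition_for(key: str) -> int:
    """Stable key -> partition mapping; CRC32 is only a stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign_ranges(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each consumer a contiguous block; earlier consumers absorb the remainder."""
    members = sorted(consumers)
    base, extra = divmod(len(partitions), len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        size = base + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + size]
        start += size
    return assignment


assignment = assign_ranges([0, 1, 2], ["consumer-a", "consumer-b"])
# assignment == {"consumer-a": [0, 1], "consumer-b": [2]}
```

Two properties carry the reasoning. Every message with the same key lands in the same partition, so per-key order is preserved. Each partition has exactly one owner within the group, so with three partitions a fourth consumer would receive an empty assignment and sit idle.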
Cluster 4: Distributed Workflow with Events
| Order | Concept | Type | Focus |
|---|---|---|---|
| 10 | Choreography vs Orchestration | PRIMARY | Who drives the workflow: each service, or a central coordinator? |
| 11 | Sagas: Long-Running Transactions Across Services | PRIMARY | Compensating transactions instead of distributed commit |
| 12 | Idempotency, Deduplication, and the Exactly-Once Illusion | PRIMARY | Why "at-least-once + idempotent" is the only honest story |
Cluster mastery check: Can you walk a checkout saga through both a happy path and a failed-payment compensation, and can you explain why the inventory service must be idempotent even if the broker promises "exactly once"?
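The checkout walkthrough in this mastery check can be rehearsed with an in-process model of an orchestrated saga. Everything here is a hypothetical sketch: the step names, the `ctx` dict, and `run_saga` stand in for real services and messages, and a real saga would persist its progress so it can resume after a crash.

```python
from typing import Callable

# (step name, action, compensating action)
Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]


class PaymentDeclined(Exception):
    pass


def run_saga(steps: list[Step], ctx: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(ctx)
        except Exception as exc:
            ctx["log"].append(f"{name}: failed ({exc})")
            for done_name, _, compensate in reversed(completed):
                compensate(ctx)
                ctx["log"].append(f"{done_name}: compensated")
            return False
        completed.append(step)
        ctx["log"].append(f"{name}: done")
    return True


def reserve_inventory(ctx: dict) -> None:
    ctx["reserved"] = True


def release_inventory(ctx: dict) -> None:
    ctx["reserved"] = False


def charge_card(ctx: dict) -> None:
    if ctx["card_declines"]:
        raise PaymentDeclined("card declined")
    ctx["charged"] = True


def refund_card(ctx: dict) -> None:
    ctx["charged"] = False


def ship_order(ctx: dict) -> None:
    ctx["shipped"] = True


def cancel_shipment(ctx: dict) -> None:
    ctx["shipped"] = False


CHECKOUT: list[Step] = [
    ("reserve_inventory", reserve_inventory, release_inventory),
    ("charge_card", charge_card, refund_card),
    ("ship_order", ship_order, cancel_shipment),
]
```

Running `run_saga(CHECKOUT, {"card_declines": True, "log": []})` reserves inventory, fails at the charge, and then releases the reservation. Only steps that completed are compensated, and they are compensated in reverse order. The failed step itself is not compensated because it never took effect.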
Cluster 5: Event Sourcing and CQRS
| Order | Concept | Type | Focus |
|---|---|---|---|
| 13 | Event Sourcing: The Event Log Is the System of Record | PRIMARY | State as a fold over an append-only log |
| 14 | Projections and Read Models | PRIMARY | Views derived from the log, rebuildable on demand |
| 15 | CQRS: When to Separate Reads and Writes | SUPPORTING | Splitting the write model from the read model, and when not to |
Cluster mastery check: Can you name three situations where event sourcing is the wrong answer, and one where CQRS without event sourcing is still a valid design?
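"State as a fold over an append-only log" and "views rebuildable on demand" can both be shown in a few lines. The sketch below is illustrative: the `Deposited` and `Withdrawn` events, the in-memory `LOG` list, and both function names are hypothetical, and a real event store would add versioning, snapshots, and concurrency control.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Deposited:
    account_id: str
    amount_cents: int


@dataclass(frozen=True)
class Withdrawn:
    account_id: str
    amount_cents: int


def apply_event(balance: int, event) -> int:
    """Pure transition function: (state, event) -> new state."""
    if isinstance(event, Deposited):
        return balance + event.amount_cents
    if isinstance(event, Withdrawn):
        return balance - event.amount_cents
    return balance  # event types this model does not know are ignored


def project_balances(log) -> dict[str, int]:
    """Read model: balance per account, rebuilt from scratch on every call."""
    balances: dict[str, int] = {}
    for event in log:
        current = balances.get(event.account_id, 0)
        balances[event.account_id] = apply_event(current, event)
    return balances


# The append-only log is the only stored state in this sketch.
LOG = [
    Deposited("acct-1", 10_000),
    Withdrawn("acct-1", 2_500),
    Deposited("acct-2", 700),
]

# Current state of one account is a left fold over its events.
acct1_balance = reduce(apply_event, (e for e in LOG if e.account_id == "acct-1"), 0)
```

Because `project_balances` derives everything from the log, the read model can be dropped and rebuilt at any time, and a second projection with a different shape can be added later without touching the write side.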
Then work these practice pages:
| Order | Practice path | Focus |
|---|---|---|
| 1 | Event Modeling Lab | Naming events, avoiding disguised commands, storming a domain |
| 2 | Messaging Patterns Workshop | Pub-sub vs queue, notification vs state transfer, outbox sketch |
| 3 | Saga and Idempotency Clinic | Compensations, retries, dedup keys, ordering |
| 4 | Event-Driven Katas | Repeatable drills for event design, outbox, choreography, CQRS |
Take the Module Quiz after completing the concept and practice paths. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.
Learning Objectives
By the end of this module you should be able to:
- Write an event name and payload that is an immutable past fact and resist the disguised-command trap.
- Choose pub-sub vs point-to-point queueing for a concrete scenario and justify it in one paragraph.
- Explain the dual-write bug and implement the outbox pattern to eliminate it.
- Compare queue-based and log-based brokers by delivery, retention, and replay semantics.
- Draw partitions and consumer groups on a whiteboard and predict who gets which messages.
- Choose choreography or orchestration for a given workflow and defend the tradeoff.
- Design a three-step saga with correct compensating transactions for the failure paths.
- Make a consumer idempotent using a dedup key, and explain why "exactly once" is a systems-level property, not a broker feature.
- Decide whether event sourcing, CQRS, or neither is justified for a given bounded context.
- Rebuild a read model from an event log and explain why that rebuildability is the point.
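For the idempotent-consumer objective in the list above, a minimal sketch of a dedup key in practice might look like the following. It assumes the consumer's dedup store and its business data live in the same database, so both can share one transaction. The table names, `handle_order_placed`, and the event fields are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE stock (sku TEXT PRIMARY KEY, on_hand INTEGER NOT NULL);
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    INSERT INTO stock VALUES ('sku-1', 10);
""")


def handle_order_placed(event_id: str, sku: str, quantity: int) -> bool:
    """Apply the event's effect once; return False when it was a duplicate."""
    try:
        with db:  # dedup record and side effect commit or roll back together
            db.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
            db.execute(
                "UPDATE stock SET on_hand = on_hand - ? WHERE sku = ?",
                (quantity, sku),
            )
        return True
    except sqlite3.IntegrityError:
        # event_id was already recorded, so the stock change is skipped
        return False
```

Delivering the same `event_id` twice decrements stock only once. The broker may still deliver at least once. The "exactly once" effect comes from the consumer's own transaction, which is why it is a property of the whole system rather than of the broker.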
Outputs
- one event catalog with at least 20 events named in past tense, each with payload fields and a producing service
- one architecture sketch comparing queue and log-based brokers for a shared scenario
- one outbox implementation sketch (schema, polling query, idempotency key flow)
- one full saga diagram with happy path, one failure path, and all compensations
- one CQRS vs single-model memo defending a real decision
- one idempotency design for a real consumer, including the dedup store and key
- one mistake log naming at least 10 errors such as command-shaped event, published before commit, missing dedup, choreography without observability, premature event sourcing
Completion Standard
You have completed Module 3 when all of these are true:
- you can name an event in past tense without sneaking in a command
- you can explain why publishing after a DB commit and publishing from an outbox are fundamentally different
- you can pick queue vs log-based broker with a reason, not a brand preference
- you can walk a saga through its happy path and at least one failure with correct compensations
- you can tell the difference between event sourcing and "using events to communicate"
- you can explain CQRS to someone who thinks it means "two databases"
If you can configure a broker but cannot say what an event is, the module is not complete.
Reading Policy
- Concept pages are the main path.
- Local book chunks are selective reinforcement, not a second syllabus.
- "Read only if stuck" means try the concept page, self-check, and drill first.
- "Optional deep dive" means additional nuance, not required progression.
- External links to Fowler, microservices.io, and Kafka docs are used surgically where a local chunk is not enough.
Suggested Weekly Flow
| Day | Work |
|---|---|
| 1 | Concepts 1-3 and the event-naming drill from Practice 1 |
| 2 | Concepts 4-6 and the outbox sketch from Practice 2 |
| 3 | Concepts 7-9 and the partition-and-consumer-group diagram kata |
| 4 | Concepts 10-12 and one full saga walkthrough from Practice 3 |
| 5 | Concepts 13-15 and the CQRS-yes-or-no scenarios from Practice 4 |
| 6 | Quiz, interleaved review, and mistake-log cleanup |
| 7 | Buffer and Feynman note on event-driven thinking |
Reference
If you need exact links into the local chunked books, use Reference and Selective Reading.
Rich Learning Pages
Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread