Projections and Read Models
What This Concept Is
A projection (sometimes: read model, materialized view) is a derived data structure built by consuming an event stream and folding its events into a query-optimized shape. It exists to serve reads efficiently; it is not the source of truth.
event log ---> projection handler ---> read store (DB, cache, search index)
The three defining properties:
- Derived. The projection's state is a pure function of the events it has consumed.
- Rebuildable. Drop the projection, reset its offset to zero, re-consume. You get the same state. No data loss.
- Independent. You can run many projections over the same events, each tailored to a specific query shape.
Projections exist anywhere events flow -- in event-sourced systems (Concept 13) they are the dominant way to read data, and in CRUD-plus-events systems they are the natural substrate for cross-service read models without 2PC.
Why It Matters Here
Projections are how you pay for the read side after you have committed to the event side:
- Query optimization without schema contortion. The write side can stay normalized; each read side picks its own shape: relational, document, search index, materialized aggregate.
- Decoupling. Each service builds its own projection from the broker; no cross-database joins.
- Evolvability. Change the read shape? Build a new projection, switch readers over, drop the old one.
- Rebuildability. Found a bug in a projection? Reset offset, replay, fixed.
In short: events give you the what happened; projections give you the how we answer queries about it.
Concrete Example: Three Projections Over One Stream
Events on order.* topic: OrderPlaced, PaymentCaptured, OrderShipped, OrderCancelled.
Projection A -- "Customer dashboard" read model
Shape: one row per order with summary fields.
CREATE TABLE customer_order_summary (
order_id TEXT PRIMARY KEY,
customer_id TEXT NOT NULL,
status TEXT NOT NULL,
total_cents INT,
placed_at TIMESTAMPTZ,
shipped_at TIMESTAMPTZ
);
Handler (pseudo):
def on(event):
match event.type:
case "OrderPlaced":
UPSERT into customer_order_summary (order_id, customer_id, status='placed',
total_cents=event.total_cents, placed_at=event.occurred_at)
case "PaymentCaptured":
UPDATE set status='paid' where order_id=event.order_id
case "OrderShipped":
UPDATE set status='shipped', shipped_at=event.occurred_at ...
case "OrderCancelled":
UPDATE set status='cancelled' ...
Reads are trivial (single-row or index-scan). The projection is always lightly behind the live stream.
Projection B -- "Revenue by day" analytics cube
Shape: aggregated by day.
CREATE TABLE revenue_by_day (
day DATE PRIMARY KEY,
orders INT,
revenue BIGINT
);
Handler consumes only PaymentCaptured, increments revenue and orders for the appropriate day. No joins; no scanning the orders DB.
Projection C -- "Search the orders" index
A consumer indexes each order into Elasticsearch, flattened: customer name, SKU list, status, total, tags. A search query never touches the source of truth.
The key observation
All three projections read the same event stream, each folding it into a completely different shape. Dropping and rebuilding any of them is a routine operation: stop the consumer, recreate the table/index, reset the offset to 0, start consuming.
Rebuildability in Practice
The rebuild workflow:
1. create parallel read store (new_customer_order_summary)
2. create a new consumer group "customer_order_summary_v2"
3. reset offsets to zero (or to a historical starting point)
4. consume the stream, building the new table
5. once caught up, switch reader traffic to the new table
6. drop the old table
Two gotchas:
- Your event stream must retain history for as long as you need to rebuild. A Kafka topic with 7-day retention cannot rebuild a 2-year projection; that projection needs a log with longer retention or a dedicated event store.
- During rebuild, the new projection lags. Switch reader traffic only after
lag == 0.
Handling Out-of-Order or Late Events
In a partitioned stream (Concept 09), events for a given aggregate arrive in order within a partition. But across aggregates, a projection can see events "out of wall-clock order": an older event may arrive later due to rebalance, reprocessing, or a slower producer.
Design choices:
- Monotonic transitions: write the projection such that older events cannot overwrite newer ones (
UPDATE ... WHERE occurred_at < current). Use event timestamps or version numbers. - Idempotent upsert: the projection re-applies the same event and converges.
- Event-time vs processing-time: for analytics, fold by event time (
day = event.occurred_at::date), not processing time.
Common Confusion / Misconception
"The projection is the source of truth for reads." It is the source of query answers, not of truth. Truth lives in the events. If a projection and the events disagree, the events win and you rebuild.
"Projections give up consistency." They give up strong consistency; they are eventually consistent. For most read paths -- dashboards, search, aggregated analytics -- this is fine. For strict read-your-own-write paths, you serve from the write side or use a synchronous fallback.
"We can have one projection that answers all queries." You can, and it usually becomes a bloated, slow read side. The power of projections is specialization -- one per query shape.
"Projections need to be relational." A projection is any fold of the event stream into a readable shape. Document store, search index, in-memory structure, or a flat file are all fine.
"Every event updates every projection." Each projection subscribes only to the event types it cares about. Ignore the rest.
How To Use It
Designing a projection:
- Pick the query. What single query does this projection answer, and at what latency?
- Pick the store. Relational for joinable views, document for nested data, search for text, columnar for analytics.
- Pick the event subset. Which event types actually change this projection?
- Write the handler. For each event type, the update rule must be idempotent and monotonic.
- Decide lag tolerance. Alert if projection lag exceeds the SLO (e.g., 5 seconds).
- Plan the rebuild. Store + consumer group + ability to reset offsets.
Check Yourself
- Why must every projection be rebuildable, and what does that require of the event stream?
- Name three projections of an
OrderPlaced+PaymentCapturedstream, each with a different store type. - Why is "eventually consistent" the right default for projections, and when is it wrong?
- What property must a projection handler have so that reprocessing the same event does not corrupt state?
Mini Drill or Application
Take a dashboard or read API you know (a user timeline, an inventory summary, a transaction history). In 20 minutes:
- Identify the event stream(s) it could be built from.
- Write the DDL for the read model.
- Write the handler (pseudocode) for each event type.
- Describe the rebuild procedure end-to-end.
- Name one edge case (late event, out-of-order, partial event loss) and how the handler tolerates it.
Transfer to Adjacent Domains
- CQRS (Concept 15). Projections are the structural thing CQRS commits to. Once you can build three projections over one log, CQRS is already implicit; formalizing it is mostly a governance and team-ownership exercise.
- Event sourcing (Concept 13). Rebuildable projections are what make ES practical -- you do not have to get the read shape right up front. "Change the projection, reset offset, re-fold" is the superpower ES gives you.
- Caching (classic distributed systems). A projection is a materialized, eventually-consistent cache fed by events. The cache-update-pattern literature (cache-aside, write-through, write-behind) maps onto projections if you squint -- but projections give up invalidation problems in exchange for rebuild cost.
- Search (Elasticsearch, OpenSearch). The canonical "search index over a service's data" design is just a projection fed by events. Teams that maintain hand-rolled ETL into Elastic usually benefit from rewriting it as a projection -- lower drift, lower latency, trivially rebuildable.
- Analytics warehouse (S6). A projection built into a columnar store (Snowflake, BigQuery) is often the cleanest way to feed a BI dashboard. The same events feed operational dashboards and analytics cubes -- no ETL parallel to the event layer.
Read This Only If Stuck
- Richards & Ford: Event-Driven Architecture Style -- broker fan-out that underlies multi-projection designs
- Richards & Ford: Mediator Topology -- where projections fit in mediator-based workflows
- Richards & Ford: Preventing Data Loss -- durability requirements projections depend on
- System Design Primer: Cache update patterns -- comparison substrate for "projection as derived view"
- System Design Primer: Consistency patterns -- eventual-consistency framing for projection lag
- Martin Fowler: CQRS -- projections framed as the read side of CQRS
- Martin Fowler: Reporting Database -- older term for the same idea; useful bridge for teams coming from data warehousing
- Martin Fowler: Eager Read Derivation -- the "compute reads at write time" principle projections implement
- Confluent: Materialized views with ksqlDB -- practical treatment with Kafka-native tooling
- Confluent: Unifying stream processing and interactive queries -- Kafka Streams' "embedded database" projection model
- Kafka Streams: Stateful operations -- one canonical way to implement projections
- Microservices.io: CQRS pattern -- sibling pattern that relies on projections as first-class artifacts