Choreography vs Orchestration
What This Concept Is
When a business process crosses multiple services, someone has to decide the order of steps, handle retries, and react to failures. There are two fundamental answers:
Choreography (broker topology)
No central coordinator. Each service subscribes to the events it cares about, does its part, and publishes further events. The workflow is implicit in the web of subscriptions.
           Customer clicks Buy
                    |
                    v
               [checkout]
                    |
                    v  OrderPlaced
                (broker)
              /      \      \
            v         v        v
       [billing] [inventory] [notify]
           |          |
           v          v  StockReserved
    PaymentCaptured (broker)
        (broker)       |
             \         v
              \    [fulfillment]
               \       |
                \      v
                 -- OrderShipped --
No single box "owns" the workflow. Each service knows what events it consumes and what events it emits.
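To make "the workflow is implicit in the web of subscriptions" concrete, here is a minimal in-memory sketch of the flow above. The `Broker` class and `wire_services` function are hypothetical toy names, not a real broker API, and the sketch assumes fulfillment ships only after it has seen both PaymentCaptured and StockReserved for an order.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory broker (hypothetical): topic name -> subscriber callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.queue = deque()
        self.log = []  # every delivered event type, in delivery order

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        self.queue.append((topic, payload))

    def run(self):
        # Deliver events until no service publishes anything new.
        while self.queue:
            topic, payload = self.queue.popleft()
            self.log.append(topic)
            for handler in self.subscribers[topic]:
                handler(payload)


def wire_services(broker):
    # Fulfillment's private state: order_id -> set of facts seen so far.
    facts_seen = defaultdict(set)

    def billing(order):
        broker.publish("PaymentCaptured", order)

    def inventory(order):
        broker.publish("StockReserved", order)

    def notify(order):
        pass  # pure observer: reacts, publishes nothing

    def fulfillment_on(fact):
        def handler(order):
            facts_seen[order["order_id"]].add(fact)
            # Assumption: ship once both facts have arrived.
            if facts_seen[order["order_id"]] == {"PaymentCaptured", "StockReserved"}:
                broker.publish("OrderShipped", order)
        return handler

    broker.subscribe("OrderPlaced", billing)
    broker.subscribe("OrderPlaced", inventory)
    broker.subscribe("OrderPlaced", notify)
    broker.subscribe("PaymentCaptured", fulfillment_on("PaymentCaptured"))
    broker.subscribe("StockReserved", fulfillment_on("StockReserved"))


broker = Broker()
wire_services(broker)
broker.publish("OrderPlaced", {"order_id": "o-1"})
broker.run()
print(broker.log)
```

Nothing in this code lists the steps in order. The sequence exists only as the combination of `subscribe` calls, which is exactly what makes the workflow hard to see from any one service.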
Orchestration (mediator topology)
A central orchestrator (AWS Step Functions, Temporal, Camunda, Netflix Conductor, a hand-rolled saga coordinator) drives the workflow. It sends commands to services and waits for replies.
    +--------------------------+
    |       Orchestrator       |
    |   (Temporal workflow)    |
    +--+-----+-----+-------+---+
       |     |     |       |
       v     v     v       v
    [bill] [inv] [ship] [notify]
The orchestrator knows the state machine and keeps track of where the workflow is.
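A minimal sketch of the same idea in plain Python, to show where the state machine lives. This is not Temporal or Step Functions code: `run_workflow`, `StepFailed`, and the step functions are hypothetical, and a real orchestrator would persist the history durably instead of keeping it in a local list.

```python
class StepFailed(Exception):
    """Raised by a step that cannot complete (hypothetical error type)."""


def run_workflow(steps, ctx):
    """Run (name, action, compensation) steps in order.

    On failure, run the compensations of already-completed steps in
    reverse order and return the recorded history.
    """
    history = []
    completed = []
    for name, action, compensate in steps:
        try:
            action(ctx)
        except StepFailed:
            history.append(f"{name}:failed")
            for done_name, undo in reversed(completed):
                if undo is not None:
                    undo(ctx)
                    history.append(f"{done_name}:compensated")
            return "compensated", history
        history.append(f"{name}:ok")
        completed.append((name, compensate))
    return "completed", history


# Hypothetical service calls, stubbed as local functions.
def bill(ctx): ctx["charged"] = True
def refund(ctx): ctx["charged"] = False
def reserve(ctx): ctx["reserved"] = True
def release(ctx): ctx["reserved"] = False
def notify(ctx): ctx["notified"] = True

def ship(ctx):
    if ctx.get("carrier_down"):
        raise StepFailed("carrier unavailable")
    ctx["shipped"] = True


STEPS = [
    ("bill", bill, refund),
    ("inv", reserve, release),
    ("ship", ship, None),
    ("notify", notify, None),
]

ok_status, ok_history = run_workflow(STEPS, {})
bad_ctx = {"carrier_down": True}
bad_status, bad_history = run_workflow(STEPS, bad_ctx)
print(ok_status, ok_history)
print(bad_status, bad_history)
```

The `STEPS` list is the state machine, and `history` is the single place that answers "where is this run?". Both are things choreography has to reconstruct from scattered events.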
Why It Matters Here
These are the two ways to combine events, sagas, and multi-service processes. Every multi-service workflow you design will be one, the other, or (often) a hybrid. The cost of picking wrong is paid slowly, in harder debugging and riskier deploys.
The Tradeoff Table
| Axis | Choreography | Orchestration |
|---|---|---|
| Coupling | Each service coupled to events it emits/consumes | Each service coupled to the orchestrator's commands |
| Visibility | Poor -- workflow exists only as a graph of subscriptions | Good -- one place shows the state machine |
| Adding a new step | Add a subscriber to existing events | Change the orchestrator definition |
| Adding a new consumer who only observes | Trivial (subscribe) | Irrelevant to orchestrator |
| Failure handling | Each service handles its own; correlation is hard | Orchestrator sees failures and runs compensations |
| Retries | Per service; each invents its own policy | Centralized, declarative |
| Timeouts / long delays | Awkward; requires scheduling events | First-class (Temporal timers, Step Functions waits) |
| Debugging | "Why didn't X happen?" requires correlating logs | One execution-history view per run |
| Operational dependency | Broker only | Broker + orchestrator (one more critical component to run and keep available) |
| Best fit | Reactive integration, fan-out, small workflows | Business-critical workflows, long durations, explicit state |
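The "Retries" row is easier to picture with code. Below is a rough sketch of what "centralized, declarative" can look like: one policy object, applied by the coordinator to any step. `RetryPolicy`, `call_with_retry`, and `flaky_reserve` are hypothetical names for illustration, not the API of Temporal or Step Functions; the `sleep` parameter is injected so the example runs instantly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Declarative retry settings shared by every step (illustrative)."""
    max_attempts: int = 3
    initial_delay: float = 1.0  # seconds before the first retry
    backoff: float = 2.0        # multiplier applied after each retry


def call_with_retry(step, policy, sleep=lambda seconds: None):
    """Call step() until it succeeds or the policy is exhausted.

    Returns (result, delays_waited). Re-raises the last error when
    max_attempts is reached.
    """
    delay = policy.initial_delay
    delays = []
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return step(), delays
        except ConnectionError:
            if attempt == policy.max_attempts:
                raise
            delays.append(delay)
            sleep(delay)
            delay *= policy.backoff


attempts = {"n": 0}

def flaky_reserve():
    # Hypothetical inventory call that fails twice, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("inventory timeout")
    return "reserved"


result, waited = call_with_retry(flaky_reserve, RetryPolicy())
print(result, waited)
```

In choreography, each subscriber would carry its own version of this loop with its own numbers. With an orchestrator, the policy is data attached to the workflow definition.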
Concrete Examples
Choreography is right: domain-event fan-out
OrderPlaced is published. billing, inventory, notifications, and analytics each consume it independently. There is no "workflow"; there are four independent reactions. Putting an orchestrator in the middle would add cost for no coordination benefit.
Orchestration is right: onboarding a new customer
Steps: create CRM record -> provision SaaS tenant -> send welcome email -> schedule kickoff call -> report to analytics. Some steps take days (wait for kickoff call to complete). Failures require specific compensations (cancel provisioning). The workflow is a first-class thing that business people care about; they need to see its state.
Implementation with Temporal or Step Functions makes the state machine explicit, lets support reps see "stuck at step 3 waiting on provisioning," and handles retries, timers, and compensations declaratively.
Hybrid is very common
Choreography at the edges, orchestration at the critical core. Orchestrators publish events when major stages complete, so observers can subscribe without becoming participants.
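A small sketch of the hybrid shape, assuming a toy `EventBus` and a hypothetical `run_onboarding` orchestrator function (neither is a real library API). The orchestrator drives the steps in order and publishes a fact after each stage; an analytics observer subscribes without the orchestrator knowing it exists.

```python
class EventBus:
    """Toy synchronous event bus (hypothetical), topic -> handlers."""

    def __init__(self):
        self.handlers = {}

    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        for handler in self.handlers.get(topic, []):
            handler(payload)


def run_onboarding(customer_id, bus):
    """Orchestrated core: fixed stage order, one event per completed stage."""
    stages = ["CrmRecordCreated", "TenantProvisioned", "WelcomeEmailSent"]
    for stage in stages:
        # A real workflow would call the owning service here and wait
        # for its reply before announcing the stage as complete.
        bus.publish(stage, {"customer_id": customer_id})
    return stages[-1]


seen = []
bus = EventBus()
# Choreographed edge: analytics observes one stage, participates in nothing.
bus.subscribe(
    "TenantProvisioned",
    lambda event: seen.append(("analytics", event["customer_id"])),
)
last_stage = run_onboarding("c-42", bus)
print(last_stage, seen)
```

Adding a second observer is one more `subscribe` call and no change to `run_onboarding`, which is the property the hybrid is meant to preserve.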
Common Confusion / Misconception
"Choreography is more loosely coupled." It trades explicit coupling (orchestrator knows services) for implicit coupling (services know each other through events). Implicit coupling is not less -- it is less visible. Hidden coupling is often worse than explicit coupling because nobody maintains it.
"Temporal / Step Functions removes the need for events." No. Orchestrated workflows still emit events (completion events, progress events) for observability and reactive integration. The orchestrator handles coordination; events still carry facts.
"Choreography scales better." Sometimes. Broker fan-out scales easily in throughput, but as workflows get more complex, choreography's debugging cost tends to grow super-linearly with the number of participants, while orchestration's tends to grow roughly linearly.
"An orchestrator is a single point of failure." Modern orchestrators (Temporal, Step Functions) are themselves distributed and durable. The orchestrator's state is not fragile; what is fragile is the coupling of workflows to a specific orchestrator product, which is real but manageable.
"We can start with choreography and switch to orchestration later if we need to." Sometimes, but the switch is nontrivial: event schemas become orchestrator commands, subscribers become step handlers, and your whole "who owns the workflow" story inverts.
How To Use It
Decision guide:
If you choose choreography:
- Publish rich correlation IDs on every event and stitch traces.
- Invest in a "workflow viewer" dashboard that reads the event stream and shows "orders stuck at step X."
- Write down the implicit state machine in a document; do not leave it as tribal knowledge.
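The first two bullets can be sketched together: if every event carries a correlation ID, a "workflow viewer" is mostly a group-by. The event shape (`correlation_id`, `type`, `ts` in seconds), the `EXPECTED` list, and the `stuck_workflows` function below are assumptions made for illustration, not a standard schema.

```python
from collections import defaultdict

# The implicit state machine, written down as data (assumed event names).
EXPECTED = ["OrderPlaced", "PaymentCaptured", "StockReserved", "OrderShipped"]


def stuck_workflows(events, now, max_age_seconds):
    """Return {correlation_id: first_missing_event} for stalled workflows.

    A workflow counts as stuck when at least one expected event is missing
    and its most recent event is older than max_age_seconds.
    """
    seen = defaultdict(set)
    last_ts = {}
    for event in events:
        cid = event["correlation_id"]
        seen[cid].add(event["type"])
        last_ts[cid] = max(last_ts.get(cid, event["ts"]), event["ts"])

    report = {}
    for cid, types in seen.items():
        missing = [t for t in EXPECTED if t not in types]
        if missing and now - last_ts[cid] > max_age_seconds:
            report[cid] = missing[0]
    return report


events = [
    {"correlation_id": "o-1", "type": "OrderPlaced", "ts": 0},
    {"correlation_id": "o-1", "type": "PaymentCaptured", "ts": 5},
    {"correlation_id": "o-1", "type": "StockReserved", "ts": 6},
    {"correlation_id": "o-1", "type": "OrderShipped", "ts": 60},
    {"correlation_id": "o-2", "type": "OrderPlaced", "ts": 10},
    {"correlation_id": "o-2", "type": "PaymentCaptured", "ts": 12},
    {"correlation_id": "o-3", "type": "OrderPlaced", "ts": 290},
]

report = stuck_workflows(events, now=300, max_age_seconds=120)
print(report)
```

Here `o-1` finished, `o-3` is too recent to judge, and `o-2` is reported as waiting on StockReserved. Note that the viewer only works because `EXPECTED` exists, which is the third bullet: the implicit state machine has to be written down somewhere.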
If you choose orchestration:
- Pick a substrate (Temporal, Step Functions, Camunda 8, Cadence) and commit.
- Keep service APIs idempotent; the orchestrator will retry.
- Emit events on stage completions so observers can subscribe without participating.
- Version the workflow definition; long-running instances outlive deploys.
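The idempotency bullet is the one teams most often skip, so here is a minimal sketch of one common way to satisfy it: the orchestrator sends an idempotency key with each command, and the service returns the stored result on a repeat. `BillingService`, `capture_payment`, and the key format are hypothetical; a real service would keep the key table in durable storage, not in memory.

```python
class BillingService:
    """Toy billing service that deduplicates commands by idempotency key."""

    def __init__(self):
        self.results = {}  # idempotency_key -> result of the first call
        self.charges = 0   # number of real charges performed

    def capture_payment(self, idempotency_key, amount_cents):
        # A retried command returns the original result without charging again.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.charges += 1
        result = {"charge_id": f"ch-{self.charges}", "amount_cents": amount_cents}
        self.results[idempotency_key] = result
        return result


billing = BillingService()
# The orchestrator derives the key from the workflow run and step name
# (an assumed convention), so every retry of the same step reuses it.
first = billing.capture_payment("order-o-1:bill", 1999)
retry = billing.capture_payment("order-o-1:bill", 1999)
print(first, retry, billing.charges)
```

With this in place, a retry after a lost reply is harmless: the customer is charged once, and the orchestrator still gets the answer it was waiting for.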
Check Yourself
- In one sentence, what is the coupling that choreography hides and orchestration makes explicit?
- Name two situations where orchestration is clearly better, and two where choreography is clearly better.
- Why is "add a new observer" trivial in choreography and uninteresting in orchestration?
Mini Drill or Application
Take a workflow you know (order checkout, ride dispatch, loan approval, support ticket escalation). In 25 minutes:
- Sketch it as choreography (services, events, subscribers).
- Sketch it as orchestration (orchestrator, steps, commands).
- Name two things the choreography version hides from a support rep.
- Name two things the orchestration version makes expensive to change.
- Pick one for a real team and defend it in three sentences.
Transfer to Adjacent Domains
- Sagas (Concept 11). The choreography/orchestration split is the saga variant split. This concept gives you the tradeoff table; Concept 11 gives you the failure-handling mechanics. You'll always use them together.
- Business process / BPM heritage. Orchestration in microservices is the modern descendant of BPM (Camunda, jBPM). If your org has a BPM practice, Camunda 8 / Zeebe is the natural bridge; teams without BPM DNA usually prefer Temporal.
- Observability (S8M5). Choreographed workflows demand excellent distributed tracing -- the only way to see the whole graph is to stitch spans by correlation ID. Orchestrated workflows have a built-in "tracing surface" in the orchestrator's history view.
- Team topologies (S7M4). Choreography favors autonomous teams with minimal coordination; orchestration favors a designated workflow-owning team. Choosing the wrong shape for your org produces predictable friction (autonomous teams hating a central orchestrator team, or a "coordination" team that isn't actually empowered to own the workflow).
- Serverless step-runner stacks. Step Functions, Workflows (GCP), and Durable Functions (Azure) are managed orchestrators built on the same core idea as Temporal (durable workflow state that survives restarts), but with different programming models and pricing/vendor lock-in curves. The concept transfers; the service choice is secondary.
Read This Only If Stuck
- Richards & Ford: Event-Driven Architecture Style -- broker topology = choreography
- Richards & Ford: Mediator Topology -- mediator topology = orchestration, with detailed tradeoffs
- Richards & Ford: Request-Reply over Events -- the sync-over-async pattern orchestrators use with services
- System Design Primer: Availability patterns -- framing the "orchestrator as SPOF" claim with actual availability math
- Microservices.io: Saga pattern -- both choreography and orchestration variants of sagas
- AWS: Saga orchestration pattern -- prescriptive guidance for orchestration on AWS
- AWS: Building a serverless distributed application using a saga orchestration pattern -- concrete Step Functions implementation
- Temporal: Saga pattern explained -- orchestrator vendor's framing with runnable examples
- Temporal: Compensating Actions, Part of a Complete Breakfast with Sagas -- practical orchestration with compensation
- Temporal documentation -- reference for code-first orchestration
- Bernd Rücker: Why service collaboration needs choreography AND orchestration -- hybrid arguments from Camunda's co-founder