Monolith-First and Strangler-Fig Migration
What This Concept Is
Two related ideas:
- Monolith-first. The default starting architecture for new systems is a well-modularized monolith. You earn the right to split into services by discovering real seams under production load, not by drawing them on day one.
- Strangler-fig. The migration pattern for moving from a monolith to microservices (or to any new implementation) incrementally: new functionality is added alongside the monolith and routed through a seam, old functionality is gradually re-implemented and switched over, and when the old system has nothing left it is retired. The name is Martin Fowler's, after strangler-fig vines that grow around a host tree.
Together they describe the migration approach with the strongest track record for long-lived systems.
Why It Matters Here
The alternative -- a "big-bang rewrite" -- rarely succeeds at scale. It fails because you cannot stop the business while you rewrite, you cannot maintain feature parity between two moving systems, and you cannot build confidence in the new system until the whole thing lands. Strangler-fig addresses all three problems by always shipping an integrated system.
Concrete Example
Retailer with a monolithic order system. The team wants to extract a modern Fulfillment Service.
Step by step:
- Identify the seam. Fulfillment logic lives in `order_processor.py`. Function boundaries are messy but the flow "order-confirmed -> reserve-stock -> assign-warehouse -> schedule-shipment" is identifiable.
- Put a façade in front. Introduce a `FulfillmentClient` abstraction in the monolith. Every call to fulfillment logic now goes through it. Internally, the client still calls the old code.
- Build the new service. A new `fulfillment-svc` implements the same operations behind an HTTP or gRPC contract, with its own database.
- Dark-launch + shadow. The façade calls both old code (authoritative) and the new service (shadow), comparing results. Fix mismatches.
- Route real traffic in slices. 1% -> 10% -> 50% -> 100% of traffic routed to the new service, with a kill switch.
- Delete the old code. Only when the new path has been authoritative for long enough.
At every step, the whole system ships. There is no "freeze + rewrite + unfreeze" moment.
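The façade and shadow steps above can be sketched as follows. This is a minimal illustration, not the retailer's actual code: `legacy_fulfill`, `new_service_fulfill`, the order payload shape, and the warehouse rule are all hypothetical stand-ins, and the call to the new service is a plain function rather than a real HTTP or gRPC client.

```python
from dataclasses import dataclass, field
from typing import Callable

Order = dict  # hypothetical order payload, e.g. {"id": 1, "qty": 3}
Plan = dict   # hypothetical fulfillment result, e.g. {"warehouse": "W1"}


@dataclass
class FulfillmentClient:
    """Facade inside the monolith; every caller goes through fulfill()."""
    legacy: Callable[[Order], Plan]   # old code path, authoritative
    shadow: Callable[[Order], Plan]   # new service, result is only compared
    mismatches: list = field(default_factory=list)

    def fulfill(self, order: Order) -> Plan:
        expected = self.legacy(order)
        try:
            actual = self.shadow(order)
            if actual != expected:
                self.mismatches.append((order["id"], expected, actual))
        except Exception as exc:
            # A shadow failure is recorded but never reaches the caller.
            self.mismatches.append((order["id"], expected, repr(exc)))
        return expected  # callers only ever see the legacy answer


def legacy_fulfill(order: Order) -> Plan:
    # Stand-in for the existing logic in order_processor.py.
    return {"warehouse": "W1" if order["qty"] < 10 else "W2"}


def new_service_fulfill(order: Order) -> Plan:
    # Stand-in for a call to fulfillment-svc, with a deliberate
    # off-by-one so the comparison has something to find.
    return {"warehouse": "W1" if order["qty"] <= 10 else "W2"}


client = FulfillmentClient(legacy_fulfill, new_service_fulfill)
results = [client.fulfill({"id": i, "qty": q}) for i, q in enumerate([1, 10, 50])]
```

Here the shadow call runs synchronously for brevity; in a real system it would more likely be fanned out asynchronously so that it adds no user-visible latency. The recorded mismatch for the order with `qty == 10` is the kind of divergence you fix before routing any real traffic.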
Common Confusion / Misconception
"Strangler-fig means build the new service first, then cut over." That is the big-bang rewrite again. The distinguishing feature is the always-integrated façade that routes between old and new during the whole migration, not a future cut-over date.
Second confusion: choosing the wrong first service. Teams often pick the most interesting domain (search, recommendations) instead of the seam with the clearest boundary and lowest risk (notifications, reporting). The first cut is a capability exercise, not a prestige exercise.
How To Use It
When proposing any migration from monolith to microservices:
- Pick one service with a clean boundary and a low-risk blast radius (often: notifications, search indexing, file storage, reporting).
- Define the seam as an explicit interface inside the monolith first.
- Put a façade/router in front of every caller.
- Build the new service behind the façade.
- Shadow, then percentage-route, then cut over, then delete.
- Only after it is done, pick the next service. Never extract more than one at a time per team.
This is slow. That is a feature, not a bug.
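As one way to picture "define the seam as an explicit interface inside the monolith first", here is a small sketch using `typing.Protocol`. The method names follow the fulfillment flow from the concrete example, but the signatures, the `LegacyFulfillment` adapter, and the `confirm_order` caller are assumptions made for illustration.

```python
from typing import Protocol


class Fulfillment(Protocol):
    """The seam: the only fulfillment surface callers may depend on."""
    def reserve_stock(self, order_id: str, sku: str, qty: int) -> bool: ...
    def assign_warehouse(self, order_id: str) -> str: ...
    def schedule_shipment(self, order_id: str, warehouse: str) -> str: ...


class LegacyFulfillment:
    """Adapter over the existing monolith code (bodies are stand-ins)."""
    def reserve_stock(self, order_id: str, sku: str, qty: int) -> bool:
        return qty > 0

    def assign_warehouse(self, order_id: str) -> str:
        return "W1"

    def schedule_shipment(self, order_id: str, warehouse: str) -> str:
        return f"{order_id}@{warehouse}"


def confirm_order(fulfillment: Fulfillment, order_id: str, sku: str, qty: int) -> str:
    # The caller knows the interface only, not which implementation is behind it.
    if not fulfillment.reserve_stock(order_id, sku, qty):
        raise ValueError("out of stock")
    warehouse = fulfillment.assign_warehouse(order_id)
    return fulfillment.schedule_shipment(order_id, warehouse)


shipment = confirm_order(LegacyFulfillment(), "o-42", "sku-1", 2)
```

Once every caller depends on `Fulfillment` rather than on functions inside the old module, a second implementation that talks to the new service can be introduced without touching those callers.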
Check Yourself
- What specifically makes strangler-fig migrations succeed where big-bang rewrites fail?
- Why introduce a façade in the monolith before building the new service?
- Why does percentage-routing with a kill switch matter more than the new service being correct?
Mini Drill or Application
Pick a monolithic system (real or the e-commerce example). In 15 minutes:
- Name the first service you would extract and justify it in two sentences (clean seam, low blast radius).
- Sketch the façade interface with 3-5 method signatures.
- List the three routing phases (shadow, 10%, 100%) and one rollback condition for each.
How This Sits In The Module
Cluster 2 tells you where the seams are. Cluster 3 tells you how to break the shared database during the migration. Cluster 4 tells you how to survive the network layer once the new service is live.
Read This Only If Stuck
Local chunks
- FoSA: Modularity and FoSA: Measuring Modularity -- the modular-monolith starting point and the metrics that tell you where a seam exists.
- FoSA: Component-Based Thinking and FoSA: Discovering Components -- components become the extraction units.
- FoSA: Connascence -- seams are where connascence is lowest; if the coupling is algorithmic or positional across a seam, extract only after reducing it first.
- FoSA: Case Study -- Silicon Sandwiches Partitioning -- partitioning a monolith by capability, with trade-offs worked.
- Primer: Asynchronism -- the dark-launch/shadow comparison step usually relies on async fan-out to avoid user-visible latency.
External canonical references
- Martin Fowler, StranglerFigApplication -- original pattern description.
- Martin Fowler, MonolithFirst -- the "monolith-first" default, with counter-examples from Stefan Tilkov.
- Stefan Tilkov, Don't start with a monolith -- the counter-view; read after MonolithFirst to see both sides.
- Sam Newman, The Strangler Fig Application Pattern -- practitioner perspective from the author of Monolith to Microservices.
- AWS Prescriptive Guidance, Strangler fig pattern -- concrete cloud mechanics and tooling.
- Microsoft Learn, Strangler Fig pattern -- same pattern, Azure framing, useful for enterprise audiences.
- Chris Richardson, Refactoring to microservices patterns -- the broader refactoring-to-microservices pattern set, including Strangler.
Depth Path
- Sam Newman, Monolith to Microservices -- a whole book on this, but the pattern page above is enough for 80% of applications. Chapter 3 ("Splitting the monolith") and chapter 4 ("Decomposing the database") are the most-opened sections.
- Martin Fowler, BranchByAbstraction -- the technique underlying the "introduce a façade" step. Works for intra-monolith refactoring too.
Transfer: Strangler-Fig Outside Microservices
The pattern is not specific to microservices. It applies wherever you have a working system, a better one you want to build, and cannot stop the business. Examples:
- Migrating a data layer from one database engine or sharding layout to another (the Shopify "podded" migration is a multi-year strangler-fig on the data layer).
- Replacing a legacy UI framework (AngularJS -> Angular, jQuery -> React) by serving pages through a shim that sends each route to either the old or the new implementation.
- Swapping a payments provider: the new provider runs in shadow for a verification window, traffic shifts by percentage, and the old provider is retired.
The same invariants hold: always-integrated façade, percentage routing, shadow comparison, kill switch, deletion last. If you master the pattern for one service extraction, you can apply it to databases, UIs, vendors, and language migrations.
Choosing The First Cut
The first extraction is a capability exercise. Score candidate services on these axes:
| Axis | Good first cut | Bad first cut |
|---|---|---|
| Boundary clarity | Obvious seam, few callers | Tangled dependencies across the monolith |
| Blast radius | Non-customer-critical (reporting, notifications, search indexing) | User-facing checkout, payment, auth |
| Data overlap | Self-contained tables | Tables shared with several modules |
| Team readiness | A team has slack to own it | Team already overloaded |
| Observability | Easy to trace and compare old vs new | Hidden in batch jobs or cron |
The worst first cut is usually the most exciting one (recommendations, search, ML scoring) because those have the messiest dependencies on domain data. Pick the boring seam first.
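The table can be turned into a rough scoring exercise. The sketch below assumes a 1 to 3 rating per axis (3 meaning "looks like the good-first-cut column") and an unweighted sum; the candidate names and every rating are made-up illustrations, not measurements.

```python
AXES = (
    "boundary_clarity",
    "blast_radius",
    "data_overlap",
    "team_readiness",
    "observability",
)

# Hypothetical ratings: 3 = matches "good first cut", 1 = matches "bad first cut".
candidates = {
    "notifications": dict(boundary_clarity=3, blast_radius=3, data_overlap=3,
                          team_readiness=2, observability=2),
    "recommendations": dict(boundary_clarity=1, blast_radius=2, data_overlap=1,
                            team_readiness=2, observability=1),
    "checkout": dict(boundary_clarity=2, blast_radius=1, data_overlap=1,
                     team_readiness=2, observability=3),
}


def score(ratings: dict) -> int:
    """Unweighted total across the five axes."""
    return sum(ratings[axis] for axis in AXES)


ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
```

An unweighted sum is the simplest possible choice. A team might reasonably treat a rating of 1 on blast radius or data overlap as a veto instead of letting other axes compensate for it.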
Common Anti-Patterns During Migration
Even with strangler-fig correctly set up, these mistakes re-appear:
- No façade, just call both paths from callers. Callers now contain routing logic; you have 40 routers instead of one, and rollback requires changing every caller.
- Shadow comparison skipped. You route traffic directly at 10% without verifying new-service results match old. First incident follows.
- Percentage routing without a kill switch. When the new service misbehaves, the only rollback is a code change, not a config flip.
- Deleting the old code too early. Keep the old code alive until the new path has been authoritative for several weeks, including peak traffic.
- Extracting too many services at once. Each migration is a meaningful investment. Do one, stabilize, learn, then do the next.
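To make the kill-switch point concrete, here is a minimal sketch of percentage routing driven entirely by configuration. `RoutingConfig`, `bucket`, and `use_new_path` are hypothetical names; a real system would load the config from a flag service or config store rather than hold it in memory.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RoutingConfig:
    new_path_percent: int = 0   # 0..100, share of traffic sent to the new service
    kill_switch: bool = False   # when True, everything goes to the old path


def bucket(order_id: str) -> int:
    """Map an ID to a stable bucket in 0..99 (same ID, same bucket, every time)."""
    digest = hashlib.sha256(order_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def use_new_path(order_id: str, cfg: RoutingConfig) -> bool:
    if cfg.kill_switch:
        return False
    return bucket(order_id) < cfg.new_path_percent


ids = [f"order-{i}" for i in range(1000)]
cfg = RoutingConfig(new_path_percent=10)
routed_at_10 = {i for i in ids if use_new_path(i, cfg)}

cfg.new_path_percent = 50
routed_at_50 = {i for i in ids if use_new_path(i, cfg)}

cfg.kill_switch = True          # rollback is a config flip, not a deploy
routed_after_kill = {i for i in ids if use_new_path(i, cfg)}
```

Hashing the ID instead of drawing a random number keeps each order on the same path across retries, and it means the set routed at 10% stays a subset of the set routed at 50%, so raising the percentage never moves an order back to the old path.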
Migration Observability
Tracing (concept 13) is essential during strangler-fig:
- The façade must emit a span labelled `path=old` or `path=new` so you can compare latency and error rates.
- Alert on divergence: if the shadow comparison shows > X% mismatch, stop routing.
- Log correlation IDs in both paths so customer support can trace a single ticket through either system.
Without these, you will migrate, something will go wrong, and no one will know whether the problem is the new code or a pre-existing bug you never noticed in the old code.
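The three requirements above can be sketched without a tracing library. In the example below, `span`, `error_rate_by_path`, and `should_stop_routing` are hypothetical helpers that record spans in a plain list; in practice you would emit them through whatever tracing system you already run. The 2% threshold is a placeholder for the X the team has to choose.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

spans = []  # stand-in for a tracing backend


@contextmanager
def span(name: str, path: str, correlation_id: str):
    """Record duration and error status, labelled with path=old or path=new."""
    record = {"name": name, "path": path,
              "correlation_id": correlation_id, "error": False}
    start = time.perf_counter()
    try:
        yield record
    except Exception:
        record["error"] = True
        raise
    finally:
        record["duration_s"] = time.perf_counter() - start
        spans.append(record)


def error_rate_by_path(recorded: list) -> dict:
    totals, errors = defaultdict(int), defaultdict(int)
    for s in recorded:
        totals[s["path"]] += 1
        errors[s["path"]] += int(s["error"])
    return {path: errors[path] / totals[path] for path in totals}


def should_stop_routing(mismatches: int, compared: int, threshold: float) -> bool:
    """True when the shadow mismatch rate exceeds the chosen threshold."""
    return compared > 0 and mismatches / compared > threshold


# The same correlation ID appears on both paths for one request.
with span("fulfill", path="old", correlation_id="req-1"):
    pass
try:
    with span("fulfill", path="new", correlation_id="req-1"):
        raise TimeoutError("new service timed out")
except TimeoutError:
    pass

rates = error_rate_by_path(spans)
stop = should_stop_routing(mismatches=3, compared=100, threshold=0.02)
```

Because both spans carry the same `correlation_id`, a single request can be followed through either path, and the per-path error rates make it visible whether a failure belongs to the new code or was already present in the old one.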
How This Sits In The Module
This concept is the only one in cluster 1 that is a migration pattern (versus a style-selection pattern). Cluster 2 tells you where the seam should be; this concept tells you how to execute one seam safely. Clusters 3-5 are about operating the resulting services -- which matter only if the migration has been done well.