Monolith-First and Strangler-Fig Migration

What This Concept Is

Two related ideas:

  • Monolith-first. The default starting architecture for new systems is a well-modularized monolith. You earn the right to split into services by discovering real seams under production load, not by drawing them on day one.
  • Strangler-fig. The migration pattern for moving from a monolith to microservices (or to any new implementation) incrementally: new functionality is added alongside the monolith and routed through a seam, old functionality is gradually re-implemented and switched over, and when the old system has nothing left it is retired. The name is Martin Fowler's, after strangler-fig vines that grow around a host tree.

Together they describe the migration approach for long-lived systems with the strongest track record in practice.

Why It Matters Here

The alternative, the "big-bang rewrite", rarely succeeds at scale. It fails because you cannot stop the business while you rewrite, you cannot maintain feature parity between two moving systems, and you cannot build confidence in the new system until the whole thing lands. Strangler-fig addresses all three problems by always shipping an integrated system.

Concrete Example

A retailer runs a monolithic order system. The team wants to extract a modern Fulfillment Service.

Step by step:

  1. Identify the seam. Fulfillment logic lives in order_processor.py. Function boundaries are messy but the flow "order-confirmed -> reserve-stock -> assign-warehouse -> schedule-shipment" is identifiable.
  2. Put a façade in front. Introduce a FulfillmentClient abstraction in the monolith. Every call to fulfillment logic now goes through it. Internally, the client still calls the old code.
  3. Build the new service. A new fulfillment-svc implements the same operations behind an HTTP or gRPC contract, with its own database.
  4. Dark-launch + shadow. The façade calls both old code (authoritative) and the new service (shadow), comparing results. Fix mismatches.
  5. Route real traffic in slices. 1% -> 10% -> 50% -> 100% of traffic routed to the new service, with a kill switch.
  6. Delete the old code. Only when the new path has been authoritative long enough to trust it: several weeks, including peak traffic.

At every step, the whole system ships. There is no "freeze + rewrite + unfreeze" moment.
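Steps 2 to 4 can be sketched in a few lines. This is a minimal illustration, not the retailer's actual code: the Plan type, the legacy_fulfill and new_service_fulfill stand-ins, and the mismatches list are all hypothetical names introduced here. The old path stays authoritative, and the new path's result is only compared, never returned.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical result type for a fulfillment decision."""
    warehouse: str
    ship_date: str


def legacy_fulfill(order_id: str) -> Plan:
    """Stand-in for the existing logic in order_processor.py."""
    return Plan(warehouse="WH-1", ship_date="2024-01-02")


def new_service_fulfill(order_id: str) -> Plan:
    """Stand-in for a call to fulfillment-svc (really HTTP or gRPC)."""
    # Deliberately disagrees for one order so the comparison finds something.
    if order_id == "order-2":
        return Plan(warehouse="WH-9", ship_date="2024-01-02")
    return Plan(warehouse="WH-1", ship_date="2024-01-02")


@dataclass
class FulfillmentClient:
    """Facade: the only entry point callers use for fulfillment."""
    old: Callable[[str], Plan]
    new: Callable[[str], Plan]
    shadow_enabled: bool = True
    mismatches: List[Tuple[str, Plan, object]] = field(default_factory=list)

    def fulfill(self, order_id: str) -> Plan:
        result = self.old(order_id)  # old code is authoritative
        if self.shadow_enabled:
            try:
                shadow = self.new(order_id)
            except Exception as exc:
                # A shadow failure is recorded but never reaches the caller.
                self.mismatches.append((order_id, result, exc))
            else:
                if shadow != result:
                    self.mismatches.append((order_id, result, shadow))
        return result


client = FulfillmentClient(old=legacy_fulfill, new=new_service_fulfill)
results = [client.fulfill(f"order-{i}") for i in range(1, 4)]
```

In a real system the shadow call would likely run asynchronously so it adds no latency, and it must not repeat side effects such as reserving stock twice. Both concerns are left out of this sketch.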

Common Confusion / Misconception

"Strangler-fig means build the new service first, then cut over." That is the big-bang rewrite again. The distinguishing feature is the always-integrated façade that routes between old and new during the whole migration, not a future cut-over date.

Second confusion: choosing the wrong first service. Teams often pick the most interesting domain (search, recommendations) instead of the seam with the clearest boundary and lowest risk (notifications, reporting). The first cut is a capability exercise, not a prestige exercise.

How To Use It

When proposing any migration from monolith to microservices:

  1. Pick one service with a clean boundary and a low-risk blast radius (often: notifications, search indexing, file storage, reporting).
  2. Define the seam as an explicit interface inside the monolith first.
  3. Put a façade/router in front of every caller.
  4. Build the new service behind the façade.
  5. Shadow, then percentage-route, then cut over, then delete.
  6. Only after it is done, pick the next service. Never extract more than one at a time per team.

This is slow. That is a feature, not a bug.
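The percentage-routing part of step 5 can be sketched as follows. The names RoutingConfig, bucket_for, and choose_path are hypothetical, and the sketch assumes routing is keyed on an order ID. Hashing the key to a stable bucket means a given order always takes the same path at a given percentage, and raising the percentage only adds orders to the new path without reshuffling the ones already there.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RoutingConfig:
    """Hypothetical config, assumed to be changeable without a deploy."""
    new_path_percent: int = 0   # 0..100
    kill_switch: bool = False   # True forces every call onto the old path


def bucket_for(key: str) -> int:
    """Map a key to a stable bucket in 0..99."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_path(order_id: str, config: RoutingConfig) -> str:
    """Return "new" or "old" for this order under the current config."""
    if config.kill_switch:
        return "old"
    return "new" if bucket_for(order_id) < config.new_path_percent else "old"


config = RoutingConfig(new_path_percent=10)
path = choose_path("order-123", config)

# Flipping the kill switch is a config change, not a code change.
config.kill_switch = True
path_after_kill = choose_path("order-123", config)
```

Python's built-in hash() is avoided here because it is randomized per process for strings, which would send the same order down different paths on different hosts.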

Check Yourself

  1. What specifically makes strangler-fig migrations succeed where big-bang rewrites fail?
  2. Why introduce a façade in the monolith before building the new service?
  3. Why does percentage-routing with a kill switch matter more than the new service being correct?

Mini Drill or Application

Pick a monolithic system (real or the e-commerce example). In 15 minutes:

  • Name the first service you would extract and justify it in two sentences (clean seam, low blast radius).
  • Sketch the façade interface with 3-5 method signatures.
  • List the three routing phases (shadow, 10%, 100%) and one rollback condition for each.

How This Sits In The Module

Cluster 2 tells you where the seams are. Cluster 3 tells you how to break the shared database during the migration. Cluster 4 tells you how to survive the network layer once the new service is live.

Read This Only If Stuck

Depth Path

  • Sam Newman, Monolith to Microservices -- a whole book on this, but the pattern page above is enough for 80% of applications. Chapter 3 ("Splitting the monolith") and chapter 4 ("Decomposing the database") are the most-opened sections.
  • Martin Fowler, BranchByAbstraction -- the technique underlying the "introduce a façade" step. Works for intra-monolith refactoring too.

Transfer: Strangler-Fig Outside Microservices

The pattern is not specific to microservices. It applies wherever you have a working system, a better one you want to build, and cannot stop the business. Examples:

  • Migrating from one database engine or layout to another (Shopify's multi-year move to a "podded" architecture is an example of an incremental migration on the data layer).
  • Replacing a legacy UI framework (AngularJS -> Angular, jQuery -> React) by sending page routes through a shim that decides which framework renders each one.
  • Swapping a payments provider: the new provider runs in shadow for an evaluation window, traffic shifts by percentage, and the old provider is retired.

The same invariants hold: always-integrated façade, percentage routing, shadow comparison, kill switch, deletion last. If you master the pattern for one service extraction, you can apply it to databases, UIs, vendors, and language migrations.

Choosing The First Cut

The first extraction is a capability exercise. Score candidate services on these axes:

Axis             | Good first cut                                                  | Bad first cut
Boundary clarity | Obvious seam, few callers                                       | Tangled dependencies across the monolith
Blast radius     | Non-customer-critical (reporting, notifications, search indexing) | User-facing checkout, payment, auth
Data overlap     | Self-contained tables                                           | Tables shared with several modules
Team readiness   | A team has slack to own it                                      | Team already overloaded
Observability    | Easy to trace and compare old vs new                            | Hidden in batch jobs or cron

The worst first cut is usually the most exciting one (recommendations, search, ML scoring) because those have the messiest dependencies on domain data. Pick the boring seam first.
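One way to make the scoring concrete is a small ranking script. Everything numeric here is an assumption for illustration: the 0-to-2 scale, the equal weighting of axes, and the example scores are invented, not measurements of any real system.

```python
from dataclasses import dataclass
from typing import Dict

# The five axes from the table above.
AXES = (
    "boundary_clarity",
    "blast_radius",
    "data_overlap",
    "team_readiness",
    "observability",
)


@dataclass
class Candidate:
    name: str
    scores: Dict[str, int]  # axis -> 0 (bad first cut) .. 2 (good first cut)

    def total(self) -> int:
        # Equal weights are an assumption; a team may weight axes differently.
        return sum(self.scores[axis] for axis in AXES)


# Illustrative scores only.
candidates = [
    Candidate("recommendations", dict(zip(AXES, (0, 2, 0, 1, 0)))),
    Candidate("notifications", dict(zip(AXES, (2, 2, 2, 2, 2)))),
    Candidate("checkout", dict(zip(AXES, (1, 0, 0, 1, 2)))),
]

ranked = sorted(candidates, key=lambda c: c.total(), reverse=True)
for candidate in ranked:
    print(f"{candidate.name}: {candidate.total()}")
```

The number matters less than the conversation it forces: a candidate that scores zero on boundary clarity or data overlap is probably not a first cut, whatever its total.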

Common Anti-Patterns During Migration

Even with strangler-fig correctly set up, these mistakes re-appear:

  • No façade, just call both paths from callers. Callers now contain routing logic; you have 40 routers instead of one, and rollback requires changing every caller.
  • Shadow comparison skipped. You route traffic directly at 10% without verifying new-service results match old. First incident follows.
  • Percentage routing without a kill switch. When the new service misbehaves, the only rollback is a code change, not a config flip.
  • Deleting the old code too early. Before the new path has been authoritative for several weeks across peak traffic, keep the old code alive.
  • Extracting too many services at once. Each migration is a meaningful investment. Do one, stabilize, learn, then do the next.

Migration Observability

Tracing (concept 13) is essential during a strangler-fig migration:

  • The façade must emit a span labelled path=old or path=new so you can compare latency and error rates.
  • Alert on divergence: if the shadow comparison shows > X% mismatch, stop routing.
  • Log correlation IDs in both paths so customer support can trace a single ticket through either system.

Without these, you will migrate, something will go wrong, and no one will know whether the problem is the new code or a pre-existing bug you never noticed in the old code.
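The three requirements above can be sketched with an in-memory recorder. MigrationMetrics and traced_call are hypothetical names, plain log lines stand in for real spans, and the mismatch threshold is left as a parameter because the right value depends on the system. A real setup would emit these through its tracing and metrics stack instead.

```python
import logging
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("migration")


@dataclass
class MigrationMetrics:
    calls: Counter = field(default_factory=Counter)    # per path
    errors: Counter = field(default_factory=Counter)   # per path
    seconds: Counter = field(default_factory=Counter)  # total latency per path
    compared: int = 0
    mismatched: int = 0

    def record_shadow(self, matched: bool) -> None:
        self.compared += 1
        if not matched:
            self.mismatched += 1

    def mismatch_rate(self) -> float:
        return self.mismatched / self.compared if self.compared else 0.0

    def should_stop_routing(self, max_mismatch_rate: float) -> bool:
        """True when shadow divergence exceeds the caller-chosen threshold."""
        return self.mismatch_rate() > max_mismatch_rate


def traced_call(metrics: MigrationMetrics, path: str,
                correlation_id: str, fn: Callable[[], T]) -> T:
    """Run fn, labelling the call with path=old|new and a correlation ID."""
    start = time.perf_counter()
    try:
        result = fn()
    except Exception:
        metrics.errors[path] += 1
        log.exception("path=%s correlation_id=%s failed", path, correlation_id)
        raise
    finally:
        metrics.calls[path] += 1
        metrics.seconds[path] += time.perf_counter() - start
    log.info("path=%s correlation_id=%s ok", path, correlation_id)
    return result


metrics = MigrationMetrics()
traced_call(metrics, "old", "req-1", lambda: "plan")
traced_call(metrics, "new", "req-1", lambda: "plan")
for matched in (True, True, True, False):
    metrics.record_shadow(matched)

# 0.05 is a placeholder threshold for the example, not a recommendation.
halt = metrics.should_stop_routing(max_mismatch_rate=0.05)
```

Because both paths log the same correlation ID, a single request can be followed through either system, and the per-path counters give the latency and error comparison directly.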

How This Sits In The Module

This concept is the only one in cluster 1 that is a migration pattern (versus a style-selection pattern). Cluster 2 tells you where the seam should be; this concept tells you how to execute one seam safely. Clusters 3-5 are about operating the resulting services -- which matter only if the migration has been done well.