Database Migrations Alongside Code Releases
What This Concept Is
A migration is a versioned, idempotent change to a database schema or data. Common tools: Flyway, Liquibase, Alembic, Rails / Django / Prisma migrations.
The hard part is not writing migrations -- it is releasing them alongside code that is rolling out progressively, where both old and new application instances read and write the same database at the same time.
The canonical pattern for this is expand/contract (also called "parallel change" or "multi-phase migration"):
- Expand -- add the new schema shape additively. Old code still works.
- Migrate -- backfill data into the new shape. Dual-write if needed.
- Switch -- flip the application to read/write the new shape (behind a flag or in a new release).
- Contract -- remove the old shape once nothing reads it.
Each step is a separately shippable release. Never combine steps into one deploy. The pattern is Martin Fowler's Parallel Change generalized to databases, and its specific DB treatment is his Evolutionary Database Design article -- still the canonical reference after twenty years.
Why It Matters Here
Schema changes are the single most common cause of botched production deploys because:
- they are irreversible by naive rollback (rolling back the code leaves the schema mutated)
- they interact badly with canary/rolling/blue-green (mixed code versions hit the same DB)
- they have no flag system as powerful as code flags -- you cannot easily hide a DROPPED column
- "build-once, promote everywhere" (concept 4) holds for the artifact but not for the database, which is shared state across environments
- locking behavior on large tables can halt writes for minutes on a naive
ALTER
If you ship a code change and a schema change in one deploy, a rollback of code means the rolled-back code hits the new schema, often crashing. Expand/contract avoids this by keeping every intermediate state backward-compatible.
Concrete Example: Renaming a Column
Goal: rename users.email_addr to users.email without downtime.
Release 1 -- Expand.
-- migration: add new column
ALTER TABLE users ADD COLUMN email TEXT;
-- backfill existing rows
UPDATE users SET email = email_addr WHERE email IS NULL;
App code for Release 1:
def save_user(user):
user.email_addr = user.email_input # old column still the source of truth
user.email = user.email_input # dual-write
db.save(user)
def read_user(id):
return db.users.get(id).email_addr # still read from old column
Deploy Release 1. All old and new instances work. Users see no change.
Release 2 -- Switch reads.
def read_user(id):
u = db.users.get(id)
return u.email or u.email_addr # prefer new, fall back to old
Deploy Release 2. Verify. Metrics confirm email is populated for every read.
Release 3 -- Stop writing old.
def save_user(user):
user.email = user.email_input # new column only
db.save(user)
Deploy Release 3. Wait a safe window (hours, days, depending on data retention).
Release 4 -- Contract.
ALTER TABLE users DROP COLUMN email_addr;
Deploy Release 4. Rename email_addr is now complete.
No deploy along this path required downtime. Each intermediate state was reversible by deploying the previous release. Each migration ships as its own commit (small, atomic, reviewed -- the same unit-of-change discipline Pro Git teaches for code) so git bisect and rollback both still work.
Big Tables and Online Migrations
ALTER TABLE users ADD COLUMN email TEXT may seem cheap, but the cost depends on the engine and the table:
- Postgres ≥ 11: adding a nullable column with no default is metadata-only (fast).
- MySQL InnoDB with many older engines: copies the whole table, taking the table offline.
- Adding a NOT NULL column with a default on 100M+ rows: multi-hour lock in naive SQL.
Production-scale migrations use online schema change tools:
- gh-ost (MySQL) -- GitHub's tool. Copies rows to a shadow table in chunks, applies binlog changes, atomic rename at the end.
- pt-online-schema-change (Percona, MySQL) -- older but widely used; triggers-based.
- pg_repack / pgroll (Postgres) -- similar idea for Postgres; pgroll specifically orchestrates expand/contract steps declaratively.
These tools let you do hour-long migrations without holding a write lock. The cost is complexity: every schema change must be audited for online-migration compatibility before you ship it.
Common Confusion / Misconception
"We'll just take a 10-minute maintenance window." Sometimes fine for small systems. Not fine at scale, not fine for SaaS with global users, and it normalizes outages-by-design. Expand/contract is the grown-up version.
"Run the migration as part of the app startup." Tempting and dangerous. Rolling deploys start multiple app instances simultaneously; multiple instances will race to run the migration. Even with a lock, migration blocks startup, which blocks rollout, which blocks rollback. Keep migrations as a separate pipeline step.
"Migrations should be part of the same PR as the code." The code and migration can live in the same PR for review, but they must deploy as separate steps with the code backward-compatible to the pre-migration state. Review convention: "this code works whether or not the migration has run yet."
"We can just revert the migration if it fails." Most destructive migrations (DROP, NOT NULL additions on large tables, type changes) are not reversible in practice. Treat irreversible migrations as needing dry runs on a staging clone, not rollback plans.
"Foreign keys can wait." They can, but know that a long migration job that takes an hour on staging may take a day on production. Big-table operations often need online migration tools.
"Backfills in a single SQL statement are fine." A single-statement UPDATE users SET ... locks the table. Always batch backfills (WHERE id BETWEEN x AND x+10000), sleep between batches, and monitor replica lag if you have replicas.
How To Use It
For every schema change, answer before you write it:
- Is this change backward-compatible with the currently-running code? If not, split into expand/contract steps.
- Is this change forward-compatible with the next release's code? If not, you are coupling code and schema; split again.
- Is this change safe to run online at your table sizes? If not, use an online-schema tool or partition the change.
- Is the migration reversible? If not, is it pre-validated on a staging clone with production-like data?
- Who runs it, and how -- an orchestrator step, a job, a manual
psql? Whichever, it should be the same tool every time, not ad-hoc. - How is the migration versioned and tracked? (Every modern tool uses a
schema_migrationstable or equivalent. Do not roll your own.)
Check Yourself
- Why can't you DROP a column in the same deploy as the code change that stops using it?
- Name the four steps of expand/contract and what makes each reversible.
- What is the risk of running migrations from inside application startup?
- Give one migration you would not attempt with expand/contract, and why.
- When does
ALTER TABLE … ADD COLUMNneed an online-schema-change tool instead of a plain SQL migration?
Mini Drill or Application
Take a real schema change someone shipped recently. Rewrite it as expand/contract steps. For each step, write:
- the SQL
- the code-side change needed
- what production failure mode is prevented by doing it separately
- the rollback path at that step
This is the exercise that teaches the pattern. Do it for at least three different change types: add-column, rename, and type-change.
Read This Only If Stuck
- Pro Git: Viewing staged / unstaged changes and committing -- each migration is its own atomic commit
- Pro Git: Topic branches -- short-lived discipline -- migrations belong on short-lived branches, same as code
See also (external)
- Martin Fowler -- Evolutionary Database Design -- foundational article on schema migration discipline
- Martin Fowler -- Parallel Change -- the pattern that expand/contract is a form of
- Flyway documentation -- canonical migration tool for JVM / SQL
- Liquibase -- concepts -- database change management
- gh-ost -- online schema migration for MySQL -- GitHub's tool for big-table migrations without locking
- pgroll -- zero-downtime Postgres migrations -- Postgres equivalent with declarative expand/contract
- Postgres wiki -- Lock management for schema changes -- diagnosing lock conflicts during migrations