The Goal: Small, Frequent, Reversible Changes
What This Concept Is
The entire point of CI/CD is not "automation" and not "speed" in the abstract. It is a single stance on how to ship code:
- changes should be small -- one reviewable intent per commit or PR
- they should be frequent -- integrated into the main line multiple times a day
- they should be reversible -- rolling back is a one-command, seconds-to-minutes operation
Every other practice in this module is downstream of that stance. Pipelines, canaries, feature flags, signed artifacts, and DORA metrics all exist to protect small, frequent, reversible changes. The stance is older than "DevOps" -- Kent Beck's Extreme Programming (1999) already insisted on continuous integration and the smallest useful increment; Humble and Farley's Continuous Delivery (2010) generalized it to the deployment pipeline.
Why It Matters Here
Batch size is the variable most teams do not realize they are choosing. A release that bundles 40 commits is not "40x one commit." It is strictly worse:
- failures are ambiguous -- any of the 40 could be the cause, and bisection is expensive
- rollbacks are expensive -- you lose 39 good changes to fix one bad one
- review quality drops -- reviewers skim because the diff is unreadable
- the blast radius correlates with change size, so infrequent releases carry the highest risk exactly when the team is least practiced at deploying
- mean time to detect a regression grows with batch size, because any given bug is diluted among dozens of other changes that shipped behind the same deploy marker
The organizational temptation is to respond to risk by adding approvals, environments, and release windows. All of those grow batch size. They optimize the wrong variable and make things worse. The DORA research (concept 3) has now quantified this across 30,000+ survey respondents: smaller batches correlate with both higher throughput and lower change-fail rate -- the tradeoff people imagine does not exist in the data.
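A back-of-the-envelope model can make the batch-size argument concrete. The sketch below is an illustration, not DORA data: it assumes each commit independently has the same small chance of introducing a regression, and that bisection needs roughly log2(n) build-and-test rounds to isolate one bad commit. The function names and the 2% figure are hypothetical.

```python
def release_failure_probability(commits: int, p_bad: float) -> float:
    """Chance that at least one commit in the release is bad.

    Assumes commits fail independently with the same probability,
    which is a simplification made for illustration only.
    """
    return 1.0 - (1.0 - p_bad) ** commits


def bisect_rounds(commits: int) -> int:
    """Worst-case test rounds to isolate one bad commit by binary search.

    Equivalent to ceil(log2(commits)), computed with integers.
    """
    return (commits - 1).bit_length() if commits > 1 else 0


def good_commits_lost_on_rollback(commits: int, bad: int = 1) -> int:
    """Good changes reverted along with the bad ones in a whole-release rollback."""
    return commits - bad


if __name__ == "__main__":
    for n in (1, 40):
        print(
            n,
            round(release_failure_probability(n, 0.02), 3),
            bisect_rounds(n),
            good_commits_lost_on_rollback(n),
        )
```

Under these assumptions, a 40-commit release with a 2% per-commit regression rate fails a little more than half the time, needs up to 6 bisection rounds to find the culprit, and loses 39 good changes on rollback. A single-commit release fails 2% of the time, needs no bisection, and loses nothing.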
Concrete Example
Two teams shipping the same feature.
Team A (quarterly release): accumulates ~200 commits on a release branch, merges to main once, runs a 3-day regression test, deploys on a Saturday. A bug slips through. Rollback means reverting ~200 commits; the hotfix takes a week because the release branch has already diverged.
Team B (trunk-based, continuous): every merged PR is a candidate release. Code flows to production behind a feature flag. The feature is "launched" by flipping the flag at 1% of users, then 10%, then 100%. A bug appears at 10%. The flag goes off in 30 seconds. No rollback, no rebuild, no war room.
Team B ships the same feature with dramatically less risk -- because the changes are small, frequent, and reversible. The git history also reads cleanly: each commit is one intent, the way Pro Git describes a well-formed commit -- small, reviewable, with a tight message explaining why, not what.
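Team B's 1% / 10% / 100% rollout can be sketched as a percentage flag check. This is a minimal sketch, not any particular feature-flag product's API: `bucket`, `is_enabled`, and the flag name `new-checkout` are hypothetical, and the hashing scheme is one common approach assumed here for illustration.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in 0..9999 for this flag.

    Hashing the flag name together with the user id keeps the same
    user in the same bucket across requests, while different flags
    get different user subsets.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(user_id: str, flag_name: str, rollout_percent: float) -> bool:
    """True if the user falls inside the current rollout percentage."""
    return bucket(user_id, flag_name) < rollout_percent * 100


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (0, 1, 10, 100):
        enabled = sum(is_enabled(u, "new-checkout", percent) for u in users)
        print(f"{percent}% rollout -> {enabled} of {len(users)} users")
```

Because the bucket is stable, every user who saw the feature at 1% still sees it at 10%, and setting the percentage back to 0 turns it off for everyone without a deploy. That config change is the "flag goes off in 30 seconds" step in the example.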
What "Small" Actually Means
Useful rules of thumb for the PR / commit unit:
| Dimension | Healthy | Warning |
|---|---|---|
| Lines changed in a PR | 50-400 | > 1000 |
| Files touched | 1-10 | > 25 |
| Commits per PR | 1-6 (logically separable) | > 20 |
| Age from first commit to merge | < 2 days | > 1 week |
| Reviewers asking for structural changes | rare, early | common, late |
These numbers are not laws; they are signals. A 2000-line PR that is a machine-generated schema regeneration is not risky in the same way as a 2000-line PR that is hand-written logic. The discipline is to notice when your changes are crossing these thresholds and to ask why before accepting the cost.
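The "Warning" column above can be turned into a simple check. The sketch below encodes only the four numeric rows. The class and function names are hypothetical, and a real check would probably need a way to exclude generated files, as the schema-regeneration caveat suggests.

```python
from dataclasses import dataclass


@dataclass
class PullRequestStats:
    lines_changed: int
    files_touched: int
    commits: int
    age_days: float  # first commit to merge


# Upper bounds taken from the "Warning" column of the table above.
WARNING_LIMITS = {
    "lines_changed": 1000,
    "files_touched": 25,
    "commits": 20,
    "age_days": 7,
}


def warning_signals(pr: PullRequestStats) -> list[str]:
    """Return the names of the dimensions that exceed their warning limit."""
    return [
        name
        for name, limit in WARNING_LIMITS.items()
        if getattr(pr, name) > limit
    ]


if __name__ == "__main__":
    small = PullRequestStats(lines_changed=120, files_touched=4, commits=2, age_days=1.0)
    large = PullRequestStats(lines_changed=2000, files_touched=30, commits=3, age_days=10)
    print(warning_signals(small))
    print(warning_signals(large))
```

The function returns signals rather than a pass/fail verdict, in keeping with the point that these thresholds are prompts to ask why, not gates.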
Common Confusion / Misconception
"Small changes means more risk because there are more deploys." No. The probability of a failed deploy scales with change size, not deploy count. More deploys of smaller changes produce fewer incidents and faster recovery. This is one of the main findings in the DORA research (see concept 3) -- the "throughput vs stability" tradeoff is a myth at the team scale.
"Our domain is too regulated for continuous delivery." Regulation requires audit trails, separation of duties, and approvals -- all of which a pipeline enforces better than manual releases. Heavily regulated organizations (banks, healthcare, payments, defense) are among the heaviest users of continuous delivery. Capital One, Monzo, and Siemens Healthineers are all public case studies.
"We do CI already -- we have a pipeline." Having a pipeline is not CI. CI is the practice of integrating small changes into trunk frequently. If your team has 5-day-old feature branches sitting in review, you have a build server, not continuous integration. Martin Fowler is explicit on this definitional point: integrating once a week is not CI, whatever the tooling.
"Reversible means the code can be reverted." That is only part of it. A change is reversible if you can return the user-visible system to its previous behavior in bounded wall-clock time -- by artifact rollback, a feature-flag flip, DB expand/contract discipline (concept 12), or a canary traffic shift. Reversibility is a property of the whole release path, not just the git operation.
How To Use It
Use batch size as the leading indicator of delivery health:
- Measure it. Count lines, files, or commits per merged PR this month.
- Aim small. A healthy target is tens to a few hundred lines per PR, not thousands. Split larger changes into increments behind feature flags if needed.
- Integrate often. If a branch is older than two days without a merge, it is drifting and accruing conflict debt. Pro Git's topic-branch guidance is explicit: topic branches are meant to be short.
- Make reversal cheap. Rollback should not require a human writing SQL. If it does, that is the real bug.
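- Tie every metric you care about (lead time, failure rate) back to batch size before reaching for any other intervention.
The loop is self-reinforcing: small changes are easier to review and cheaper to reverse, which makes frequent integration less risky, which in turn keeps changes small. Once it spins up, teams ship faster and safer.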
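The "Measure it" step can start from plain `git log` output. The sketch below assumes the history was exported with `git log --no-merges --numstat --format=@@%H`. The `@@` marker is an arbitrary choice made here, and per-commit size is only a proxy for per-PR size unless the team squash-merges. The sample text is made up for illustration.

```python
from statistics import median


def lines_changed_per_commit(numstat_output: str) -> dict[str, int]:
    """Sum added + deleted lines per commit from `git log --numstat` text.

    Expects each commit to start with a line of the form "@@<hash>",
    followed by tab-separated "added<TAB>deleted<TAB>path" lines.
    """
    totals: dict[str, int] = {}
    current = None
    for line in numstat_output.splitlines():
        if line.startswith("@@"):
            current = line[2:].strip()
            totals[current] = 0
            continue
        parts = line.split("\t")
        if current is None or len(parts) < 3:
            continue  # blank separator lines or unexpected content
        added, deleted = parts[0], parts[1]
        # Binary files are reported as "-" for both counts; skip them.
        if added.isdigit() and deleted.isdigit():
            totals[current] += int(added) + int(deleted)
    return totals


# Made-up sample in the assumed format.
SAMPLE = "@@abc123\n\n10\t2\tapp.py\n-\t-\tlogo.png\n@@def456\n\n1\t1\tREADME.md\n"

if __name__ == "__main__":
    sizes = lines_changed_per_commit(SAMPLE)
    print(sizes)
    print("median lines changed:", median(sizes.values()))
```

Tracking the median rather than the mean keeps one large generated diff from hiding the typical change size.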
Check Yourself
- Why is a release that bundles 40 commits strictly worse than 40 releases of one commit each?
- Which parts of a team's delivery process tend to grow batch size when the team reacts to risk? Why is that the wrong reflex?
- If rolling back requires a human to run a database script, what does that tell you about the shape of the change?
- Give one example of a change that looks big (many lines) but is safe, and one that looks small (few lines) but is risky. What is the actual risk signal?
Mini Drill or Application
Pick a recent release from a project you know (yours or an open-source one). Answer:
- How many commits were in that release?
- If a bug were discovered 10 minutes after deploy, what would the rollback actually require?
- Could the feature have shipped behind a flag at 0% and been turned on in a separate change?
- What is the single structural change that would have made that release half the size?
- Where on the "healthy vs warning" table above does your last merged PR sit?
Write 5-8 sentences. The output is the observation, not the fix.
Read This Only If Stuck
- Pro Git: Committing your changes -- atomic units
- Pro Git: Topic branches -- the short-lived discipline
- Pro Git: Distributed Git workflows -- integration tempo
- Git from the bottom up: A commit by any other name -- commits as the unit of change
See also (external)
- DORA -- capabilities that drive delivery performance -- empirical backing for the small-batch claim
- DORA -- working in small batches -- the specific capability
- Trunk-Based Development -- introduction -- the branching model that operationalizes this stance
- Martin Fowler: ContinuousDelivery -- canonical short definition
- Martin Fowler: ContinuousIntegration -- foundational long-form on the integration discipline