Regression-First Fixing: Test That Fails, Then Code That Passes

What This Concept Is

Regression-first fixing is a simple rule for any confirmed defect:

Reproduce the bug with a new automated test.
Confirm the test fails for the same reason as the bug report.
Write the smallest change that makes the test pass.
Keep the test in the suite forever as a regression guard.

It is a discipline, not a style. It applies to unit, integration, and E2E-level bugs equally. The key is that the test is written first, it fails first, and the failure mode matches the bug report.

The closest relative is what Michael Feathers calls a characterization test in legacy-code work: write a test that captures current behavior, then change the code. For a defect, the logic inverts -- the test captures the correct behavior the system should have had -- but the discipline is the same: the test comes before the change. Dan North's BDD framing adds a second layer: name the test as a sentence about behavior ("POST /tasks accepts unicode titles"), not about internals, so the test reads as a specification to the next maintainer.

Why It Matters Here (In the Capstone)

Most capstone defect regressions happen for one of three reasons:

the fix was done without a test, so it regressed later when unrelated code changed;
a test was added after the fix and could no longer confirm the failure was real;
the "bug" was fixed by changing the code in a way that does not match the actual root cause.

Regression-first fixing removes all three failure modes. The test that fails first proves the bug is real. The test that passes second proves the fix addresses the bug. The test that stays in the suite proves the fix holds.

It also couples bug triage (Concept 10) to actual code change: a triage entry without a regression test is incomplete, and a regression test without a triage entry is suspicious.
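One way to keep that coupling honest is a small cross-check between the triage log and the test suite. The sketch below assumes each regression test mentions its triage id in a docstring; the id format is modeled on the example id used in this concept, and the function name and data shapes are hypothetical.

```python
import re

# Assumed id format, modeled on BUG-2026-04-04-01; adjust to your triage log.
TRIAGE_ID = re.compile(r"BUG-\d{4}-\d{2}-\d{2}-\d{2}")


def cross_check(triage_ids, test_docstrings):
    """Compare triage entries with the ids mentioned by regression tests.

    triage_ids: iterable of ids for defects that have a committed fix.
    test_docstrings: dict mapping test name -> docstring text (may be empty).
    Returns (entries_without_test, tests_without_entry), both sorted.
    """
    known = set(triage_ids)
    covered = set()
    tests_without_entry = []
    for name, doc in test_docstrings.items():
        ids = set(TRIAGE_ID.findall(doc or ""))
        covered |= ids
        if not ids & known:
            # The test names no known triage entry: worth a second look.
            tests_without_entry.append(name)
    entries_without_test = sorted(known - covered)
    return entries_without_test, sorted(tests_without_entry)
```

An entry in the first list is an incomplete triage record; a test in the second list is the suspicious case described above.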

Concrete Example(s) -- from a real capstone

From BUG-2026-04-04-01 in Concept 10: POST /tasks rejects valid unicode titles.

Step 1: Write the failing test first:

def test_post_tasks_accepts_unicode_title(client):
    # The emoji is the character the old title pattern rejects.
    response = client.post("/tasks", json={"title": "ship rocket 🚀"})
    assert response.status_code == 201
    assert response.json()["title"] == "ship rocket 🚀"

Run the test. Confirm it fails, and confirm the failure matches the bug report (422 Unprocessable Entity because the regex rejected the emoji).
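To check that the failure really comes from the title pattern and not from something else, it can help to exercise the pattern directly. The sketch below uses only the standard `re` module as a stand-in for the model's validation; `old_title_is_valid` is a hypothetical helper, not part of the capstone code.

```python
import re

# The pattern from the pre-fix Field definition.
OLD_TITLE_PATTERN = re.compile(r"^[\w\s\-]+$")


def old_title_is_valid(title: str) -> bool:
    """Stand-in for the pre-fix validation: True if the old pattern accepts the title."""
    return OLD_TITLE_PATTERN.match(title) is not None


# For str patterns in Python 3, \w is Unicode-aware, so non-ASCII letters pass.
ascii_ok = old_title_is_valid("ship rocket")
letters_ok = old_title_is_valid("日本語")
# An emoji is a symbol, not a word character, so the old pattern rejects it.
emoji_ok = old_title_is_valid("ship rocket 🚀")
```

This also narrows the bug report: under these assumptions the old rule rejects symbols such as emoji, not unicode text in general, which is worth recording in the triage entry.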

Step 2: Make the smallest change that makes the test pass. Replace:

title: str = Field(regex=r"^[\w\s\-]+$")

with length limits and no character whitelist:

title: str = Field(min_length=1, max_length=200)

Step 3: Run the full test suite. Confirm the new test passes and no other test regressed.

Step 4: Commit the fix with the test in the same commit. Reference the triage id in the commit message. The test stays in the suite forever.

Later, when someone refactors input validation, the test is still there. If the restrictive regex ever silently comes back, the test fails and the bug is caught before it ships again.

Common Confusion / Misconceptions

The first misconception is that regression-first is "just TDD for bugs." It is narrower than TDD. TDD drives design from tests. Regression-first only asks that you capture the defect as a test before you fix it. It does not require you to write a test before every new line of code.

The second is "I will add the test after I fix it, it is faster." Faster in the minute, slower in the month. Without the failing-first step you cannot be sure the test asserts the real failure. A test written after the fix may pass even before the fix, which makes it useless as a regression guard.
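The difference between a guard and a test that merely passes can be made concrete. The sketch below models the pre-fix and post-fix rules as two hypothetical stand-in functions and checks each candidate test against both; all names here are illustrative.

```python
import re

OLD_PATTERN = re.compile(r"^[\w\s\-]+$")


def validate_title_before_fix(title: str) -> bool:
    """Stand-in for the pre-fix rule: word characters, whitespace, hyphens only."""
    return OLD_PATTERN.match(title) is not None


def validate_title_after_fix(title: str) -> bool:
    """Stand-in for the post-fix rule: length limits only."""
    return 1 <= len(title) <= 200


def weak_test(validate) -> bool:
    # Written after the fix and never run against pre-fix code: non-ASCII
    # letters were never rejected, so this input does not exercise the bug.
    return validate("日本語")


def regression_test(validate) -> bool:
    # Uses an emoji, which the old rule rejects.
    return validate("ship rocket 🚀")


def fails_first_then_passes(test) -> bool:
    """A useful guard fails on pre-fix code and passes on post-fix code."""
    return (not test(validate_title_before_fix)) and test(validate_title_after_fix)
```

Here `fails_first_then_passes(regression_test)` holds and `fails_first_then_passes(weak_test)` does not: the weak test is green on both versions, so it would stay green if the bug returned.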

The third is writing the test too wide. If the bug is a regex issue, write the test that specifically asserts unicode input works, not a giant end-to-end flow. Narrow tests are fast and their failure is clear.

The fourth is separating the test commit and the fix commit in history. A test-only commit is red by design, so it fails CI and hands git bisect a commit that fails for a reason unrelated to whatever regression is being hunted. Keep them together.

How To Use It (In Your Capstone)

For every confirmed defect:

Read the triage entry (Concept 10).
Pick the cheapest test level that reproduces the bug. Prefer unit, then integration, then E2E.
Write the test, run it, and confirm it fails with the expected failure mode.
Fix the code. Confirm the test passes. Confirm the whole suite is green.
Commit the test and the fix together with a link to the triage entry.
Name the test in behavior-sentence form (Dan North BDD style) so the regression is readable as a specification.
Close the triage entry only when the test is green and the fix has been deployed to staging.

If you cannot reproduce the bug with a test, either the bug is not real or your test environment is missing something. Finding that out is itself a valuable result.

Choosing the Right Test Level

Not every bug belongs at the unit level. A quick decision guide:

Bug shape	Preferred test level
Input validation, pure logic error	Unit
SQL or ORM bug, wrong migration	Integration (real DB)
Adapter for an external API returns wrong shape	Integration against a fake
Authorization or tenant-isolation error	Integration at request level
UI sequence wrong (button state, focus, etc.)	E2E
"Works locally, fails in production"	Integration or E2E in staging-equivalent env
Concurrency / race	Unit with seam or integration with controlled scheduler

The rule: reproduce at the cheapest level that reliably captures the bug.

Anti-Patterns to Recognize

Fix-first-then-test. Without a failing-first step, the test might always have passed.
Test that passes before the fix. Rewrite it until it fails on pre-fix code.
Fat regression test. The test covers five behaviors; when it fails later, no one knows which part regressed.
Shared-scope test that stays broken. The regression test relies on setup that later tests break; make it independently runnable.
Commit separation. Test in one commit, fix in the next.
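The fat-test anti-pattern is easiest to recognize next to its narrow replacement. The sketch below uses a hypothetical `validate_title` stand-in for the post-fix rule; the test names and inputs are illustrative.

```python
def validate_title(title: str) -> bool:
    """Hypothetical stand-in for the post-fix rule: 1 to 200 characters."""
    return 1 <= len(title) <= 200


# Fat: one test, several behaviors. A failure only says "title validation broke".
def test_title_validation():
    assert validate_title("ship rocket 🚀")
    assert validate_title("x" * 200)
    assert not validate_title("")
    assert not validate_title("x" * 201)


# Narrow: one behavior per test, named as a sentence, so the failing
# test name already says which behavior regressed.
def test_title_accepts_emoji():
    assert validate_title("ship rocket 🚀")


def test_title_rejects_empty_string():
    assert not validate_title("")


def test_title_rejects_more_than_200_characters():
    assert not validate_title("x" * 201)
```

Only `test_title_accepts_emoji` is the regression guard for the unicode bug; the other narrow tests belong to their own behaviors and can fail independently of it.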

The Virtuous Loop

Triage entry -> failing test -> fix -> test passes -> commit with triage link -> record in the ledger or close the triage entry.

Each arrow is a checkpoint. The discipline is not about each step individually but about never skipping one, especially when a deadline looms. Under pressure, regression-first is still the cheaper move: rushing produces half-fixed bugs that reappear a week later and cost more the second time.

Check Yourself

What must be true about the new test before you write the fix?
Why is a test added after the fix a poor regression guard?
Why pick the cheapest test level that reproduces the bug?
Why keep the test and fix in the same commit rather than sequential ones?
How does Feathers's characterization test differ from a regression test in purpose, even if the mechanics look similar?

Mini Drill or Application (Capstone-scoped)

Pick one defect from your triage log. In one session, write the failing test, confirm it fails with the expected mode, write the fix, confirm the suite passes, and commit them together.
Link the commit to the triage entry and name the test in behavior-sentence form.
Do this for at least three defects this week and paste the commit URLs into your capstone journal.
For one previously-fixed bug that had no regression test, go back and add a characterization test after the fact; note in the ledger that the retrofit happened.
Run git bisect across a week of commits on a made-up regression to confirm your test commits are individually green.
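For the bisect drill, `git bisect run` accepts any command and reads its exit code: 0 marks the commit good, 125 asks git to skip it, and any other code from 1 to 127 marks it bad. The sketch below assumes pytest is the test runner and uses a hypothetical test path; it maps pytest's exit codes onto that convention so that commits where the test does not exist yet are skipped rather than marked bad.

```python
import subprocess
import sys

# Hypothetical location of the regression test; adjust to your repository.
TEST_ID = "tests/test_tasks.py::test_post_tasks_accepts_unicode_title"

GOOD, BAD, SKIP = 0, 1, 125


def bisect_code(pytest_returncode: int) -> int:
    """Translate a pytest exit code into a git bisect run exit code."""
    if pytest_returncode == 0:
        return GOOD  # the test ran and passed
    if pytest_returncode == 1:
        return BAD   # the test ran and failed
    # Usage errors, "no tests collected", and similar codes usually mean the
    # test is missing at this commit or could not run, so do not judge it.
    return SKIP


def run_regression_test(test_id: str = TEST_ID) -> int:
    """Run one test with pytest in a subprocess and return the bisect code."""
    result = subprocess.run([sys.executable, "-m", "pytest", "-q", test_id])
    return bisect_code(result.returncode)
```

Saved under a name such as `bisect_check.py` (hypothetical) with a final line `sys.exit(run_regression_test())`, the script could be driven by `git bisect run python bisect_check.py` after marking one good and one bad commit.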

Source Backbone

Capstone implementation applies earlier code-quality, testing, and refactoring material. These books are the source backbone for that practice.

Software Engineering at Google - testing, review, and engineering-process backbone.
Refactoring - safe change and behavior-preserving improvement.
Good Code, Bad Code - maintainability and code-quality judgment.
Clean Code - readability and function-level craft support.
