Characterization Tests: Pin Behavior Before Change

What This Concept Is

A characterization test (the term comes from Michael Feathers) is a test whose purpose is to capture what the code actually does, not what it should do. You write these tests before refactoring legacy code whose behavior is undocumented.

Flow:

Guess an input the code is likely to handle.
Run it, observe the output.
Write an assertion that locks in that observed output as the expected value.
Repeat for edge inputs and error paths.

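The test suite then becomes the baseline. Any change to behavior that the tests cover will flip at least one test from green to red; behavior you never pinned can still change silently, which is why edge inputs and error paths matter.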

Characterization tests often include current bugs. That is deliberate. If you want the bug fixed, fix it in a separate feature-hat commit after the refactor.

Why It Matters Here

Every refactoring move in this module assumes that the tests stay green after each step. Without tests you cannot check that assumption, so you cannot refactor safely. But most real code you inherit has no tests. Characterization tests bridge the gap.

Fowler is explicit: refactoring requires tests. If you want to refactor, you have to write tests.

Concrete Example

Legacy function:

function quoteFor(customerId, quantity) {
  let base = quantity * 10;                      // flat unit price of 10
  if (customerId.startsWith("VIP")) base *= 0.9; // 10% off for VIP ids
  if (quantity > 100) base -= 5;                 // flat 5 off above 100 units
  return Math.round(base);
}

You do not know if the -5 is right. You do not care yet. Pin current behavior:

test('characterization: ordinary customer, small order', () => {
  expect(quoteFor('C001', 3)).toBe(30);
});
test('characterization: VIP, small order', () => {
  expect(quoteFor('VIP42', 5)).toBe(45);
});
test('characterization: VIP, large order', () => {
  expect(quoteFor('VIP42', 200)).toBe(1795);
});
test('characterization: ordinary, large order', () => {
  expect(quoteFor('C009', 150)).toBe(1495);
});

Now you can Extract Function, rename, or move code. If a step changes any of the pinned results, at least one of the four tests flips red.

Common Confusion / Misconception

"But test 3 encodes a bug!" Yes. Characterization tests freeze the present, not the ideal. When you later fix the bug, you will update that specific test in the feature commit, and the diff will show exactly what behavior changed. That traceability is the point.

Also: these tests are not a long-term test suite. Many will become obsolete once unit tests replace them. They are scaffolding.

How To Use It

Quick rules:

Prefer many small tests over one large one (a failure is easier to localize).
Capture snapshots for complex outputs; keep human-readable values when possible.
If the code talks to a network or clock, introduce a seam (next concept) before writing the test.

Check Yourself

Why is it acceptable for a characterization test to encode a bug?
You run the code and the output is a 4KB JSON blob. What are two options for asserting it?
What is the minimum number of characterization tests before you can start refactoring, and what determines it?

Mini Drill or Application

Find a 30-line function in a codebase you do not own. Write three characterization tests: happy path, boundary case, error case. Run them green. Now Extract Function on the happy-path branch. Tests must still be green. If they go red, revert -- you changed behavior, not structure.

Video and Lecture References

Primary lecture: Michael Feathers -- Working Effectively with Legacy Code (overview talk) (50 min)
Visual supplement: Fowler -- Self-Testing Code (5-min read)

Article References

Fowler: Legacy Seam - why seams enable characterization tests
Emily Bache: Approval Tests - tooling for "lock in the current output" style

External Exercises

Gilded Rose Refactoring Kata - classic characterization-first exercise
Tennis Refactoring Kata - longer version with more branches

Depth Path

Read This Only If Stuck - Fowler chunks 032-036 (Building Tests chapter)
Optional deep dive: Feathers, Working Effectively with Legacy Code, chapters on "characterization" and "test harness"

Source Backbone

Refactoring is the canonical book backbone for this module. Use these sources after attempting the refactor and tests yourself.

Refactoring (Fowler) - primary source for refactoring discipline and named moves.
Clean Code - support for readability and small-function judgment.
Good Code, Bad Code - support for maintainability tradeoffs.
