Coverage as a Floor, Not a Target
What This Concept Is
Code coverage is the percentage of lines or branches in your code that are executed by your test suite. It is a useful signal with a narrow purpose:
- as a floor, it is a regression catcher: "do not let coverage of this file drop below X";
- as a target, it is a vanity metric that encourages bad tests written for the number.
The distinction is the entire lesson, and it is a textbook example of Goodhart's law: "when a measure becomes a target, it ceases to be a good measure." Coverage measures what was touched by tests, not whether the tests asserted anything useful. A test that runs a function and never asserts its output still raises coverage. A test that asserts nothing important still counts as covering a branch.
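A minimal sketch of that gap, using a hypothetical `apply_discount` function with a deliberate bug. Both checks execute the same single line, so a line-coverage tool would score them identically; only the second one notices the bug. The `passes` helper is an illustrative stand-in for a test runner, not part of any framework.

```python
def apply_discount(price, percent):
    # Deliberate bug: the discount is added instead of subtracted.
    return price + price * percent / 100


def check_without_assertion():
    # Executes every line of apply_discount, so it is fully "covered".
    apply_discount(100, 10)


def check_with_assertion():
    # Executes exactly the same line, but also states the expected result.
    assert apply_discount(100, 10) == 90


def passes(check):
    """Stand-in for a test runner: True if the check raises no AssertionError."""
    try:
        check()
    except AssertionError:
        return False
    return True


print(passes(check_without_assertion))  # the bug goes unnoticed
print(passes(check_with_assertion))     # the bug is caught
```

The coverage report for these two checks would be the same; the difference only shows up in whether the suite fails when the code is wrong.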
The correct operational use: pick a reasonable per-project floor (for example 75-80% line coverage, lower on UI or orchestration code), enforce it in CI, and never treat it as the goal of testing. Coverage is a negative indicator -- low coverage tells you something is wrong, but high coverage tells you nothing definitive. Mutation testing (discussed below) is the complementary positive indicator.
Why It Matters Here (In the Capstone)
Capstone projects are especially vulnerable to coverage theater: you write lots of trivial tests to push coverage to 95%, feel productive, and then discover that real behavior is untested because the tests assert nothing. The time cost is large and the quality benefit is near zero.
This module prescribes coverage as a floor for two reasons:
- it protects you from accidental deletion of test coverage during refactors;
- it gives CI a concrete "do not regress" gate that is objective and cheap.
As a guide-level default: start the capstone with an 80% line-coverage floor on core code and a lower floor (or an explicit exclusion) for trivial glue code, infrastructure code, and generated code. Raise the floor only when you have a specific reason.
Concrete Example(s) -- from a real capstone
A capstone coverage config in CI:
- task-manager/core requires >=80% line coverage;
- task-manager/adapters requires >=70% (integration-tested at a higher level);
- task-manager/migrations is excluded (covered via integration tests);
- task-manager/cli requires >=60% (small entry point, most behavior lives elsewhere).
A pull request that drops coverage below any of these floors fails CI. The message is not "write more tests to hit 80%"; it is "you deleted coverage somewhere -- explain why."
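One way to enforce per-directory floors is a small script that runs after the test job. The sketch below assumes the report layout written by coverage.py's `coverage json` command (a top-level "files" mapping whose entries carry a "summary" with "covered_lines" and "num_statements"). The names `FLOORS`, `SAMPLE_REPORT`, `directory_coverage`, and `floor_violations` are hypothetical, and the sample numbers are invented for illustration.

```python
# Floors from the example above; migrations has no entry, so it is never checked.
FLOORS = {
    "task-manager/core": 80.0,
    "task-manager/adapters": 70.0,
    "task-manager/cli": 60.0,
}

# Invented sample data in the assumed shape of a coverage.py JSON report.
SAMPLE_REPORT = {
    "files": {
        "task-manager/core/tasks.py": {"summary": {"covered_lines": 40, "num_statements": 50}},
        "task-manager/core/scheduler.py": {"summary": {"covered_lines": 45, "num_statements": 50}},
        "task-manager/adapters/db.py": {"summary": {"covered_lines": 13, "num_statements": 20}},
        "task-manager/cli/main.py": {"summary": {"covered_lines": 6, "num_statements": 10}},
        "task-manager/migrations/0001_init.py": {"summary": {"covered_lines": 0, "num_statements": 30}},
    }
}


def directory_coverage(report, directory):
    """Line coverage (percent) over all files under directory, or None if none were measured."""
    prefix = directory.rstrip("/") + "/"
    covered = total = 0
    for path, data in report["files"].items():
        if path.replace("\\", "/").startswith(prefix):
            covered += data["summary"]["covered_lines"]
            total += data["summary"]["num_statements"]
    return None if total == 0 else 100.0 * covered / total


def floor_violations(report, floors):
    """Return one message per directory whose coverage is below its floor."""
    violations = []
    for directory, floor in floors.items():
        percent = directory_coverage(report, directory)
        # Directories with no measured files are skipped here; a stricter gate could flag them.
        if percent is not None and percent < floor:
            violations.append(f"{directory}: {percent:.1f}% is below the {floor:.0f}% floor")
    return violations


for message in floor_violations(SAMPLE_REPORT, FLOORS):
    print(message)
```

In CI, the report would be loaded with `json.load` from the file that `coverage json` writes, and the job would exit non-zero when the returned list is not empty. A floor is met at exactly its value, which is why the 60% CLI directory in the sample does not fail.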
Contrast with a bad coverage target:
"We require 95% coverage across the repo."
After a few weeks, tests look like:
def test_coverage():
    assert get_task(1) or True
Coverage is 95%, but quality is lower than it was before. The Codecov blog on mutation testing gives a longer version of the same argument: once a measure becomes a target, behavior optimises the measure.
Common Confusion / Misconceptions
The biggest confusion is equating coverage with correctness. A function with 100% coverage and no meaningful assertions is barely safer than one with 0% coverage: the most its tests can catch is an unhandled exception. Coverage only proves the code ran.
A related confusion is branch coverage versus line coverage. Branch coverage is more useful (did each branch of each conditional execute?) but harder to reach, so most projects settle for line coverage at a reasonable floor.
A third is running coverage locally as a motivational dashboard. It is a CI gate, not a scoreboard. Checking it every five minutes is noise.
A fourth is conflating "the floor" with "the goal." The floor exists to block regressions during refactors; it is not a KPI to optimise. Teams that post coverage numbers in chat channels almost always drift toward writing tests for the number.
How To Use It (In Your Capstone)
Operational rules:
- Pick a floor appropriate to each part of the code, not a single global number.
- Configure CI to fail the build if coverage drops below any floor.
- Exclude obviously low-value code (generated code, schema files, templates).
- Never raise the floor in response to a single commit; raise it deliberately and review consequences.
- Periodically audit "what is the weakest-tested critical file" and fix that, not "what is the least-covered directory."
- Pair coverage with mutation testing weekly (see "Beyond Coverage").
- Review tests during self-review (Concept 14) for assertion density: a test body with no assert is a red flag regardless of coverage.
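The periodic audit of the weakest-tested critical file can be partly automated. The sketch below again assumes coverage.py's JSON report layout, where each file entry has a "summary" with "percent_covered". The function name, the list of critical paths, and the sample numbers are hypothetical; deciding which files count as critical remains a human judgment.

```python
# Invented sample data in the assumed shape of a coverage.py JSON report.
SAMPLE_REPORT = {
    "files": {
        "task-manager/core/tasks.py": {"summary": {"percent_covered": 91.0}},
        "task-manager/core/scheduler.py": {"summary": {"percent_covered": 62.5}},
        "task-manager/adapters/db.py": {"summary": {"percent_covered": 70.0}},
    }
}

# Hand-maintained list of files where a bug would hurt most (hypothetical).
CRITICAL_PATHS = [
    "task-manager/core/tasks.py",
    "task-manager/core/scheduler.py",
    "task-manager/core/recurrence.py",
]


def weakest_critical_files(report, critical_paths, limit=3):
    """Return (path, percent) pairs for critical files, least covered first."""
    rows = []
    for path in critical_paths:
        entry = report["files"].get(path)
        # A critical file absent from the report is treated as untested (an assumption).
        percent = entry["summary"]["percent_covered"] if entry else 0.0
        rows.append((path, percent))
    rows.sort(key=lambda row: row[1])
    return rows[:limit]


for path, percent in weakest_critical_files(SAMPLE_REPORT, CRITICAL_PATHS, limit=2):
    print(f"{percent:5.1f}%  {path}")
```

The ranking is over named files rather than directories, which matches the rule above: a directory average can hide one critical file with almost no tests.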
What coverage does not replace:
- mutation testing (does your test suite detect injected bugs?);
- review of test quality (do the assertions prove behavior?);
- integration and E2E coverage for real-world scenarios.
What Good Coverage Use Looks Like
A healthy capstone coverage setup has these properties:
- a per-directory (or per-module) floor, not a single global number;
- explicit exclusions for generated code, templates, migrations, and __main__ blocks;
- a CI check that blocks merges if any floor is violated;
- a periodic review of which critical files have the weakest coverage -- and fixing those;
- no local coverage "dashboard" open in the dev loop.
Unhealthy signals: coverage percentage posted daily in chat, rapidly rising coverage alongside tests that contain no assert, coverage dropping on every PR with a floor that keeps getting lowered, 100% coverage in a directory that has never had a real bug filed.
Three Bad Tests That Raise Coverage
def test_parses():
    parse_input("x=1")  # no assertion

def test_handles():
    result = service.do_it(payload)
    assert result or True  # always True

def test_internal_state():
    service.do_it(payload)
    assert service._cache_size == 1  # asserts implementation
A reviewer who sees any of these during self-review (Concept 14) should delete or rewrite them. The coverage number is not worth the lie.
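These three patterns are regular enough that a rough check can be scripted with Python's standard `ast` module. The sketch below is a heuristic, not a linter: the function name `audit_tests` and the labels it emits are hypothetical, it only looks at plain `assert` statements, and it would wrongly flag a test that verifies behavior another way (for example with `pytest.raises`).

```python
import ast

SAMPLE = '''
def test_parses():
    parse_input("x=1")

def test_handles():
    result = service.do_it(payload)
    assert result or True

def test_internal_state():
    service.do_it(payload)
    assert service._cache_size == 1

def test_parses_value():
    assert parse_input("x=1") == {"x": 1}
'''


def audit_tests(source):
    """Map each suspicious test function name to the bad patterns found in it."""
    findings = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef) or not node.name.startswith("test"):
            continue
        problems = []
        asserts = [n for n in ast.walk(node) if isinstance(n, ast.Assert)]
        if not asserts:
            problems.append("no assertion")
        for statement in asserts:
            condition = statement.test
            # Pattern: `assert <anything> or True`, which can never fail.
            if (isinstance(condition, ast.BoolOp) and isinstance(condition.op, ast.Or)
                    and any(isinstance(v, ast.Constant) and v.value is True
                            for v in condition.values)):
                problems.append("always-true assertion")
            # Pattern: the assertion reads a single-underscore (private) attribute.
            if any(isinstance(n, ast.Attribute) and n.attr.startswith("_")
                   and not n.attr.startswith("__") for n in ast.walk(condition)):
                problems.append("asserts private state")
        if problems:
            findings[node.name] = problems
    return findings


for name, problems in audit_tests(SAMPLE).items():
    print(name, problems)
```

The source is parsed, never executed, so the undefined names in `SAMPLE` do not matter. A script like this can shortlist candidates for the audit, but whether an assertion proves behavior is still a review question.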
Beyond Coverage
For capstone purposes, coverage is enough. When a codebase outgrows coverage as a signal, the next tools are:
- Mutation testing (mutmut, stryker, pitest) -- injects bugs and checks that your tests catch them.
- Assertion density -- number of assert calls per test file. A quick sanity check that tests do something.
- Review-time spot checks -- during self-review, pick one test and ask "what bug would this catch?"
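The idea behind mutation testing can be shown by hand. The sketch below does not use mutmut, stryker, or pitest, which generate mutants automatically from the source; the function `is_overdue`, the three hand-written mutants, and the three suites are all hypothetical. Each suite executes every line of `is_overdue`, so coverage cannot tell them apart, while the list of surviving mutants can.

```python
def is_overdue(days_late):
    return days_late > 0


# Hand-written mutants: each is the original with one small bug injected.
MUTANTS = {
    "> replaced by >=": lambda days_late: days_late >= 0,
    "> replaced by <": lambda days_late: days_late < 0,
    "always True": lambda days_late: True,
}


def suite_no_assertions(fn):
    # Calls the code but checks nothing, so it can never fail.
    fn(5)
    return True


def suite_happy_path(fn):
    # Checks two obvious cases but not the boundary at zero.
    return fn(5) is True and fn(-1) is False


def suite_with_boundary(fn):
    # Adds the boundary case that separates > from >=.
    return fn(5) is True and fn(-1) is False and fn(0) is False


def surviving_mutants(suite):
    """Names of the mutants the suite fails to detect (lower is better)."""
    # A suite must pass on the unmodified code before mutants are meaningful.
    assert suite(is_overdue)
    return [name for name, mutant in MUTANTS.items() if suite(mutant)]


for suite in (suite_no_assertions, suite_happy_path, suite_with_boundary):
    print(suite.__name__, surviving_mutants(suite))
```

A surviving mutant points at a specific missing assertion, here the boundary at zero, which is more actionable than a percentage.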
See also (integrative)
- S3 M02 Refactoring Techniques -- coverage as the safety net during structural change
- S3 M05 Applied Design & Code Review -- why reviewers look at assertion density, not just the coverage badge
- S7 M05 ADRs & Reviews -- fitness functions as the architectural analogue of coverage floors
- S8 M05 Technical Leadership & Strategy -- why measures become targets when leaders post them publicly
- S10 M01 Domain Analysis & Architecture Design -- deciding which directories are "core" and deserve a higher floor
External references:
- Martin Fowler: The Practical Test Pyramid -- why coverage is a weak proxy for quality
- Codecov blog: Mutation Testing -- Ensuring Code Coverage Isn't a Vanity Metric -- Goodhart's law applied to coverage
- Rogelio Consejo: Stop Chasing Badges -- How 100% Test Coverage Can Ruin Your Code -- the negative-indicator framing
- Peter Rhys Thomas: Charles Goodhart, Code Coverage and Unintended Consequences -- the law behind the lesson
- Kent Beck: Test Desiderata -- writable/readable/specific as the positive properties coverage never measures
Check Yourself
- Why is 95% coverage dangerous if treated as a target?
- What does coverage not tell you about your test suite?
- When should you raise a coverage floor, and when should you leave it alone?
- Why is mutation testing a more informative signal than coverage?
- How does Goodhart's law apply specifically to a capstone team that posts coverage in chat?
Mini Drill or Application (Capstone-scoped)
- Configure coverage measurement on CI with a per-directory floor appropriate to each part of your codebase.
- Exclude at least one directory with written justification (migrations, generated code, __main__).
- Intentionally delete one small test locally, confirm CI blocks the merge, then restore the test and write one paragraph in your journal on how floors protect you from accidental regression.
- Run mutmut (or your language's mutation tester) on the core module. Write up the first three mutants your tests missed and decide which to fix.
- Audit your existing tests for the three bad patterns in this concept (no assertion, or True, internal-state assertion). Delete or rewrite the ones you find.
Source Backbone
Capstone implementation applies earlier code-quality, testing, and refactoring material. These books are the source backbone for that practice.
- Software Engineering at Google - testing, review, and engineering-process backbone.
- Refactoring - safe change and behavior-preserving improvement.
- Good Code, Bad Code - maintainability and code-quality judgment.
- Clean Code - readability and function-level craft support.