Coverage as a Floor, Not a Target

What This Concept Is

Code coverage is the percentage of lines or branches in your code that are executed by your test suite. It is a useful signal with a narrow purpose:

as a floor, it is a regression catcher: "do not let coverage of this file drop below X";
as a target, it is a vanity metric that encourages bad tests written for the number.

The distinction is the entire lesson, and it is a textbook example of Goodhart's law: "when a measure becomes a target, it ceases to be a good measure." Coverage measures what was touched by tests, not whether the tests asserted anything useful. A test that runs a function and never asserts its output still raises coverage. A test that asserts nothing important still counts as covering a branch.

The correct operational use: pick a reasonable per-project floor (for example 75-80% line coverage, lower on UI or orchestration code), enforce it in CI, and never treat it as the goal of testing. Coverage is a negative indicator -- low coverage tells you something is wrong, but high coverage tells you nothing definitive. Mutation testing (discussed below) is the complementary positive indicator.

Why It Matters Here (In the Capstone)

Capstone projects are especially vulnerable to coverage theater: you write lots of trivial tests to push coverage to 95%, feel productive, and then discover that real behavior is untested because the tests assert nothing. The time cost is large and the quality benefit is near zero.

This module prescribes coverage as a floor for two reasons:

it protects you from accidental deletion of test coverage during refactors;
it gives CI a concrete "do not regress" gate that is objective and cheap.

As a guide-level default: start the capstone with an 80% line-coverage floor on core code and a lower floor (or an exclusion) for trivial glue code, infrastructure code, and generated code. Raise the floor only when you have a specific reason.

Concrete Example(s) -- from a real capstone

A capstone coverage config in CI:

task-manager/core requires >=80% line coverage;
task-manager/adapters requires >=70% (integration-tested at a higher level);
task-manager/migrations is excluded (covered via integration tests);
task-manager/cli requires >=60% (small entry point, most behavior lives elsewhere).

A pull request that drops any directory's coverage below its floor fails CI. The message is not "write more tests to hit 80%"; it is "you deleted coverage somewhere -- explain why."
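The per-directory floors listed above can be sketched as a small check. This is an illustration only: the `FLOORS` table and `floor_violations` helper are hypothetical names, the measured percentages are made up, and a real setup would read them from the coverage report and exit non-zero when the list is not empty.

```python
# Floors from the example above, as percentages of line coverage.
# task-manager/migrations is excluded, so it has no entry.
FLOORS: dict[str, float] = {
    "task-manager/core": 80.0,
    "task-manager/adapters": 70.0,
    "task-manager/cli": 60.0,
}


def floor_violations(measured: dict[str, float], floors: dict[str, float]) -> list[str]:
    """Return one message per directory whose coverage is below its floor."""
    messages = []
    for directory, floor in sorted(floors.items()):
        # Assumption: a directory with no measurement counts as 0% covered.
        actual = measured.get(directory, 0.0)
        if actual < floor:
            messages.append(f"{directory}: {actual:.1f}% is below the {floor:.1f}% floor")
    return messages


# Hypothetical measurements for one pull request.
measured = {
    "task-manager/core": 78.4,
    "task-manager/adapters": 71.0,
    "task-manager/cli": 65.2,
}
for message in floor_violations(measured, FLOORS):
    print(message)
```

Checking each directory against its own floor means a well-covered directory cannot mask a regression in another one, which a single repo-wide average would allow.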

Contrast with a bad coverage target:

"We require 95% coverage across the repo."

After a few weeks, tests look like:

def test_coverage():
    # Always passes: "or True" hides whatever get_task returns.
    assert get_task(1) or True

Coverage is 95%, but quality is lower than it was before. The Codecov blog on mutation testing gives a longer version of the same argument: once a measure becomes a target, behavior optimises the measure.

Common Confusion / Misconceptions

The biggest confusion is equating coverage with correctness. A function with 100% coverage and no meaningful assertions is barely safer than one with 0% coverage: its tests would catch an unhandled exception and nothing else. Coverage only proves the code ran.

A related confusion is branch coverage versus line coverage. Branch coverage is more useful (did each branch of each conditional execute?) but harder to reach, so most projects settle for line coverage at a reasonable floor.
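A small sketch of the difference, using a hypothetical `shipping_cost` function. The first test alone executes every line, yet the path where the `if` is skipped never runs; a branch-aware measurement (for example coverage.py run with its `--branch` option) would report that path as missing.

```python
def shipping_cost(is_member: bool) -> float:
    """Hypothetical pricing rule used only for illustration."""
    cost = 5.0
    if is_member:
        cost = 0.0
    return cost


def test_member_ships_free() -> None:
    # Executes every line of shipping_cost: full line coverage.
    # The case where the `if` body is skipped is never exercised.
    assert shipping_cost(is_member=True) == 0.0


def test_non_member_pays() -> None:
    # Exercises the skipped-`if` path; line coverage does not change.
    assert shipping_cost(is_member=False) == 5.0


test_member_ships_free()
test_non_member_pays()
```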

A third is running coverage locally as a motivational dashboard. It is a CI gate, not a scoreboard. Checking it every five minutes is noise.

A fourth is conflating "the floor" with "the goal." The floor exists to block regressions during refactors; it is not a KPI to optimise. Teams that post coverage numbers in chat channels almost always drift toward writing tests for the number.

How To Use It (In Your Capstone)

Operational rules:

Pick a floor appropriate to each part of the code, not a single global number.
Configure CI to fail the build if coverage drops below any floor.
Exclude obviously low-value code (generated code, schema files, templates).
Never raise the floor in response to a single commit; raise it deliberately and review consequences.
Periodically audit "what is the weakest-tested critical file" and fix that, not "what is the least-covered directory."
Pair coverage with mutation testing weekly (see "Beyond Coverage").
Review tests during self-review (Concept 14) for assertion density: a test body with no assert is a red flag regardless of coverage.

What coverage does not replace:

mutation testing (does your test suite detect injected bugs?);
review of test quality (do the assertions prove behavior?);
integration and E2E coverage for real-world scenarios.

What Good Coverage Use Looks Like

A healthy capstone coverage setup has these properties:

a per-directory (or per-module) floor, not a single global number;
explicit exclusions for generated code, templates, migrations, and __main__ blocks;
a CI check that blocks merges if any floor is violated;
a periodic review of which critical files have the weakest coverage -- and fixing those;
no local coverage "dashboard" open in the dev loop.

Unhealthy signals: coverage percentage posted daily in chat, rapidly rising coverage alongside tests that contain no assert, coverage dropping on every PR with a floor that keeps getting lowered, 100% coverage in a directory that has never had a real bug filed.

Three Bad Tests That Raise Coverage

def test_parses():
    parse_input("x=1")  # no assertion

def test_handles():
    result = service.do_it(payload)
    assert result or True  # always True

def test_internal_state():
    service.do_it(payload)
    assert service._cache_size == 1  # asserts implementation

A reviewer who sees any of these during self-review (Concept 14) should delete or rewrite them. The coverage number is not worth the lie.
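One possible rewrite of the three tests, as a sketch. The original does not show `parse_input` or `service`, so the implementations below are hypothetical stand-ins that exist only to make the example runnable; the point is the shape of the assertions.

```python
def parse_input(text: str) -> dict[str, int]:
    """Hypothetical parser: 'x=1' becomes {'x': 1}."""
    key, value = text.split("=")
    return {key: int(value)}


class Service:
    """Hypothetical service that caches results per payload."""

    def __init__(self) -> None:
        self._cache: dict[str, str] = {}

    def do_it(self, payload: str) -> str:
        if payload not in self._cache:
            self._cache[payload] = payload.upper()
        return self._cache[payload]


def test_parses() -> None:
    # Pins the parsed value instead of only calling the function.
    assert parse_input("x=1") == {"x": 1}


def test_handles() -> None:
    # Compares against a concrete expected result; no "or True".
    assert Service().do_it("ping") == "PING"


def test_repeated_call_is_consistent() -> None:
    # Checks observable behavior rather than the private cache size.
    service = Service()
    assert service.do_it("ping") == service.do_it("ping")


test_parses()
test_handles()
test_repeated_call_is_consistent()
```

Each rewritten test covers the same lines as its bad counterpart, so the coverage number stays the same while the tests start to mean something.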

Beyond Coverage

For capstone purposes, coverage is enough. When a codebase outgrows coverage as a signal, the next tools are:

Mutation testing (mutmut, stryker, pitest) -- injects bugs and checks that your tests catch them.
Assertion density -- number of assert statements per test file. A quick sanity check that tests do something.
Review-time spot checks -- during self-review, pick one test and ask "what bug would this catch?"
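To make the mutation-testing idea concrete, here is a hand-rolled sketch with a single hand-written mutant. Tools such as mutmut generate and run mutants like this automatically; the function and suite names below are hypothetical. Both suites execute every line of `is_adult`, so they look identical in a coverage report.

```python
from typing import Callable


def is_adult(age: int) -> bool:
    """Original implementation."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Hand-written mutant: `>=` replaced with `>`."""
    return age > 18


def weak_suite(fn: Callable[[int], bool]) -> bool:
    """Passes for both versions because it never probes the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn: Callable[[int], bool]) -> bool:
    """Adds the boundary case age == 18."""
    return weak_suite(fn) and fn(18) is True


for name, suite in [("weak", weak_suite), ("strong", strong_suite)]:
    # A mutant is "killed" when the suite passes on the original
    # and fails on the mutated version.
    killed = suite(is_adult) and not suite(is_adult_mutant)
    print(f"{name} suite: mutant {'killed' if killed else 'survived'}")
```

This prints "weak suite: mutant survived" and then "strong suite: mutant killed". A surviving mutant points at a specific missing assertion, which is the information coverage alone cannot give.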

Check Yourself

Why is 95% coverage dangerous if treated as a target?
What does coverage not tell you about your test suite?
When should you raise a coverage floor, and when should you leave it alone?
Why is mutation testing a more informative signal than coverage?
How does Goodhart's law apply specifically to a capstone team that posts coverage in chat?

Mini Drill or Application (Capstone-scoped)

Configure coverage measurement on CI with a per-directory floor appropriate to each part of your codebase.
Exclude at least one directory with written justification (migrations, generated code, __main__).
Intentionally delete one small test locally, confirm CI blocks the merge, then restore the test and write one paragraph in your journal on how floors protect you from accidental regression.
Run mutmut (or your language's mutation tester) on the core module. Write up the first three mutants your tests missed and decide which to fix.
Audit your existing tests for the three bad patterns in this concept (no assertion, or True, internal-state assertion). Delete or rewrite the ones you find.

Source Backbone

Capstone implementation applies earlier code-quality, testing, and refactoring material. These books are the source backbone for that practice.

Software Engineering at Google - testing, review, and engineering-process backbone.
Refactoring - safe change and behavior-preserving improvement.
Good Code, Bad Code - maintainability and code-quality judgment.
Clean Code - readability and function-level craft support.
