Unit Tests: Where They Add Real Leverage

What This Concept Is

A unit test exercises a single piece of logic in isolation, with no real database, network, or file system, and runs in milliseconds. That definition is well known; the harder question is where to spend unit tests on a capstone-sized codebase so the return is real.

Martin Fowler distinguishes solitary unit tests (the unit is tested with every collaborator replaced by a test double) from sociable unit tests (the unit is tested together with its real collaborators as long as those collaborators are cheap and in-process). For a capstone, the useful default is sociable, with solitary as the exception: test behavior, not classes; let small collaborators participate; reach for doubles only when a collaborator is slow, non-deterministic, or owned by somebody else.
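The two styles can be sketched side by side. This is a minimal illustration, not capstone code: `PriorityScorer` and `TaskRanker` are hypothetical names invented for the example, and the solitary test uses `unittest.mock.Mock` from the standard library as the test double.

```python
from unittest.mock import Mock


class PriorityScorer:
    """Hypothetical cheap, in-process collaborator."""

    def score(self, days_late: int) -> int:
        # Later tasks score higher; tasks that are not late score zero.
        return max(days_late, 0) * 10


class TaskRanker:
    """Hypothetical unit under test: orders task names by urgency."""

    def __init__(self, scorer):
        self._scorer = scorer

    def rank(self, days_late_by_name: dict) -> list:
        return sorted(
            days_late_by_name,
            key=lambda name: self._scorer.score(days_late_by_name[name]),
            reverse=True,
        )


def test_sociable_most_overdue_task_ranks_first():
    # Sociable: the real scorer participates because it is cheap and deterministic.
    ranker = TaskRanker(PriorityScorer())
    assert ranker.rank({"a": 1, "b": 5, "c": -2}) == ["b", "a", "c"]


def test_solitary_ranker_orders_by_whatever_the_scorer_returns():
    # Solitary: the scorer is replaced by a test double with canned scores.
    scorer = Mock()
    scorer.score.side_effect = lambda days_late: {1: 99, 5: 1}[days_late]
    ranker = TaskRanker(scorer)
    assert ranker.rank({"a": 1, "b": 5}) == ["a", "b"]
```

Both tests assert on the ranking the caller sees. The solitary version is only worth its extra setup if the real scorer were slow, non-deterministic, or owned by somebody else.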

Unit tests have leverage when:

the logic is pure (same inputs, same output, no side effects);
there are many branches, edge cases, or combinations to cover;
the cost of a wrong answer is high enough to justify the test;
the code is likely to change and needs a safety net before refactoring.

They lose leverage when the logic is trivial delegation (return repo.findById(id)), when the interesting behavior is in an integration, or when the test mostly mocks every dependency and asserts that mocks were called.

Why It Matters Here (In the Capstone)

In a capstone, unit tests are cheap and fast, which is why they form the base of the test pyramid (Mike Cohn's pyramid as popularised by Fowler). But cheap does not mean free. A hundred low-value unit tests that assert framework behavior add maintenance load without adding safety. The job of this concept is to teach when to write one.

This module adopts an opinionated rough split for a capstone of typical size: about 70% unit, 25% integration, 5% end-to-end (see Concept 6 for the rationale on E2E). That 70% is where unit tests pay for themselves.

Concrete Example(s) -- from a real capstone

A capstone service computes whether a task is overdue. The logic:

def is_overdue(task, now):
    if task.completed_at is not None:
        return False
    if task.due_date is None:
        return False
    return task.due_date < now

This is a unit-test magnet:

pure function (no I/O);
three branches;
at least five meaningful edge cases: completed, no due date, due in future, due in past, due equal to now.

A good unit test set covers each case with one test. Running the whole set should take under 50 ms. If the function also took a timezone, you would add two more cases -- the branchier the function, the more unit tests earn their keep.
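One possible test set for the five cases, written as plain test functions that a runner such as pytest would collect. The `Task` dataclass is a hypothetical minimal stand-in for the capstone's model, and `NOW` is an arbitrary fixed timestamp chosen so the tests stay deterministic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Task:
    # Hypothetical minimal shape; the real capstone model may differ.
    due_date: Optional[datetime] = None
    completed_at: Optional[datetime] = None


def is_overdue(task, now):
    if task.completed_at is not None:
        return False
    if task.due_date is None:
        return False
    return task.due_date < now


NOW = datetime(2024, 1, 15, 12, 0)  # fixed clock, never datetime.now()


def test_completed_task_is_never_overdue():
    task = Task(due_date=NOW - timedelta(days=1), completed_at=NOW)
    assert is_overdue(task, NOW) is False


def test_task_without_due_date_is_not_overdue():
    assert is_overdue(Task(), NOW) is False


def test_task_due_in_future_is_not_overdue():
    task = Task(due_date=NOW + timedelta(hours=1))
    assert is_overdue(task, NOW) is False


def test_task_due_in_past_is_overdue():
    task = Task(due_date=NOW - timedelta(hours=1))
    assert is_overdue(task, NOW) is True


def test_task_due_exactly_now_is_not_overdue():
    # Boundary case: the comparison is strict, so "equal" is not overdue.
    assert is_overdue(Task(due_date=NOW), NOW) is False
```

Passing `now` in as an argument is what makes this easy: no test needs to patch the system clock.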

Contrast with code that does not benefit as much from a unit test:

def get_task(task_id):
    return task_repo.find_by_id(task_id)

Writing a unit test here typically means mocking task_repo.find_by_id to return a value and asserting that the function returned that value. That test asserts that a return value can be passed through, not that get_task does its job: it would still pass if the real repository query were wrong. An integration test against a real repository is much more useful.
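For illustration, here is roughly what that low-value test looks like. This is a sketch of the pattern to avoid: `task_repo` is replaced wholesale by a `unittest.mock.Mock`, since the real repository is not shown in this module.

```python
from unittest.mock import Mock

# Stand-in for the real repository object (assumption: a module-level name).
task_repo = Mock()


def get_task(task_id):
    return task_repo.find_by_id(task_id)


def test_get_task_returns_what_the_repo_returns():
    task_repo.reset_mock()
    sentinel = object()
    task_repo.find_by_id.return_value = sentinel

    # Both assertions simply restate the one line of implementation.
    assert get_task(42) is sentinel
    task_repo.find_by_id.assert_called_once_with(42)
```

The test passes, yet nothing about the query, the schema, or the mapping to a task object has been exercised.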

Common Confusion / Misconceptions

The first misconception is equating "unit test" with "test of one class." A unit is a unit of behavior, not a unit of code. A test that hits three small collaborators together to exercise one piece of logic is still a unit test (sociable style), as long as the collaborators are real code (not the database).

The second is confusing unit tests with tests that use mocks. Over-mocking couples the test to the implementation and frequently produces "the test passes, the system fails" outcomes. Mocks are for expensive dependencies you do not own, not for friends of the subject.

The third is the idea that 100% unit coverage is a goal. It is not. Unit tests cover logic that is worth asserting; the rest of the coverage comes from higher levels.

The fourth is treating the unit test as a place to specify how the code works. A unit test asserts behavior, not internal state -- if the only way the test fails is an internal rename, the test is coupled to the implementation and will block legitimate refactors.

How To Use It (In Your Capstone)

When deciding whether to write a unit test, ask:

Is the logic pure, or does its interesting behavior depend on a real dependency?
Are there enough branches or edge cases to make this test set worth the maintenance load?
Will the test still make sense if I refactor the implementation?
Can an integration test at a coarser level cover this with less duplication?
Can I name the test as a sentence about behavior (Dan North / BDD style) rather than about a method?
Is the test deterministic, fast (< 20 ms), and independent from other tests (Kent Beck's test desiderata)?
Does the test fail for the right reason when I mutate the code it claims to protect?

If the answer is "pure and branchy" -> write the unit test. If the answer is "mostly delegation" -> skip it and let the integration test cover it.

A Decision Table for the Capstone

Code shape	Unit test?	Why
Pure function with 3+ branches	Yes	Branches are where bugs hide; fast tests cover them cheaply.
Domain invariant (e.g. `Task.validate`)	Yes	Invariants must hold regardless of caller; isolated tests encode them.
State machine transitions	Yes	Every transition is a branch; unit tests encode the entire truth table.
Service method that delegates to a repo	No	Integration test covers the real behavior; a unit test here just asserts delegation.
Controller/handler	Rarely	Integration test at request level is usually enough.
Data-access methods	No	Unit tests here mock the DB and miss the actual bugs.
String formatting / pure transformation	Yes	Fast to test, easy to get wrong, easy to regress.
Configuration loading	Sometimes	Yes if logic is non-trivial, no if it is framework-driven.

Rule of thumb: unit test when the interesting behavior is in the code itself; integration test when the interesting behavior is in the interaction with something real.
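As an example of the "state machine transitions" row, a truth table can be encoded directly in the test. The states and events below are hypothetical (a capstone's task lifecycle may differ); the point is the table-driven shape.

```python
def transition(state, event):
    """Hypothetical task lifecycle: todo -> in_progress -> done."""
    if state == "todo" and event == "start":
        return "in_progress"
    if state == "in_progress" and event == "complete":
        return "done"
    if state == "in_progress" and event == "pause":
        return "todo"
    raise ValueError(f"cannot {event} from {state}")


def test_every_allowed_transition_lands_in_the_expected_state():
    cases = [
        ("todo", "start", "in_progress"),
        ("in_progress", "complete", "done"),
        ("in_progress", "pause", "todo"),
    ]
    for state, event, expected in cases:
        assert transition(state, event) == expected, (state, event)


def test_illegal_transitions_are_rejected():
    for state, event in [("todo", "complete"), ("done", "start"), ("done", "pause")]:
        try:
            transition(state, event)
        except ValueError:
            continue  # rejected, as expected
        raise AssertionError(f"{event} from {state} should be rejected")
```

Adding a new state then means adding rows to the two case lists, which keeps the cost of full coverage low.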

Anti-Patterns to Recognize

Mock sandwich. Every collaborator mocked, every mock returns a canned value, and the test asserts mocks were called. The test re-implements the code backward and breaks on every refactor.
Assertion-free test. The test calls the function and does not assert the result -- coverage goes up, safety does not.
Coupled-to-implementation. The test asserts internal state (self._cache) rather than observable behavior.
Test of the language. assert sum([1, 2]) == 3 is testing Python, not your code.
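The coupled-to-implementation pattern is easiest to see next to its behavioral alternative. In this sketch, `SlugCache` and `simple_slugify` are hypothetical names; the first test shows the anti-pattern, the second asserts only what a caller can observe.

```python
class SlugCache:
    """Hypothetical class that memoises a slug function."""

    def __init__(self, slugify):
        self._slugify = slugify
        self._cache = {}

    def get(self, title):
        if title not in self._cache:
            self._cache[title] = self._slugify(title)
        return self._cache[title]


def simple_slugify(title):
    return title.lower().replace(" ", "-")


def test_coupled_to_implementation():
    # Anti-pattern: reads the private dict, so renaming _cache breaks the test
    # even though callers would see no change.
    cache = SlugCache(simple_slugify)
    cache.get("Hello World")
    assert cache._cache == {"Hello World": "hello-world"}


def test_repeated_lookup_returns_same_slug_and_computes_once():
    # Behavioral: checks the returned value and how often the work was done.
    calls = []

    def counting_slugify(title):
        calls.append(title)
        return simple_slugify(title)

    cache = SlugCache(counting_slugify)
    assert cache.get("Hello World") == "hello-world"
    assert cache.get("Hello World") == "hello-world"
    assert calls == ["Hello World"]
```

Both tests pass today, but only the second survives a switch from a dict to some other caching mechanism.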

Check Yourself

Name a kind of code in your capstone where unit tests add real leverage, and one where a unit test would mostly assert that the framework works.
Why is over-mocking a symptom of the wrong test level, not a style issue?
Which of the anti-patterns is most likely to appear in a service-layer test?
What distinguishes a sociable unit test from an integration test in Fowler's framing?
How would you convert a "mock sandwich" test into either a real unit test or a real integration test?

Mini Drill or Application (Capstone-scoped)

Open your capstone repo and list the ten files with the most branching logic. For each, decide: unit-testable, better-tested at integration, or not worth testing -- and produce a one-sentence justification.
Pick one file that is pure and branchy. Write five unit tests, each named as a behavior sentence (Dan North BDD style).
Run one small mutation: flip a < to <= in the logic and confirm your unit tests catch it. If they do not, strengthen the assertions.
Find one existing "mock sandwich" in your repo and convert it either up into an integration test or down into a real sociable unit test; note the line count difference in your capstone journal.
Benchmark the unit-test suite. If any test takes more than 20 ms, either speed it up or relabel it as an integration test and move it to the integration directory.
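The mutation step in the drill can be done by hand before reaching for any tooling. The sketch below applies it to the is_overdue example from earlier in this concept: `is_overdue_mutant` is a deliberately broken copy with < flipped to <=, and `passes_boundary_test` is a hypothetical helper that runs the "due exactly now" check against whichever function it is given.

```python
from datetime import datetime
from types import SimpleNamespace


def is_overdue(task, now):
    if task.completed_at is not None:
        return False
    if task.due_date is None:
        return False
    return task.due_date < now


def is_overdue_mutant(task, now):
    # Same logic with one mutation: < flipped to <=.
    if task.completed_at is not None:
        return False
    if task.due_date is None:
        return False
    return task.due_date <= now


def passes_boundary_test(fn):
    """True if fn treats a task due exactly now as not overdue."""
    now = datetime(2024, 1, 15, 12, 0)
    task = SimpleNamespace(due_date=now, completed_at=None)
    return fn(task, now) is False


# The original passes; the mutant is caught by the boundary case.
assert passes_boundary_test(is_overdue)
assert not passes_boundary_test(is_overdue_mutant)
```

A suite that only checks "due in past" and "due in future" would let this mutant through, which is the signal to add the boundary case.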

Source Backbone

Capstone implementation applies earlier code-quality, testing, and refactoring material. These books are the source backbone for that practice.

Software Engineering at Google - testing, review, and engineering-process backbone.
Refactoring - safe change and behavior-preserving improvement.
Good Code, Bad Code - maintainability and code-quality judgment.
Clean Code - readability and function-level craft support.
