Designing for Testability without Damaging the Domain

What This Concept Is

Testable code is code whose behavior you can exercise cheaply, in isolation, and at the level of granularity that matters. That is a desirable property. The failure mode is testability creep: contorting the production code specifically to make tests easier, until the code is worse as a domain model than it was before the tests.

Healthy testability uses three moves:

Seams -- small, deliberate substitution points (dependency injection of the clock, filesystem, HTTP client).
Pure cores / impure shells -- domain logic is a pure function over data; side effects sit in a thin shell that can be swapped in tests.
Behavior, not implementation -- tests verify the domain's observable contract, so refactors do not require mass test rewrites.

Unhealthy testability uses:

Test-only public methods (setInternalStateForTests)
Leaking mocks into production (real code branching on whether a mock is present)
Visibility-weakening for tests (making everything public; god-mode reflection in tests)
Over-mocking -- every dependency mocked so the test pins down the implementation rather than the behavior

Why It Matters Here

Designing for testability is the one place where "this is hard to test" is a legitimate design signal. Hard-to-test code usually has hidden coupling or poor boundaries. But the response must not be "bolt on a backdoor"; it must be "fix the boundary." Otherwise you ship two things in one class: the domain logic and a test scaffold, and the domain loses.

Concrete Example

Bad (testability damage):

class OrderService:
    def __init__(self):
        self._clock = time.time
        self._db = real_db

    def confirm(self, order_id):
        now = self._clock()
        ...

    # added only to support tests
    def set_clock_for_tests(self, fake_clock):
        self._clock = fake_clock

    # added only to support tests
    def _DEBUG_peek_internal_state(self):
        return self._db._raw_rows

This class carries testing scaffolding into production. Anyone reading OrderService has to guess which methods are "real."

Better (seam + pure core):

class Clock:
    def now(self) -> datetime: ...

class SystemClock(Clock):
    def now(self): return datetime.now(tz=UTC)

class OrderService:
    def __init__(self, clock: Clock, repo: OrderRepository):
        self._clock = clock
        self._repo = repo

    def confirm(self, order: Order) -> Order:
        # pure domain function, easy to test without a clock mock at all:
        return order.confirmed_at(self._clock.now())

class Order:
    def confirmed_at(self, now: datetime) -> "Order":
        # pure function over data
        return replace(self, status="confirmed", confirmed_at=now)

Tests for Order.confirmed_at need zero mocks -- they pass a datetime. Tests for OrderService.confirm use a trivial fake clock. No test-only public surface on production classes.

Common Confusion / Misconception

"Mock everything to isolate." Mocking every collaborator produces tests that pass when the code is broken and fail when the code is refactored. Mock at boundaries you do not own (network, filesystem, time). Prefer real objects for in-memory domain code.

"Make it public so the test can reach it." Almost always the wrong move. Test the behavior through the public API; if it is awkward, the API is probably missing a method that production code will also want.

"Untestable code must be wrong." Not always. Some code is simply a thin composition of others (adapter layers). A handful of integration tests at the top is better than forcing every adapter to be unit-tested through hoop-jumping.

"100% coverage is the goal." Coverage is a weak signal. A high-coverage test suite can still miss every important property of the domain; a leaner, behavior-focused suite can catch more bugs.

"Tests are separate from design." They are not. Hard-to-test code is a design smell. The response is to redesign the boundary, not to weaken production encapsulation.

How To Use It

When something is hard to test, diagnose before hacking:

Is there a hidden dependency (time, random, filesystem, network, env var)? Inject it.
Is the logic entangled with I/O? Split into a pure function plus a thin shell.
Is the unit too big? Extract a smaller class with a clearer responsibility.
Does the test want to peek at internal state? The behavior you want to assert is probably missing from the public API; add it for production, not for tests.

Two small rules:

No method exists solely for tests.
Mocks only for things you do not own.

Check Yourself

Why is a set_clock_for_tests method a design smell?
Give an example of an injectable seam that is good design independently of tests.
When is over-mocking worse than no mocking?

Mini Drill or Application

For each code smell, diagnose the testability issue and propose a refactor that does not damage production:

A class that reads the current time by calling System.currentTimeMillis() directly.
A function that both computes a discount and writes to the database.
A service where every method is public because "the tests needed them."
A test file that mocks twelve collaborators to call one method.

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconception​

How To Use It​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​