Measuring Fitness Functions for Characteristics
What This Concept Is
A fitness function is any automated check that verifies whether the system currently satisfies an architectural characteristic. The phrase comes from Building Evolutionary Architectures (Ford, Kua, Parsons) and is used extensively in Fundamentals of Software Architecture.
A fitness function has four requirements:
- it is tied to a specific characteristic
- it is automated (a human reviewing a dashboard is not a fitness function)
- it produces a pass/fail signal (or a numeric value compared against a threshold)
- it runs regularly enough that a regression is caught before it ships
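The four requirements above can be sketched as a small record. This is an illustration only: the class name FitnessFunction, its fields, and the "lower is better" comparison are assumptions made for this sketch, not an API from either book.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FitnessFunction:
    """Hypothetical record holding the four requirements in one place."""

    characteristic: str           # 1. tied to a specific characteristic
    measure: Callable[[], float]  # 2. automated: code produces the number
    threshold: float              # 3. the value the measurement is compared against
    schedule: str                 # 4. how often it runs, e.g. "every PR"

    def passes(self) -> bool:
        # Assumes lower is better (latency, violation counts, and so on).
        return self.measure() < self.threshold


# Usage: a stubbed measurement standing in for a real probe.
manifest_latency = FitnessFunction(
    characteristic="performance",
    measure=lambda: 95.0,
    threshold=120.0,
    schedule="every PR",
)
print(manifest_latency.passes())
```

If any of the four fields cannot be filled in, the check is probably still an intention rather than a fitness function.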
Fitness functions are how a characteristic becomes a testable promise instead of a wall poster. Without them, "the system is modifiable" is an opinion.
Why It Matters Here
Architectural characteristics erode silently. Nobody wakes up and says "let me break the modifiability of this codebase." It happens one PR at a time, over months, and by the time anyone notices, the fix is a rewrite.
Fitness functions turn the characteristic into a mechanism by which the system defends itself. They are the architectural equivalent of unit tests: they embed the decision in the pipeline so the team does not have to remember.
Concepts 7-8 produced the top-3 characteristics. This concept produces the evidence that each characteristic still holds.
Concrete Example
Three fitness functions for three different characteristics.
1. Performance (streaming service).
# runs every 5 minutes in CI and in prod synthetic tests
from statistics import quantiles
import requests, time
latencies = []
for _ in range(200):
    t = time.perf_counter()
    r = requests.get("https://api.example/videos/v_42/manifest", timeout=2)
    assert r.status_code == 200
    latencies.append((time.perf_counter() - t) * 1000)
p95 = quantiles(latencies, n=20)[18] # ~p95
assert p95 < 120, f"manifest p95 regressed: {p95:.1f} ms > 120 ms"
Signal: pass/fail on every run. Wired into CI so a PR that pushes manifest p95 to 120 ms or beyond fails before merge.
2. Modifiability (payments platform).
# architecture test: teller module may not import ledger internals
# pick one tool per stack: ArchUnit (Java), dependency-cruiser (JS), import-linter (Python)
# using an import-linter contract file:
#
# [importlinter]
# root_package = payments
#
# [importlinter:contract:teller-depends-only-on-ledger-api]
# name = Teller depends only on the ledger API
# type = forbidden
# source_modules = payments.teller
# forbidden_modules =
#     payments.ledger.domain
#     payments.ledger.storage
#
# run as: lint-imports
Signal: fails CI if a teller file imports a ledger internal. Protects a modifiability boundary that paper cannot enforce.
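To make the mechanics of such a boundary check concrete, here is a minimal standard-library sketch of the idea. It is not how import-linter is implemented: it inspects only the direct, absolute imports of one source file, whereas a real tool also follows indirect imports. The function name forbidden_imports is hypothetical; the module names are the ones from the contract above.

```python
import ast

# Module prefixes that teller code may not import (taken from the contract above).
FORBIDDEN_PREFIXES = ("payments.ledger.domain", "payments.ledger.storage")


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden modules imported directly by one source file."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" may refer to module a.b or to submodule a.b.c.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue  # not an import, or a relative import (ignored in this sketch)
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES):
                found.add(name)
    return sorted(found)


# Usage: the first import respects the boundary, the second crosses it.
print(forbidden_imports("from payments.ledger import api"))
print(forbidden_imports("from payments.ledger import domain"))
```

A CI job would run this over every file under the teller package and fail if any list is non-empty. Prefer the dedicated tool where one exists for your stack; the sketch only shows that the check itself is small.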
3. Security / Confidentiality (IoT platform).
# log-scanner fitness function - runs hourly in production
# fails if any log line contains a device secret pattern
# -q keeps grep from echoing the matched secret into this check's own output
gcloud logging read 'resource.type="k8s_container"' --freshness=1h --format="value(textPayload)" \
  | grep -qE 'secret_[A-Z0-9]{32}' && exit 1
exit 0
Signal: if a secret leaks into a log, alert within the hour. Pairs with a deploy gate that aborts if the check was not run today.
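The deploy gate mentioned above can be sketched as a freshness test on the scanner's last successful run. The function name gate_allows_deploy and the 24-hour window (one reading of "today") are assumptions of this sketch. Where the last-green timestamp is stored is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def gate_allows_deploy(
    last_green: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),  # assumed meaning of "today"
) -> bool:
    """Allow a deploy only if the log scanner passed recently enough."""
    if last_green is None:
        return False  # the check has never run: treat as a failure, not a pass
    return now - last_green <= max_age


# Usage: a run from two hours ago is fresh enough to deploy.
now = datetime.now(timezone.utc)
print(gate_allows_deploy(now - timedelta(hours=2), now))
```

Treating a missing timestamp as a failure is the important design choice: a check that silently stopped running should block the deploy, not wave it through.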
Fitness functions do not have to be slow, heavy, or pretty. They have to run.
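As an illustration of how light such a check can be, here is a short script that fails when any Python file grows past a line limit. File size is only a crude proxy for modifiability, and the 400-line limit and the function names are assumptions chosen for this sketch.

```python
from pathlib import Path

MAX_LINES = 400  # hypothetical limit; tune it to your codebase


def oversized_files(root: Path, max_lines: int = MAX_LINES) -> list[tuple[str, int]]:
    """Return (path, line_count) for every .py file longer than max_lines."""
    offenders = []
    for path in sorted(root.rglob("*.py")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            count = sum(1 for _ in fh)
        if count > max_lines:
            offenders.append((str(path), count))
    return offenders


def main(argv: list[str]) -> int:
    """Print offenders and return a process exit code (0 = pass, 1 = fail)."""
    root = Path(argv[1]) if len(argv) > 1 else Path(".")
    offenders = oversized_files(root)
    for name, count in offenders:
        print(f"{name}: {count} lines > {MAX_LINES}")
    return 1 if offenders else 0


# In CI, end the script with: import sys; sys.exit(main(sys.argv))
```

The non-zero exit code is what turns the script into a pass/fail signal the pipeline can act on.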
Common Confusion / Misconception
"Fitness functions are just tests." They are a specific kind of test: one that defends an architectural characteristic, not a functional requirement. A unit test asserts "the add function returns 3 + 4 = 7." A fitness function asserts "no module crosses this layer boundary."
"If we have SLOs, we have fitness functions." SLOs measure a characteristic in production. That is one of several fitness-function shapes. Others include static analysis, dependency checks, chaos experiments, and compliance scans.
"We need a heavy framework." A 20-line Python script that runs in CI is a fitness function. Fancy tools help at scale; they are not the entry point.
"Every characteristic needs a fitness function." Aim for at least one per top-3 characteristic. More is better but with diminishing returns. The failure mode is zero, not three.
How To Use It
For each of your top-3 characteristics:
- Write the characteristic as a measurable scenario (Cluster 2).
- Name the signal that would tell you it is broken.
- Pick the cheapest automation that produces that signal.
- Wire the automation into the pipeline or production monitoring.
- Decide what happens on failure (block merge, page on-call, alert the channel).
A template for documentation:
Characteristic: modifiability (device adapters)
Scenario: adding a new device type touches only its adapter module
Signal: import or reference from outside the adapter
Check: import-linter CI contract forbidding cross-module references
Failure action: CI fails; PR cannot merge
Owner: platform team
Last green: <timestamp>
Fitness functions decay. Review them quarterly: are they still running? Do they still fail when they should? A fitness function that has been green for a year might just be dead.
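One way to answer "does it still fail when it should?" is a canary: feed the check a deliberately bad input and confirm that it trips. The sketch below does this for the log-scanner pattern from the third example. The function names and the planted value are made up for illustration, and the planted string is not a real secret.

```python
import re

# Same pattern as the log-scanner example.
SECRET_PATTERN = re.compile(r"secret_[A-Z0-9]{32}")


def scan(lines: list[str]) -> list[str]:
    """Return the log lines that contain something shaped like a device secret."""
    return [line for line in lines if SECRET_PATTERN.search(line)]


def canary_still_bites() -> bool:
    """Plant a fake secret and confirm the scanner flags exactly that line."""
    planted = "device boot ok token=secret_" + "A1B2C3D4" * 4  # 32 fake characters
    return scan(["harmless line", planted]) == [planted]


# Usage: run alongside the quarterly review, or on every scheduled scan.
print(canary_still_bites())
```

If the canary ever returns False, the scanner has gone blind and its green history means nothing.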
Check Yourself
- Why is a dashboard reviewed by a human not a fitness function?
- Give an example of a fitness function that runs in CI and one that runs in production. Why do you need both?
- What is the signal that a fitness function has gone stale?
- For "testability," what is one lightweight fitness function you could add in an hour?
Mini Drill or Application
Pick your top-3 characteristics from Concept 8's drill. For each, write:
- a scenario (one sentence)
- a signal (what failure would look like)
- a runnable check (actual code or command, not a description)
- how it is wired (CI, cron, alert)
Commit at least one of them to a real pipeline. A fitness function that lives in a document is not a fitness function; it is homework.
Read This Only If Stuck
- Fundamentals: Measuring architecture characteristics
- Fundamentals: Fitness functions
- Clean Architecture: FitNesse case (contextual, not core)
- Building Evolutionary Architectures (Ford, Kua, Parsons) - canonical treatment of fitness functions
- ArchUnit / import-linter / dependency-cruiser - architecture-testing tools across stacks