Measuring Fitness Functions for Characteristics

What This Concept Is

A fitness function is any automated check that asserts whether the system currently satisfies an architectural characteristic. The phrase comes from Building Evolutionary Architectures (Ford, Kua, Parsons) and is used extensively in Fundamentals of Software Architecture.

A fitness function has four requirements:

  • it is tied to a specific characteristic
  • it is automated (a human reviewing a dashboard is not a fitness function)
  • it produces a pass/fail signal (or a numeric value compared against a threshold)
  • it runs regularly enough that a regression is caught before it ships
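
One way to picture these requirements is as a small data shape. The following is a minimal sketch, not a definition from either book; the names `FitnessFunction`, `measure`, and `threshold` are hypothetical, and the measurement is a stub standing in for a real probe.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FitnessFunction:
    """Hypothetical shape: one characteristic, one automated measurement, one threshold."""
    characteristic: str            # requirement 1: tied to a specific characteristic
    measure: Callable[[], float]   # requirement 2: automated, no human in the loop
    threshold: float               # requirement 3: numeric value compared to a threshold

    def passes(self) -> bool:
        # Lower is better in this sketch (latency, violation counts, and so on).
        return self.measure() <= self.threshold


# Illustrative only: a stubbed measurement instead of a real probe.
check = FitnessFunction("performance", measure=lambda: 95.0, threshold=120.0)
result = check.passes()
```

The fourth requirement does not appear in the code at all: running regularly is a property of whatever schedules `passes()` (CI, cron, a synthetic monitor), not of the check itself.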

Fitness functions are how a characteristic becomes a testable promise instead of a wall poster. Without them, "the system is modifiable" is an opinion.

Why It Matters Here

Architectural characteristics erode silently. Nobody wakes up and says "let me break the modifiability of this codebase." It happens one PR at a time, over months, and by the time anyone notices, the fix is a rewrite.

Fitness functions turn the characteristic into a mechanism by which the system defends itself. They are the architectural equivalent of unit tests: they embed the decision in the pipeline so the team does not have to remember it.

Concepts 7-8 produced the top-3 characteristics. This concept produces the evidence that each characteristic still holds.

Concrete Example

Three fitness functions for three different characteristics.

1. Performance (streaming service).

# runs every 5 minutes in CI and in prod synthetic tests
from statistics import quantiles
import requests, time

latencies = []
for _ in range(200):
    t = time.perf_counter()
    r = requests.get("https://api.example/videos/v_42/manifest", timeout=2)
    assert r.status_code == 200
    latencies.append((time.perf_counter() - t) * 1000)

p95 = quantiles(latencies, n=20)[18]  # ~p95
assert p95 < 120, f"manifest p95 regressed: {p95:.1f} ms > 120 ms"

Signal: pass/fail on every run. Wired into CI so a PR that pushes manifest p95 past 120 ms fails before merge.
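
The index `18` in the check is easy to get wrong, so it is worth seeing where it comes from. A small sketch with synthetic data (the sample values are made up for illustration): `quantiles(..., n=20)` returns 19 cut points at 5% steps, and the last one is the 95th percentile.

```python
from statistics import quantiles

samples = list(range(1, 101))      # synthetic latencies: 1, 2, ..., 100 ms
cuts = quantiles(samples, n=20)    # 19 cut points at 5%, 10%, ..., 95%

p50 = cuts[9]                      # 10th cut point = median
p95 = cuts[18]                     # 19th (last) cut point = 95th percentile
```

With only 200 samples per run the estimate is noisy, which is one reason to leave some headroom between typical p95 and the threshold.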

2. Modifiability (payments platform).

# architecture test: teller module may not import ledger internals
# example tools: ArchUnit (Java), dependency-cruiser (JS), import-linter (Python)
# using an import-linter contract (in .importlinter or setup.cfg):
#
# [importlinter]
# root_package = payments
#
# [importlinter:contract:teller-depends-only-on-ledger-api]
# name = Teller depends only on the ledger API
# type = forbidden
# source_modules =
#     payments.teller
# forbidden_modules =
#     payments.ledger.domain
#     payments.ledger.storage
#
# run as: lint-imports

Signal: fails CI if a teller file imports a ledger internal. Protects a modifiability boundary that a design document alone cannot enforce.

3. Security / Confidentiality (IoT platform).

# log-scanner fitness function - runs hourly in production
# fails if any log line contains a device secret pattern

# grep -q reports a match through its exit status without echoing the secret into job output
gcloud logging read 'resource.type="k8s_container"' --freshness=1h --format="value(textPayload)" \
  | grep -qE 'secret_[A-Z0-9]{32}' && exit 1

exit 0

Signal: if a secret leaks into a log, alert within the hour. Pairs with a deploy gate that aborts if the check was not run today.
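
The deploy gate itself can be very small. A minimal sketch, assuming the scanner records the time of its last passing run somewhere the pipeline can read, and reading "run today" as "within the last 24 hours"; the function name `gate_allows_deploy` and both of those choices are assumptions, not part of the original setup.

```python
from datetime import datetime, timedelta, timezone


def gate_allows_deploy(last_green: datetime, now: datetime,
                       max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True only if the scan last passed within max_age of now."""
    age = now - last_green
    # A timestamp in the future means clock skew or bad data; refuse to deploy.
    return timedelta(0) <= age <= max_age


# Illustrative timestamps only.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = gate_allows_deploy(datetime(2024, 1, 2, 11, 0, tzinfo=timezone.utc), now)
stale = gate_allows_deploy(datetime(2023, 12, 31, 11, 0, tzinfo=timezone.utc), now)
```

Passing `now` in explicitly, rather than calling the clock inside the function, keeps the gate easy to test.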

Fitness functions do not have to be slow, heavy, or pretty. They have to run.

Common Confusion / Misconception

"Fitness functions are just tests." They are a specific kind of test: one that defends an architectural characteristic, not a functional requirement. A unit test asserts "the add function returns 3 + 4 = 7." A fitness function asserts "no module crosses this layer boundary."

"If we have SLOs, we have fitness functions." SLOs measure a characteristic in production. That is one of several fitness-function shapes. Others include static analysis, dependency checks, chaos experiments, and compliance scans.

"We need a heavy framework." A 20-line Python script that runs in CI is a fitness function. Fancy tools help at scale; they are not the entry point.
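
To make the "20-line script" claim concrete, here is a rough sketch of a boundary check built only on the standard library's `ast` module. It mirrors the payments example; `FORBIDDEN_PREFIXES`, `imported_modules`, and `violations` are hypothetical names, relative imports are ignored for brevity, and a dedicated tool such as import-linter covers far more cases.

```python
import ast

# Hypothetical rule mirroring the payments example: teller code must not import these.
FORBIDDEN_PREFIXES = ("payments.ledger.domain", "payments.ledger.storage")


def imported_modules(source: str) -> set[str]:
    """Collect candidate module names from absolute imports in Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
            # "from a.b import c" may import the submodule a.b.c, so record that too.
            modules.update(f"{node.module}.{alias.name}" for alias in node.names)
    return modules


def violations(source: str) -> list[str]:
    """Return the forbidden modules this source imports, sorted for stable output."""
    return sorted(
        m for m in imported_modules(source)
        if any(m == p or m.startswith(p + ".") for p in FORBIDDEN_PREFIXES)
    )
```

In CI, a wrapper would call `violations()` on each file under the teller package and exit non-zero if any list is non-empty.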

"Every characteristic needs a fitness function." Aim for at least one per top-3 characteristic. More is better but with diminishing returns. The failure mode is zero, not three.

How To Use It

For each of your top-3 characteristics:

  1. Write the characteristic as a measurable scenario (Cluster 2).
  2. Name the signal that would tell you it is broken.
  3. Pick the cheapest automation that produces that signal.
  4. Wire the automation into the pipeline or production monitoring.
  5. Decide what happens on failure (block merge, page on-call, alert the channel).

A template for documentation:

Characteristic: modifiability (device adapters)
Scenario: adding a new device type touches only its adapter module
Signal: import or reference from outside the adapter
Check: import-linter CI contract forbidding cross-module references
Failure action: CI fails; PR cannot merge
Owner: platform team
Last green: <timestamp>

Fitness functions decay. Review them quarterly: are they still running? Do they still fail when they should? A fitness function that has been green for a year might just be dead.
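
One cheap way to answer "does it still fail when it should?" is to plant a known-bad input and confirm the check flags it. A minimal sketch using the secret pattern from the log-scanner example; the sample log lines and the `scan` and `self_test` names are made up for illustration.

```python
import re

# Same pattern as the log-scanner example; the sample lines below are invented.
SECRET_PATTERN = re.compile(r"secret_[A-Z0-9]{32}")


def scan(lines):
    """Return the lines that contain something shaped like a device secret."""
    return [line for line in lines if SECRET_PATTERN.search(line)]


def self_test() -> bool:
    """Plant a known-bad line; a healthy check must flag it and nothing else."""
    planted = "device=7 token=secret_" + "A1" * 16   # 32 characters, matches the pattern
    clean = "device=7 status=ok"
    return scan([clean, planted]) == [planted]
```

Running `self_test()` alongside the real scan separates "green because nothing is wrong" from "green because the check no longer detects anything".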

Check Yourself

  1. Why is a dashboard reviewed by a human not a fitness function?
  2. Give an example of a fitness function that runs in CI and one that runs in production. Why do you need both?
  3. What is the signal that a fitness function has gone stale?
  4. For "testability," what is one lightweight fitness function you could add in an hour?

Mini Drill or Application

Pick your top-3 characteristics from Concept 8's drill. For each, write:

  • a scenario (one sentence)
  • a signal (what failure would look like)
  • a runnable check (actual code or command, not a description)
  • how it is wired (CI, cron, alert)

Commit at least one of them to a real pipeline. A fitness function that lives in a document is not a fitness function; it is homework.

Read This Only If Stuck