Evidence of Craft: Tests, CI, Docs, Commits
What This Concept Is
Beyond the README, a reviewer who clicks into your repo is looking for signals of craft -- small, compounding things that a careless project never accumulates. There are four load-bearing signals:
- Tests -- they exist, they pass, and the names say what they are testing.
- CI -- an automated check runs on every PR and tells the truth.
- Docs beyond the README -- at least one ADR, one runbook, or one design note.
- Commits -- individual commits are legible and the history tells a story.
No single one of these proves you are a good engineer. Together, they are hard to fake. Each signal is small on its own, but the set is a cheap, strong indicator that the codebase was maintained by someone who thought about it.
A subtler point: these signals are mostly produced during the work, not after. A project that accumulates them organically reads differently from one where they were backfilled the weekend before a job search. Reviewers can almost always tell; the commit dates, the PR cadence, the coverage trajectory all leak the truth. This means the time to care about craft signals is not Module 5 -- it was every module before it. Module 5 is where you surface them.
The four signals also interact. Good test names make CI failures readable. Legible commits make PRs reviewable. ADRs give future you a reason to write good commit messages because the commit will cite the ADR by number. Each signal raises the payoff of the others.
There is also a fifth, meta-signal that emerges only with enough of the first four: a healthy contribution cadence. A repo with no tests and fifty commits in December reads as exam cramming; a repo with tests, CI, and twenty-five commits across twelve months reads as sustained work. The cadence signal is the hardest to fake because it requires time -- which is exactly why senior reviewers weight it heavily.
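The cadence contrast above can be made concrete with a small sketch. It assumes you already have a list of commit dates (for example, parsed from `git log --date=short --format=%ad`); the function name `cadence_summary` and the two summary fields are hypothetical choices for illustration, not a standard metric.

```python
from collections import Counter
from datetime import date


def cadence_summary(commit_dates: list[date]) -> dict:
    """Summarize how commits are spread across calendar months."""
    if not commit_dates:
        return {"commits": 0, "active_months": 0, "busiest_month_share": 0.0}
    # Count commits per (year, month) bucket.
    per_month = Counter((d.year, d.month) for d in commit_dates)
    busiest = max(per_month.values())
    return {
        "commits": len(commit_dates),
        "active_months": len(per_month),
        "busiest_month_share": busiest / len(commit_dates),
    }


# Fifty commits crammed into a single December.
crammed = [date(2024, 12, 1 + i % 28) for i in range(50)]
# Twenty-five commits spread across twelve months.
sustained = [date(2024, 1 + i % 12, 10) for i in range(25)]

print(cadence_summary(crammed))
print(cadence_summary(sustained))
```

A busiest-month share near 1.0 is the "exam cramming" shape; a low share over many active months is the sustained one. The numbers describe the history, they do not improve it.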
Why It Matters Here (In the Capstone)
Reviewers do not interview the code; the code interviews itself. A repo with zero tests, a green README badge that has not updated in a year, and commits named "fix" tells a specific story. So does a repo with 40 tests, a CI run on every PR, three ADRs, and commits that read like a changelog.
The second, subtler effect: these signals bias how the reviewer reads everything else. The same piece of code looks thoughtful in one repo and suspicious in another, based entirely on the surrounding signals.
For the capstone this is load-bearing because the capstone is the only ten-semester project on your profile. If its craft signals are thin, the write-up is doing work it should not have to do; if they are strong, the write-up gets to focus on decisions rather than defending basic hygiene.
Concrete Example: What a Reviewer Actually Clicks
An experienced reviewer does not read your code top-to-bottom. They do roughly this:
- Open README (5-15 seconds).
- Check recent commits (5-10 seconds) -- are messages meaningful?
- Open the CI tab or the latest workflow run -- is it green, how long does it take, what does it actually check?
- Open the tests/ folder -- scan filenames. Are names behaviors (e.g., test_dedup_prevents_double_ingest.py) or placeholders (test_1.py)?
- Look for adr/ or library/raw/ -- does the project reason about itself?
- Only then open the source.
Almost none of that is the source. The first five moves decide whether the source gets a fair read.
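The filename scan in that walk-through can be roughly approximated in code. The sketch below uses an assumed heuristic: a test file whose name is `test_` plus a single token (a number or one bare word) is treated as a placeholder. The rule and the function name `looks_behavior_named` are hypothetical; a human reviewer applies more judgment than this.

```python
import re

# Assumed heuristic: "test_<single token>.py" is a placeholder name,
# e.g. test_1.py or test_dedup.py. Multi-word names are treated as behaviors.
PLACEHOLDER = re.compile(r"^test_[a-z0-9]+\.py$")


def looks_behavior_named(filename: str) -> bool:
    """True when a test filename reads like a behavior, not a placeholder."""
    if not (filename.startswith("test_") and filename.endswith(".py")):
        return False
    return PLACEHOLDER.match(filename) is None


for name in ["test_dedup_prevents_double_ingest.py", "test_1.py", "test_dedup.py"]:
    print(name, looks_behavior_named(name))
```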
Concrete Example: Craft Signal Scorecard
For each pinned repo, this scorecard makes the audit explicit:
| Signal | Weak | Adequate | Strong |
|---|---|---|---|
| Tests | None or test_1.py | Happy-path coverage | Behavior-named, covers failure modes |
| CI | Missing or red | Green on main | Green on every PR, runtime < 5 min, checks lint + tests + build |
| Docs | README only | README + one ADR | README + 3+ ADRs + runbook or design note |
| Commits | fix, wip, update | One-line descriptive | Changelog-legible; cites ADR numbers or issue IDs |
A repo that sits at "weak" on three rows should be unpinned; one that sits at "strong" on three is pinnable without embarrassment.
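The pin/unpin rule can be written down as a tiny function so the audit gives the same answer each time. This is a sketch under stated assumptions: the signal keys and the middle outcome "improve" (for repos that meet neither threshold) are hypothetical additions; only the two three-row thresholds come from the scorecard.

```python
LEVELS = ("weak", "adequate", "strong")
SIGNALS = ("tests", "ci", "docs", "commits")


def pin_decision(scores: dict[str, str]) -> str:
    """Apply the scorecard rule: three weak rows -> unpin, three strong rows -> pin."""
    for signal in SIGNALS:
        if scores.get(signal) not in LEVELS:
            raise ValueError(f"missing or invalid score for {signal!r}")
    weak = sum(1 for s in SIGNALS if scores[s] == "weak")
    strong = sum(1 for s in SIGNALS if scores[s] == "strong")
    if weak >= 3:
        return "unpin"
    if strong >= 3:
        return "pin"
    # Assumption: anything in between is worth fixing before deciding.
    return "improve"


print(pin_decision({"tests": "strong", "ci": "strong", "docs": "adequate", "commits": "strong"}))
print(pin_decision({"tests": "weak", "ci": "weak", "docs": "weak", "commits": "adequate"}))
```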
Common Confusion / Misconceptions
Treating these signals as ornaments. Adding a tests/ folder with three trivial tests right before a job search does not help; a reviewer notices the commit dates. The signal works because it accumulates. The time to start treating tests, CI, and ADRs as load-bearing was Semester 4; the second-best time is Semester 10, with the understanding that backfilled signals read as backfilled.
Chasing 100% coverage. Coverage percentage is a poor proxy for quality. A reviewer cares more about what you chose to test than how much. Three tests that pin the tricky behavior beat two hundred tests that only touch happy paths.
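As an illustration of "three tests that pin the tricky behavior," here is a minimal sketch. The `Deduplicator` class, its TTL semantics, and the `feed:external_id` key format are all hypothetical stand-ins, not code from any particular capstone; the point is that each test name states the behavior and at least one covers a failure mode.

```python
class Deduplicator:
    """Hypothetical in-memory dedup: rejects a key seen within the last ttl seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._seen: dict[str, float] = {}

    def accept(self, key: str, now: float) -> bool:
        last = self._seen.get(key)
        if last is not None and now - last < self.ttl_seconds:
            return False  # retry inside the TTL window
        self._seen[key] = now
        return True


def test_dedup_accepts_first_event_and_rejects_retry_within_ttl():
    dedup = Deduplicator(ttl_seconds=60)
    assert dedup.accept("feed-a:42", now=0.0) is True
    assert dedup.accept("feed-a:42", now=30.0) is False


def test_dedup_accepts_same_key_again_after_ttl_expires():
    dedup = Deduplicator(ttl_seconds=60)
    assert dedup.accept("feed-a:42", now=0.0) is True
    assert dedup.accept("feed-a:42", now=60.0) is True


def test_dedup_keeps_feeds_with_shared_external_id_separate():
    dedup = Deduplicator(ttl_seconds=60)
    assert dedup.accept("feed-a:42", now=0.0) is True
    assert dedup.accept("feed-c:42", now=1.0) is True
```

Passing `now` explicitly instead of reading the system clock is a design choice that keeps the TTL tests deterministic.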
Sloppy commit messages as humility. "fix bug" is worse than no commit message, because it broadcasts carelessness. "fix: prevent dedup key collision when feeds A and C share external_id" is a paragraph compressed into a line and reads as craft.
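A rough, mechanical version of this distinction can be sketched as a commit-subject check. The assumptions are loud ones: the optional `type:` or `type(scope):` prefix is stripped before counting, and the five-word minimum is borrowed from the self-check later in this concept. The function name `passes_changelog_test` is hypothetical, and word count is only a proxy for a message that actually says something.

```python
import re

# Optional conventional-style prefix such as "fix:" or "feat(api):".
PREFIX = re.compile(r"^[a-z]+(\([^)]*\))?!?:\s+")


def passes_changelog_test(subject: str, min_words: int = 5) -> bool:
    """Heuristic: the subject still says something once the type prefix is removed."""
    body = PREFIX.sub("", subject.strip())
    return len(body.split()) >= min_words


for msg in [
    "fix bug",
    "wip",
    "feat: inventory service",
    "fix: prevent dedup key collision when feeds A and C share external_id",
]:
    print(passes_changelog_test(msg), msg)
```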
Squashing everything into a single commit to "look clean." A clean history is a readable history, not a collapsed one. A pull request whose merged commit says only "feat: inventory service" hides the thinking. Prefer semantic commits per meaningful step, with a clean rebase before merge if needed.
How To Use It (In Your Capstone)
Audit your pinned repos with this checklist:
- Tests: exist, pass, named after behaviors, cover at least one failure mode per major subsystem.
- CI: runs on every PR, includes the test suite, runtime reasonable, badge in README (and the badge is green).
- Docs: at least one ADR per non-trivial design decision; one runbook if ops-relevant.
- Commits: messages are sentences; PR titles read like changelog entries; main has no WIP commits.
- Traceability: major design decisions in the code link back to ADRs by number or link.
- Dashboards or reports: if the project has observability, link one graph or one readable status page from the README.
- Quarterly re-audit: put a one-line "last craft audit YYYY-MM" in the README.
For each miss, either fix it or unpin the repo. Do not pin a repo whose craft evidence you are not willing to show.
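The traceability item in the checklist lends itself to a quick automated check: every ADR number cited in a commit message or code comment should correspond to an ADR file that exists. The sketch below assumes citations look like `ADR-0003` or `ADR 7` and that files follow the NNNN-slug.md naming used in this concept; the function name `dangling_adr_refs` and the sample filenames are hypothetical.

```python
import re

ADR_REF = re.compile(r"\bADR[- ]?(\d{1,4})\b", re.IGNORECASE)  # assumed citation style
ADR_FILE = re.compile(r"^(\d{4})-[a-z0-9-]+\.md$")             # NNNN-slug.md


def dangling_adr_refs(texts: list[str], adr_filenames: list[str]) -> set[int]:
    """Return ADR numbers that are cited somewhere but have no matching file."""
    known: set[int] = set()
    for name in adr_filenames:
        match = ADR_FILE.match(name)
        if match:
            known.add(int(match.group(1)))
    cited = {int(m.group(1)) for text in texts for m in ADR_REF.finditer(text)}
    return cited - known


texts = [
    "fix: key dedup on (feed, external_id), see ADR-0003",
    "# Retry budget chosen per ADR 7",
]
files = ["0001-use-postgres.md", "0003-dedup-key.md"]
print(dangling_adr_refs(texts, files))
```

In a real audit the `texts` list would come from commit subjects and source comments, and `files` from a directory listing of the ADR folder.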
Check Yourself
- On your most-pinned repo, how long is the latest CI run, and what does it actually check?
- Pick a random test filename in that repo -- does the name describe a behavior?
- Read the last ten commit messages on main. Would they make sense in a changelog?
- Does main currently have any commits with messages shorter than five words?
- Are ADRs referenced by number in commits, PRs, or code comments?
- If a reviewer cloned your capstone today, which one signal would they notice first -- and is that the signal you would have chosen?
Mini Drill or Application (Capstone-scoped)
Four drills, all in under an hour of capstone repo time:
- Rename three tests. Open your capstone tests. Rename three test functions from implementation names to behavior names (e.g., test_dedup -> test_dedup_accepts_first_event_and_rejects_retry_within_ttl). Commit with a message describing the change.
- Write one ADR. Pick a design decision that is not yet written down. Draft a two-section ADR (Decision / Consequence), file under library/raw/adr/NNNN-slug.md, reference it in the relevant source file as a comment.
- Rewrite two commits. On a feature branch, amend two recent commits so the messages pass the "changelog test" (would this line make sense in a released changelog?). If you cannot rewrite history on main, add follow-up commits with proper messages.
- CI honesty check. Open your most-pinned repo's CI config. Ask: does any step skip on main, silently cache around failures, or mark flakes as success? Fix one. The CI lying even once discredits every green badge afterwards.
Small, concrete, compounding. Add a library/raw/craft-audit-YYYY-MM.md summarizing the changes; that file itself is craft evidence.
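For the CI honesty check, a first pass can be scripted. The sketch below assumes a GitHub Actions style workflow file (this concept does not name a CI system) and scans its text for two common ways a step can fail while the run stays green: `continue-on-error: true` and a shell `|| true`. The pattern list and the function name `find_dishonest_steps` are hypothetical and deliberately incomplete; comment stripping is naive.

```python
# Assumed patterns; extend for your own CI system.
SUSPECT_PATTERNS = {
    "continue-on-error: true": "step failure is ignored",
    "|| true": "command exit code is discarded",
}


def find_dishonest_steps(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line number, reason) for each line that can hide a failure."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive: drop trailing YAML comments
        for pattern, reason in SUSPECT_PATTERNS.items():
            if pattern in code:
                findings.append((lineno, reason))
    return findings


example = """\
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - run: pytest || true
      - run: ruff check .
        continue-on-error: true
"""

for lineno, reason in find_dishonest_steps(example):
    print(f"line {lineno}: {reason}")
```

A hit is a prompt to read the step, not proof of dishonesty; some non-blocking steps are legitimate, but they should be labeled as such.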
Transfer / How This Synthesizes Prior Semesters
Craft signals are not a Module 5 add-on; they are the visible residue of how you did the prior nine semesters:
- S2 M01 algorithm analysis & design -- behavior-named tests require that you can state the invariant precisely; that is the same discipline used to state a loop invariant or a recurrence.
- S3 M05 applied design & code review -- the same review eye that rejects a dead helper rejects a dead test and a weak commit message. PR discipline is visible in commit history years later.
- S4 M01 C programming fundamentals -- the habit of reading failures carefully, naming assertions, and treating crashes as first-class comes from here; test names that describe failure modes are its downstream artifact.
- S7 M05 ADRs & reviews -- the ADR habit is the highest-leverage craft signal; the portfolio is where it becomes public.
- S9 M04 CI/CD pipelines & release engineering -- the pipeline you built there is the one a reviewer is about to look at; a green, fast, honest pipeline is a direct transfer.
- S10 M02 implementation & testing and S10 M04 operational readiness & security review -- the tests, runbooks, and threat models produced there are the craft evidence Module 5 is now asking you to surface.
The single transferable fact: craft signals leak their own history. A reviewer cannot always articulate why a repo "feels careful," but the commit dates, the test names, and the ADR cadence are the leaks doing the work.
A second transferable fact, less obvious: craft signals mean different things in different tracks. An SRE reviewer reads "runbook + SLO doc + on-call log" as evidence; a platform-engineering reviewer reads "internal SDK + deprecation note + user doc"; a distributed-systems reviewer reads "correctness test + jepsen-style harness + post-mortem on a partition." Which signals you cultivate should match the track concept 12 points you at -- otherwise a strong-looking repo reads as off-target to the reader you most care about.
See also (integrative)
- S7 M05 -- ADRs and reviews: the ADR habit is the single highest-leverage craft signal.
- S10 M02 -- Implementation & testing: the test pyramid and CI from capstone implementation are the evidence a reviewer reads.
- S10 M03 -- Cloud deployment & CI/CD: pipeline artifacts are part of the craft surface area.
- S10 M04 -- Operational readiness & security review: the runbook and threat model are the "docs beyond README" that distinguish a capstone from a toy.
- S3 M05 -- Applied design and code review: the same review discipline applied to a colleague's PR is the signal you encode in your repo history.
- External -- Irrational Exuberance (lethain.com): essays that accumulate into a body of work -- same principle as commits and ADRs.
- External -- Catalog of Patterns of Distributed Systems (martinfowler.com): a decade-scale craft signal -- repo, book, talks all interlink.
- External -- Evolutionary architecture (thoughtworks.com): fitness functions and automated architecture checks are CI-shaped arguments.
- External -- Leveraging Go Worker Pools (shopify.engineering): a production-engineering post showing the craft signals in public -- numbers, named failure modes, tested assumptions.
- External -- How I write engineering-strategy docs (lethain.com/how-i-write-engineering-strategy): Will Larson's visible process -- the writing and the revision history -- is itself the craft signal.
- External -- Testing Microservices at Monzo (monzo.com/blog/2018/02/12/testing-microservices): behavior-named integration tests in a production banking context; read for test naming and failure-mode coverage.
- External -- Operating at staff (staffeng.com/guides/operating-at-staff): on the artifacts senior engineers leave behind -- many of them live in the repo as ADRs, runbooks, and reviewed PRs.
Source Backbone
Portfolio assessment packages evidence from the whole curriculum. These books provide the technical and professional backbone for the narrative.
- Software Engineering at Google - engineering evidence, review, and team-scale standards.
- Fundamentals of Software Architecture - architecture vocabulary and tradeoff defense.
- Building Secure and Reliable Systems - security and operational evidence standards.