Steel Thread: Proving Integration Before Polish
What This Concept Is
A steel thread is one thin but complete path of real behavior through your system, chosen because it forces the interesting integration points to work. Where a walking skeleton proves the seams exist, a steel thread proves that a real user story can flow through them.
Think of it as the first non-trivial vertical slice:
- one user action that matters (e.g., "user creates a task and sees it listed");
- all layers exercised with real code, not stubs;
- every integration point traversed at least once (auth, DB, external API, queue, cache, whichever apply);
- tests at each level prove the thread holds under pressure.
It is the strong strand that the rest of the feature set gets wound around. The metaphor predates software engineering -- in telephone-switch engineering, the "steel thread" was the first path through the switch that actually carried a call. In agile practice it is sometimes called a tracer bullet or a first end-to-end user journey.
The steel thread's selection is not a product decision. It is an engineering decision: whichever path goes through the riskiest integrations first. Product priority decides what ships in week N; steel-thread score decides what ships in week 1.
Why It Matters Here (In the Capstone)
In capstone delivery the most expensive failures are integration failures: the code the student wrote works, but the external system's schema changed; the DB migration ran, but the ORM ignored a column; auth worked in dev, but the token format differs in staging. These problems do not show up in unit tests. They show up when real components meet.
A steel thread exposes these incompatibilities inside week 1 or 2. By the time features start stacking up, the hard integration decisions are already made and stable. Without a steel thread, integration risk bleeds into every feature in the last week, where there is no time to fix it -- and this is exactly the failure mode S10 rubrics penalise most.
Concrete Example(s) -- from a real capstone
Imagine the capstone is a small task manager that pulls tasks from GitHub Issues via the GitHub REST API.
Walking skeleton (earlier concept): GET /health that pings the DB. No GitHub, no user behavior, no auth.
Steel thread: POST /projects/{id}/sync that actually:
- reads a project from the DB;
- authenticates to GitHub via a token stored in env;
- calls GET /repos/{owner}/{repo}/issues for real;
- parses two fields (title, state);
- upserts the result into the tasks table;
- returns { "synced": 3 }.
Polish intentionally absent: no rate-limit handling, no error translation, no pagination, no retries, no background jobs. But every integration point has now been crossed once:
- HTTP ingress works;
- DB read works;
- external API auth works;
- third-party JSON parsing works;
- DB write works;
- end-to-end response works.
Any of those could have been silently broken; now each one has been shown to work at least once. In week 3, when pagination, retries, and rate limits are added, they are being added to something that already works, not being discovered as missing foundations.
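The core of the sync thread described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the capstone's actual implementation: it assumes SQLite for the DB, a hypothetical schema (projects and tasks tables with the columns shown), a token in a GITHUB_TOKEN environment variable, and the issue number as the upsert key (a third field beyond title and state, needed only to identify the row). The HTTP route itself is omitted; a POST /projects/{id}/sync handler would call sync_project and return its result.

```python
import json
import os
import sqlite3
import urllib.request

# Hypothetical schema, assumed for this sketch only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    id INTEGER PRIMARY KEY, owner TEXT NOT NULL, repo TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS tasks (
    project_id INTEGER NOT NULL, issue_number INTEGER NOT NULL,
    title TEXT NOT NULL, state TEXT NOT NULL,
    PRIMARY KEY (project_id, issue_number));
"""


def fetch_issues(owner, repo, token):
    """Call GET /repos/{owner}/{repo}/issues for real. First page only:
    pagination, retries, and rate limits are deliberately absent."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def sync_project(conn, project_id, fetch=fetch_issues):
    """Read the project, fetch its issues, upsert them as tasks."""
    row = conn.execute(
        "SELECT owner, repo FROM projects WHERE id = ?", (project_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no project with id {project_id}")
    owner, repo = row
    token = os.environ["GITHUB_TOKEN"]  # token stored in env
    issues = fetch(owner, repo, token)
    for issue in issues:
        # Only title and state are kept; number is the assumed upsert key.
        conn.execute(
            "INSERT INTO tasks (project_id, issue_number, title, state) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(project_id, issue_number) DO UPDATE SET "
            "title = excluded.title, state = excluded.state",
            (project_id, issue["number"], issue["title"], issue["state"]),
        )
    conn.commit()
    return {"synced": len(issues)}
```

The fetch parameter exists so a test harness can substitute a replayed response; the thread itself should still be exercised against the real endpoint. Note that GitHub's issues endpoint also returns pull requests, which is one of the corners this sketch leaves for the debt ledger.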
Common Confusion / Misconceptions
The steel thread is often confused with "the first feature." It is not. The first feature is a product decision. The steel thread is an engineering decision: whichever path goes through the riskiest integrations first. If the feature backlog has three stories and only one of them touches the external API, the steel thread is that one, regardless of product priority.
Another confusion: treating the steel thread as temporary scaffolding. It is production code. It will be extended, not replaced. Corners cut inside the steel thread must be tracked in the technical-debt ledger (see Concept 15).
A third is the mocked thread. If the external call is mocked, the integration risk is not surfaced. The thread must hit real auth and a real (or a recorded-cassette) external endpoint, even if the test harness uses VCR-style replay. Mocked threads pass CI and break production.
A fourth is the invisible thread -- the thread was merged, but no one ran it in staging. Until the thread has been deployed and exercised in a production-equivalent environment, the integration has not actually been proven.
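To make the recorded-cassette option from the mocked-thread paragraph concrete, here is a hand-rolled record-and-replay wrapper. It is a simplified sketch, not the API of VCR.py or any other library: the with_cassette name and the one-file-per-call layout are assumptions made for illustration. The first run performs the real call and stores the parsed response; later runs replay it.

```python
import json
from pathlib import Path


def with_cassette(path, real_call):
    """Wrap real_call so its first result is recorded at path and every
    later call replays the recording instead of hitting the network.

    Simplification: one cassette holds one response, regardless of the
    arguments passed. Only the parsed body is stored, never credentials.
    """
    cassette = Path(path)

    def call(*args):
        if cassette.exists():
            return json.loads(cassette.read_text())
        result = real_call(*args)  # the one real exchange
        cassette.write_text(json.dumps(result))
        return result

    return call
```

The difference from a mock is that the recording came from a real exchange at least once, so the response shape is the provider's, not the developer's guess. Deleting the cassette file forces a fresh recording when the provider's schema is suspected to have changed.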
How To Use It (In Your Capstone)
After the walking skeleton is in main, pick the steel thread:
- List the capstone's integration points (DB, cache, queue, external APIs, auth, feature flag service, file storage, etc.).
- Score each backlog story by how many of those points it touches; break ties by the riskiness of the touched points (an external API is riskier than an in-process cache).
- Pick the story with the highest integration score, not the highest product priority.
- Implement the thinnest version of that story that actually uses each integration point with real code.
- Time-box the thread at one week. If it cannot be done in a week with corners explicitly cut, the story is too wide or the seams are wrong.
- Write one integration test and one end-to-end test that prove it works; deploy to staging and exercise manually with a real identity.
- Ledger every corner cut (rate limits, retries, pagination, nice errors) in DEBT.md so they are visible in week 3.
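The scoring and selection steps above can be sketched as a small script. The story names, integration points, and risk weights below are hypothetical placeholders; a real capstone would substitute its own backlog and its own judgment of risk.

```python
# Hypothetical risk weights per integration point (higher = riskier).
RISK = {"external_api": 3, "auth": 3, "queue": 2, "db": 2, "cache": 1}

# Hypothetical backlog: story -> integration points it touches.
STORIES = {
    "sync issues from GitHub": {"db", "auth", "external_api"},
    "list tasks": {"db", "auth", "cache"},
    "rename a task": {"db"},
}


def integration_score(touched):
    """Primary key: number of points touched. Tie-break: summed risk."""
    return (len(touched), sum(RISK[point] for point in touched))


def pick_steel_thread(stories):
    """Return the story with the highest integration score."""
    return max(stories, key=lambda name: integration_score(stories[name]))


print(pick_steel_thread(STORIES))
```

Here two stories touch three points each, so the tie-break decides: the GitHub sync story wins because an external API outweighs a cache. Product priority does not appear anywhere in the score, which is the point of the exercise.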
Anti-Patterns to Recognize
- Polish on the thread. Adding retries and error translation before the thread is green. The polish is for after the thread is proven.
- Feature-complete thread. Building full pagination and filters into the thread before it is proven. Narrowing to two fields and one happy path is correct; pagination and filters are week 4 work.
- Mocked thread. Mocks hide the one risk the thread was designed to surface.
- Invisible thread. Merged but never exercised in staging; the integration has not actually happened.
- Polymorphic thread. The thread tries to be generic over "any external provider" before any one provider works. Ship one provider, then generalize.
See also (integrative)
- S7 M05 ADRs & Reviews -- the risk-driven review lens; steel threads attack the top risks first
- S8 M01 System Design Methodology -- identifying the integrations that matter in a whole-system sketch
- S6 M05 Distributed Systems Fundamentals -- the cross-process failure modes the steel thread is actually probing
- S3 M05 Applied Design & Code Review -- reviewing a steel-thread PR against its declared risk list
- S10 M01 Domain Analysis & Architecture Design -- the architecture the thread is traversing
External references:
- Martin Fowler: The Practical Test Pyramid -- how to test across the thread at each level
- Henrik Jernevad: Break down silos with a walking skeleton -- the "tracer bullets" framing of Hunt & Thomas applied to integration
- DevOps Stack Exchange: What is a walking skeleton? -- community distinction between skeleton, tracer, and steel thread
- Martin Fowler: Keystone Interface -- how the thread becomes the spine for later features
Check Yourself
- What is the difference between a walking skeleton and a steel thread, and which comes first in a capstone timeline?
- Why pick the story that touches the most integrations instead of the most important story?
- What goes in the technical-debt ledger after the steel thread is shipped, and when should it be closed?
- What disqualifies a "mocked" thread from being a real steel thread?
- If the steel thread takes more than a week, what two explanations must you investigate before widening the time budget?
Mini Drill or Application (Capstone-scoped)
- In 20 minutes, identify the steel thread for your capstone by listing all integration points and mapping each backlog story to the ones it touches.
- Pick the story with the highest integration score, and write one paragraph explaining why that path surfaces the most risk. Attach this to library/raw/steel-thread.md and defend the choice in your weekly capstone journal.
- Ship the steel thread to staging this week. Include a real auth exchange with the external provider (no recorded credentials).
- Write one integration test that hits a realistic external endpoint (a fake server or a recorded cassette) and one E2E test in staging.
- Open two ledger entries for cuts the thread deferred (pagination, retries, error translation, etc.) with concrete triggers for when each must be fixed.
Source Backbone
Capstone implementation applies earlier code-quality, testing, and refactoring material. These books are the source backbone for that practice.
- Software Engineering at Google - testing, review, and engineering-process backbone.
- Refactoring - safe change and behavior-preserving improvement.
- Good Code, Bad Code - maintainability and code-quality judgment.
- Clean Code - readability and function-level craft support.