Risk Register: The Scariest Unknowns First
What This Concept Is
A risk register is a short, living table that names the things most likely to blow up the capstone, in priority order, with mitigation and an architecture implication beside each one. For a capstone, it has exactly four columns:
| # | Risk (what could go wrong) | Mitigation (what I will do about it) | Architecture implication (how it shapes the design) |
|---|---|---|---|
Risks are not bugs and they are not TODOs. They are unknowns -- the things you cannot answer today that could make the capstone unshippable if you answer them wrong in week 4. A bug is "the button doesn't work." A risk is "I don't know whether WebSockets reconnect reliably on a mobile network, and if they don't, the whole real-time story dies."
The register is the one-page spine connecting the problem (concept 01), the MVP (concept 03), the characteristics (concept 07), the ADRs (concept 12), and the schedule (concept 13). When any of those artifacts stops referencing a register row, the architecture has quietly drifted away from the reasons it was built for.
Keep it in library/raw/risks.md. Keep it to 5 rows or fewer. Revisit it weekly.
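If you want the register's shape to be enforced rather than remembered, it can be modeled as a tiny data structure. This is a minimal sketch under assumptions: the `Risk` and `Register` names, the explicit `significant` flag, and refusing a 6th row at insert time are illustrative choices, not a prescribed format for library/raw/risks.md.

```python
from dataclasses import dataclass, field

MAX_ROWS = 5  # assumption: the top-5 rule, enforced in code instead of by memory


@dataclass
class Risk:
    risk: str                 # what could go wrong
    mitigation: str           # what I will do about it
    implication: str          # how it shapes the design ("None" for delivery-only rows)
    significant: bool = True  # architecturally significant vs delivery only


@dataclass
class Register:
    rows: list[Risk] = field(default_factory=list)

    def add(self, row: Risk) -> None:
        # A 6th row is refused: adding one forces you to re-rank and drop another.
        if len(self.rows) >= MAX_ROWS:
            raise ValueError(f"register already has {MAX_ROWS} rows; re-rank and drop one first")
        self.rows.append(row)

    def to_markdown(self) -> str:
        # Render the four-column table used throughout this section.
        lines = [
            "| # | Risk | Mitigation | Architecture implication |",
            "|---|---|---|---|",
        ]
        for i, r in enumerate(self.rows, start=1):
            lines.append(f"| {i} | {r.risk} | {r.mitigation} | {r.implication} |")
        return "\n".join(lines)
```

Writing `to_markdown()` output back to the file keeps the markdown table as the artifact you actually read; the code only guards the cap and the column order.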
Why It Matters Here (In the Capstone)
Risk-driven architecture (Fairbanks; Rozanski & Woods) says: do only as much architecture as the risk warrants, but for each risk you keep, drive the design from it. In a 6-week capstone, there is no time for speculative architecture. Everything you add to the architecture should be tied to a line in the risk register.
The register has a second, equally important job: it is the thing you look at in week 3 when you are tempted to polish a non-risk. If the risk is "I do not know if WebSockets reconnect reliably on a mobile network," polishing the dashboard CSS does not move the register. Work that does not reduce a listed risk is work you cannot afford.
Third: the register is the raw material of the defense (concept 15). Half the probing questions in a 5-minute defense are variations of "what could go wrong?" A capstone with a current register can answer every such question in one sentence. A capstone without one has to improvise under pressure.
Concrete Examples -- from real capstones
Register A -- inventory service (concept 01 good problem A):
| # | Risk | Mitigation | Architecture implication |
|---|---|---|---|
| 1 | Offline scans must reconcile correctly when Wi-Fi returns; naïve last-write-wins will lose counts during hand-off between staff. | Spike a conflict-resolution strategy in week 1 using a per-scan idempotency key and a server-side merge function. Test with a 2-device offline drift scenario. | Forces an append-only audit log + derived on-hand view; not a single-row-per-SKU mutable store. Affects ADR-002 (datastore) and the Component diagram. |
| 2 | I have never shipped a PWA offline cache. | Build a 1-scan-offline-1-scan-online kata before MVP. | May fall back to "offline buffer in memory, warn on app kill" if PWA cache is too costly. Shapes the core subdomain's client boundary. |
| 3 | Daily reconciliation query will be slow on a week's worth of scans if I naïvely compute from the full log. | Add a materialized daily rollup early; don't wait for performance pain. | Introduces a read-model table; pushes toward light CQRS. |
| 4 | I have to demo on someone else's Wi-Fi on the defense day. | Rehearse a demo that does not rely on the internet for at least one path. | None architectural -- delivery risk only. |
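Row 1's week-1 spike can start in memory, before any datastore is chosen. A minimal sketch, assuming each scan carries a device-generated idempotency key and records a signed quantity change rather than an absolute count; `ScanEvent`, `ScanLog`, and the dedupe-then-sum merge rule are illustrative assumptions, not this capstone's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanEvent:
    idempotency_key: str  # generated on the device when the scan happens (assumption)
    sku: str
    delta: int            # +n received, -n picked: a change, not an absolute count


class ScanLog:
    """Append-only audit log; on-hand counts are derived, never stored."""

    def __init__(self) -> None:
        self._events: list[ScanEvent] = []
        self._seen: set[str] = set()

    def append(self, event: ScanEvent) -> bool:
        # Merge rule: a replayed scan (same key) is ignored, so a device that
        # re-sends its offline buffer after reconnecting cannot double-count.
        if event.idempotency_key in self._seen:
            return False
        self._seen.add(event.idempotency_key)
        self._events.append(event)
        return True

    def on_hand(self) -> dict[str, int]:
        # Derived view: fold every delta. Two devices that drifted offline both
        # contribute, instead of the last writer overwriting the other's count.
        totals: dict[str, int] = defaultdict(int)
        for e in self._events:
            totals[e.sku] += e.delta
        return dict(totals)


# Two-device offline drift: A receives 10, B picks 3, then A re-sends on reconnect.
log = ScanLog()
for event in [
    ScanEvent("dev-a-0001", "SKU-42", +10),
    ScanEvent("dev-b-0001", "SKU-42", -3),
    ScanEvent("dev-a-0001", "SKU-42", +10),  # duplicate delivery of A's scan
]:
    log.append(event)
print(log.on_hand())  # {'SKU-42': 7}
```

Under last-write-wins, the stored count would be whichever device synced last. Here both devices' work survives and the replay is a no-op, which is the property the 2-device drift test should assert.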
Register B -- ticketing platform (good problem B):
| # | Risk | Mitigation | Architecture implication |
|---|---|---|---|
| 1 | Two organizers issuing the last ticket simultaneously could oversell the event. | Prototype an atomic decrement-or-fail operation in week 1. Contract: never oversell, even at the cost of a user-visible "seats just went" error. | Forces a single source of truth with a transactional decrement; rules out eventual-consistency stores for seat caps. Drives ADR-002. |
| 2 | Door-scan runs in a basement venue with no signal. | Build a scan cache that verifies offline against a pre-downloaded ticket list and queues scans for later reconciliation. | Door-scan client holds a bounded snapshot of tickets; architecture splits "authority" (server) from "verifier" (phone). |
| 3 | I have never sent email at volume; deliverability could fail in week 5. | Send 10 test emails to varied providers in week 1 from the chosen provider. | May pick a managed email service (generic subdomain, buy). |
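Row 1's atomic decrement-or-fail can be prototyped against any store that supports a conditional single-statement update. A minimal sketch, assuming a relational store with SQLite from the Python standard library standing in; the `events` table, `create_event`, `claim_seat`, and `SoldOut` are hypothetical names. The guard lives in the `WHERE` clause, so the availability check and the decrement cannot be separated by another buyer.

```python
import sqlite3


class SoldOut(Exception):
    """Raised instead of overselling; the UI shows a "seats just went" error."""


def create_event(conn: sqlite3.Connection, event_id: int, seats: int) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, seats_left INTEGER NOT NULL CHECK (seats_left >= 0))"
    )
    with conn:  # commit the insert
        conn.execute("INSERT INTO events (id, seats_left) VALUES (?, ?)", (event_id, seats))


def claim_seat(conn: sqlite3.Connection, event_id: int) -> None:
    # The availability check and the decrement are one conditional UPDATE,
    # so two concurrent buyers cannot both see "1 left" and both succeed.
    with conn:  # commit on success, roll back if an exception escapes
        cur = conn.execute(
            "UPDATE events SET seats_left = seats_left - 1 "
            "WHERE id = ? AND seats_left > 0",
            (event_id,),
        )
        if cur.rowcount == 0:
            raise SoldOut(f"event {event_id}: seats just went")


conn = sqlite3.connect(":memory:")
create_event(conn, event_id=1, seats=1)
claim_seat(conn, 1)          # first buyer gets the last seat
try:
    claim_seat(conn, 1)      # second buyer is refused, never oversold
except SoldOut as err:
    print(err)               # event 1: seats just went
```

The `CHECK (seats_left >= 0)` constraint is a second line of defense: even a buggy code path that skips the guard cannot drive the count negative.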
Register C -- finance aggregator (good problem C):
| # | Risk | Mitigation | Architecture implication |
|---|---|---|---|
| 1 | Duplicate imports could silently double-count transactions; I will not notice until reconciliation fails. | Enforce idempotency on every imported row by source-id hash; add a drill that imports the same file twice and asserts zero drift. | Forces an idempotent import pipeline and a per-source staging table before merge. |
| 2 | Categorization rules will grow complex and become untestable. | Keep rules as data (YAML), not code; every rule gets one table-driven test. | Introduces a rules-engine boundary in the core subdomain's Component diagram. |
| 3 | Personal financial data on a disk I control: if my laptop is stolen, blast radius is a year of statements. | Encrypt at rest with a passphrase-derived key; no cloud backup by default. | Sets a non-goal (no cloud sync) and adds one line to the threat model. |
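Row 1's mitigation already names its own drill: import the same file twice and assert zero drift. A minimal sketch, assuming each exported row carries a stable source-side transaction id (`txn_id`) and amounts are stored as integer cents; the `staging` table, `row_key`, `import_rows`, and `staged_total` are hypothetical names, and SQLite stands in for whatever store the capstone uses.

```python
import hashlib
import sqlite3


def row_key(source: str, txn_id: str) -> str:
    # Stable key: the source name plus the source's own transaction id.
    return hashlib.sha256(f"{source}:{txn_id}".encode("utf-8")).hexdigest()


def import_rows(conn: sqlite3.Connection, source: str, rows: list[dict]) -> int:
    """Load rows into the per-source staging table; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        "row_key TEXT PRIMARY KEY, source TEXT, txn_id TEXT, amount_cents INTEGER)"
    )
    before = conn.total_changes
    with conn:
        for row in rows:
            # INSERT OR IGNORE skips any row whose key is already staged,
            # so re-importing the same file changes nothing.
            conn.execute(
                "INSERT OR IGNORE INTO staging VALUES (?, ?, ?, ?)",
                (row_key(source, row["txn_id"]), source, row["txn_id"], row["amount_cents"]),
            )
    return conn.total_changes - before


def staged_total(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COALESCE(SUM(amount_cents), 0) FROM staging").fetchone()[0]


# The drill: import the same statement twice and assert zero drift.
conn = sqlite3.connect(":memory:")
statement = [
    {"txn_id": "T1", "amount_cents": -450},
    {"txn_id": "T2", "amount_cents": 120000},
]
first = import_rows(conn, "bank-a", statement)
snapshot = staged_total(conn)
second = import_rows(conn, "bank-a", statement)
assert (first, second) == (2, 0)
assert staged_total(conn) == snapshot  # zero drift
```

Keying on the source's transaction id rather than on row contents matters: two genuinely identical purchases on the same day would collide under a content hash and one would be silently dropped.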
In Register A, rows 1-3 are architecturally significant: the design changes visibly in response. Row 4 is a delivery risk tracked only for schedule reasons. Registers B and C contain only significant rows. Roughly three significant rows to one delivery row is the ratio to aim for.
Common Confusion / Misconceptions
- Everything-Goes-Wrong List. "The register is a list of everything that could go wrong." No. It is a list of things that could go wrong and that would matter. A 40-row register is noise. Five rows is workable. Three architecturally significant rows are what you drive design from.
- Frozen Document. Treating the register as a static document written once. A risk that stops being scary should be closed; a risk that emerges in week 3 should be added. The register changes weekly or it is dead.
- Risks-as-TODOs. Every row becomes "I need to build X." That is a backlog. Risks are unknowns; if you already know what to build, the unknown is resolved and the row closes.
- Too-Abstract Framing. "Reliability might be an issue." What does that mean you would do differently on Tuesday? A useful risk is concrete enough to design a week-1 experiment against.
Categories of Risk Worth Scanning
A solo register tends to be strong on one category and blind to others. When you do your week-1 brain dump, deliberately scan each category:
- Capability risk: a technique or tool you have never shipped (e.g., WebSockets, OAuth, offline PWA caches).
- Integration risk: a seam between two components where the contract is fuzzy (auth + storage, frontend + realtime backend, import pipeline + rules engine).
- Scale risk: a path that works for 1 user but you haven't tested for 10 or 100 (or a scan rate of 20/min).
- Operational risk: something that will hurt you in production even though it's fine locally (secret management, cold-start latency, mid-demo restart).
- Correctness risk: invariants that must hold even under failure (no oversell, no double-import, no lost scan).
- Schedule risk: events on your calendar (exams, travel, illness) that will remove working days.
Most capstones under-represent operational and correctness risk and over-represent capability risk. Correct for that when sorting.
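The category scan can be made mechanical once each brain-dump row is tagged. A minimal sketch, assuming you tag every row with exactly one category by hand; `Category`, `OFTEN_MISSED`, and `coverage_gaps` are hypothetical names. It returns the categories with no row at all, listing operational and correctness first because those are the ones this section says capstones most often miss.

```python
from enum import Enum


class Category(Enum):
    CAPABILITY = "capability"
    INTEGRATION = "integration"
    SCALE = "scale"
    OPERATIONAL = "operational"
    CORRECTNESS = "correctness"
    SCHEDULE = "schedule"


# Categories capstones most often under-represent (per this section).
OFTEN_MISSED = {Category.OPERATIONAL, Category.CORRECTNESS}


def coverage_gaps(tagged_rows: list[tuple[str, Category]]) -> list[Category]:
    """Return categories with no brain-dump row, often-missed ones first."""
    present = {cat for _, cat in tagged_rows}
    missing = [c for c in Category if c not in present]
    # False sorts before True, and sorted() is stable, so often-missed
    # categories move to the front while keeping their declared order.
    return sorted(missing, key=lambda c: c not in OFTEN_MISSED)


dump = [
    ("Never shipped a PWA offline cache", Category.CAPABILITY),
    ("WebSockets may not reconnect on a mobile network", Category.CAPABILITY),
    ("Exam week lands in week 4", Category.SCHEDULE),
]
print([c.value for c in coverage_gaps(dump)])
# ['operational', 'correctness', 'integration', 'scale']
```

An empty result does not mean the register is good; it only means you looked in every direction before sorting.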
How To Use It (In Your Capstone)
- Brain-dump on day 1. List every unknown that could make the capstone unshippable. Do not filter.
- Collapse and merge. Combine duplicates; rewrite fuzzy ones into concrete sentences.
- Sort by expected damage if not mitigated. Top row = scariest.
- Keep the top 5; discard the rest. A longer register is not a stronger one.
- Write a mitigation you will actually do in week 1 or 2. Not "research", not "think about" -- a week-1 experiment, spike, or kata.
- Label each row architecturally significant or delivery only. Every significant row must appear in an ADR or as a driver in the design doc.
- Check weekly. At the end of each week, close resolved rows, promote new ones, re-rank. If the register has not changed for two weeks, you have stopped looking.
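The sort, cut, and weekly-staleness steps above reduce to a few lines once each row has a score. A minimal sketch, assuming "expected damage if not mitigated" is estimated as likelihood times impact on hand-scored 1-5 scales; the scales, `Candidate`, `top_five`, and `is_stale` are illustrative assumptions, not a rubric from this module.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Candidate:
    risk: str
    likelihood: int    # 1 (unlikely) .. 5 (near certain), scored by hand
    impact: int        # 1 (annoying) .. 5 (capstone unshippable), scored by hand
    significant: bool  # architecturally significant vs delivery only

    @property
    def expected_damage(self) -> int:
        return self.likelihood * self.impact


def top_five(candidates: list[Candidate], keep: int = 5) -> list[Candidate]:
    # Scariest first; sorted() stays stable with reverse=True, so ties keep
    # their brain-dump order. Everything past `keep` is discarded.
    ranked = sorted(candidates, key=lambda c: c.expected_damage, reverse=True)
    return ranked[:keep]


def is_stale(last_changed: date, today: date) -> bool:
    # Two weeks without a change means you have stopped looking.
    return today - last_changed >= timedelta(weeks=2)
```

The scores are rough guesses, and that is fine: the point of the arithmetic is to force an explicit ranking, not to produce a precise number.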
See also (integrative)
The register is the capstone's compressed version of prior-semester risk work. Pull from these directly when a row feels fuzzy.
- S7 M01 -> Architectural risk and first principles -- use when you are tempted to skip the register because "I'm solo." Solo work raises risk; it does not remove it.
- S7 M05 -> Risk-driven review (RCDA prioritization) -- use when ranking the brain-dump into a top-5. The RCDA prioritization rubric applies directly.
- S7 M05 -> Reversibility: one-way vs two-way doors -- use when deciding whether a risk is architecturally significant. One-way-door risks always are; two-way-door risks rarely are.
- S8 M01 -> Reason about failure: what happens when X dies -- use when generating the failure-mode row of the register. Walk the happy path and kill one component at a time.
- S6 M05 -> The eight fallacies of distributed computing -- use when your capstone has any network seam at all. Every fallacy is a latent risk row.
External references:
- How Software Architecture Frames Requirements -- Woods & Rozanski (PDF) -- use when you want the canonical argument for driving architecture from risks iteratively rather than sequentially.
- Just Enough Architecture: The Risk-Driven Model -- Fairbanks (ResearchGate) -- use when justifying to yourself why 5 rows is enough and 30 is harmful.
- A Model for the Prioritization of Software Architecture Effort -- Eoin Woods (PDF) -- use when you have 15 candidate rows and cannot decide which 5 to keep.
- Embracing Risk -- Google SRE book, Ch. 3 -- use when a row feels like "we must not fail here ever." Read, then rewrite the row with an explicit acceptable-risk budget.
- A Risk-Driven Model for Agile Software Architecture -- Methods & Tools -- use when you are unsure how much architecture is "enough" for your capstone. The answer is always "just enough to close the top risks."
Check Yourself
- What is the scariest unknown on your capstone right now? Could you lose the capstone to it?
- For each of your top 3 risks, can you name the architecture implication in one sentence?
- What is the soonest experiment that could close your #1 risk, and when will it run?
- Which row on your register is a correctness risk (as opposed to capability or schedule)? If none, re-scan.
- If the register has not changed in two weeks, which is more likely: you solved everything, or you stopped looking?
Mini Drill or Application (Capstone-scoped)
- Drill 1 (20 min). Brain-dump every unknown on your capstone -- 10 to 20 rows. Do not sort yet. Then spend 5 minutes collapsing and sorting. Keep the top 5.
- Drill 2 (10 min). For each top-5 row, write a one-sentence mitigation that names a specific week-1 experiment. If the mitigation is "research", rewrite.
- Drill 3 (10 min). Label each row architecturally significant or delivery only. For every significant row, add a one-line architecture implication.
- Drill 4 (5 min, weekly). At the end of each week (Friday), open the register. Close resolved rows, promote new ones, re-rank. Write one sentence per open row: "this week I did X against this risk."
- Drill 5 (once, mid-capstone). Force-kill the top row: pretend the risk materialized on Monday. Could the capstone survive? If no, promote its mitigation to this week's work.
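Drills 2 to 4 include checks simple enough to automate against the register file. A minimal sketch, assuming rows are available as plain strings; `lint_row` and `missing_weekly_notes` are hypothetical helpers, and the extra vague phrases beyond "research" and "think about" ("look into", "investigate") are my own additions.

```python
import re

# Phrases the drills reject as mitigations: they name no experiment.
VAGUE = re.compile(r"\b(research|think about|look into|investigate)\b", re.IGNORECASE)


def lint_row(risk: str, mitigation: str, implication: str, significant: bool) -> list[str]:
    """Return Drill 2 and Drill 3 problems for one register row (empty if clean)."""
    problems = []
    if VAGUE.search(mitigation):
        problems.append(f"{risk}: mitigation is vague; name a week-1 experiment, spike, or kata")
    if significant and not implication.strip():
        problems.append(f"{risk}: significant row has no architecture implication")
    return problems


def missing_weekly_notes(open_rows: list[str], notes: dict[str, str]) -> list[str]:
    # Drill 4: every open row needs one sentence of work done against it this week.
    return [row for row in open_rows if not notes.get(row, "").strip()]
```

A row that passes the lint can still be a bad row; the checks only catch the two failure modes the drills name explicitly.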
Source Backbone
Capstone design applies earlier architecture and domain material. These books are the source backbone for the decisions in this module.
- Fundamentals of Software Architecture - architecture characteristics, styles, and tradeoffs.
- Learning Domain-Driven Design - domain discovery, subdomains, and bounded contexts.
- Clean Architecture - dependency direction and boundary discipline.
- API Design Patterns - contract and API decision support.