Risk-Driven Review and RCDA-Style Prioritization

What This Concept Is

Risk-driven architecture review (George Fairbanks, Just Enough Software Architecture) takes a simple stance: spend review effort in proportion to risk. You do not evaluate everything. You identify where the design is most likely to fail, and you concentrate there.

RCDA (Risk- and Cost-Driven Architecture), developed by Eltjo Poort and colleagues at CGI, generalizes the same idea with an explicit cost-of-review calculus: pick techniques whose expected reduction in risk is larger than the cost of applying them.

The operational procedure:

Enumerate risks: technical, integration, operational, organizational.
Score each by likelihood × impact.
For the top risks, pick a technique that targets them specifically (prototyping, modeling, spiking, running a scenario).
Stop when the remaining risks are acceptable given context.
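
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the numeric mapping (Low=1, Medium=2, High=3), the `acceptable_score` threshold, and the four sample risks are all assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed numeric mapping for the qualitative levels; the scale is a choice.
LEVEL = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Risk:
    name: str
    category: str    # technical, integration, operational, organizational
    likelihood: str  # "Low" | "Medium" | "High"
    impact: str      # "Low" | "Medium" | "High"

    @property
    def score(self) -> int:
        # Step 2: likelihood x impact
        return LEVEL[self.likelihood] * LEVEL[self.impact]


def risks_to_review(risks: list[Risk], acceptable_score: int) -> list[Risk]:
    """Rank risks highest first and keep those above the acceptable score."""
    ranked = sorted(risks, key=lambda r: r.score, reverse=True)
    return [r for r in ranked if r.score > acceptable_score]


# Step 1: enumerate risks across categories (hypothetical examples).
risks = [
    Risk("New queue drops messages under load", "technical", "Medium", "High"),
    Risk("Partner API contract changes mid-project", "integration", "Low", "High"),
    Risk("On-call team unfamiliar with the new stack", "operational", "High", "Medium"),
    Risk("Two teams disagree on ownership", "organizational", "Low", "Low"),
]

# Steps 3 and 4: only risks above the threshold get a targeted technique.
top = risks_to_review(risks, acceptable_score=3)
for r in top:
    print(r.score, r.category, r.name)
```

The threshold is where "acceptable given context" becomes concrete: a team with a cheap rollback path might set it higher than a team shipping to regulated customers.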

Why It Matters Here

This is the antidote to both "we don't review anything" and "we ATAM every change." Most decisions fall in between: they carry some risk, but not enough to justify a blanket method. Risk-driven review gives you a way to pick just enough rigor.

It also gives a written justification for the level of review chosen. "We prototyped the two-phase commit because its failure would have been catastrophic; we skipped formal review on the internal dashboard because its failure is recoverable in one sprint." That justification is itself a governance artifact.

Concrete Example

Proposal: add an external identity provider (Auth0) alongside our existing in-house SSO.

Risks identified:

#	Risk	Likelihood	Impact	Technique
1	Session store incompatible between providers during rolling migration	Medium	High	Prototype: run both side-by-side with a test user population for 2 weeks
2	Support costs spike during dual-provider window	Medium	Medium	Spike: instrument support intake for Auth0 issues; set a budget threshold
3	Auth0 outages correlate with business hours in our region	Low	High	Availability modeling: check historical Auth0 SLA and our own peak
4	Admin UI labels confuse users	High	Low	Usability review (not an architecture concern)

Review chosen: deep prototype for Risk 1 (2 sprints); light spike for Risk 2 (1 week); desk check for Risk 3 (half day); Risk 4 handled outside architecture.

Not done: full ATAM (overkill); no review at all (Risk 1 is severe).

Each technique is chosen because its cost is less than the expected value of risk reduction, not because "we always do X for this kind of change."
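
A worked version of that comparison, as a rough sketch. Every number here is hypothetical (the source gives no probabilities or costs in these units); the point is only the shape of the check: expected loss before the technique, minus expected loss after, compared with what the technique costs.

```python
def expected_loss(probability: float, impact_cost: float) -> float:
    """Expected loss of a risk: probability times the cost if it happens."""
    return probability * impact_cost


def technique_pays_off(
    technique_cost: float,
    p_before: float,
    p_after: float,
    impact_cost: float,
) -> bool:
    """True if the expected risk reduction exceeds the technique's cost."""
    reduction = expected_loss(p_before, impact_cost) - expected_loss(p_after, impact_cost)
    return reduction > technique_cost


# Hypothetical figures, all in person-days.
# A 20-day prototype that cuts a 200-day failure from 30% to 5% likelihood.
prototype_ok = technique_pays_off(
    technique_cost=20, p_before=0.30, p_after=0.05, impact_cost=200
)

# A 15-day formal evaluation aimed at a risk that costs 10 days if it occurs.
formal_review_ok = technique_pays_off(
    technique_cost=15, p_before=0.50, p_after=0.10, impact_cost=10
)

print(prototype_ok, formal_review_ok)
```

In practice the probabilities are guesses. The calculation is still useful because it forces the guess to be written down next to the cost.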

Common Confusion / Misconception

"Risk-driven means we only review high-risk stuff." It means you allocate review effort by risk. Low-risk changes still get reviewed - just quickly.

"All risks are technical." No. Operational risks (can the SRE team support this?), organizational risks (will the partner team agree?), and timeline risks (does the review itself cause a bad release window?) are all first-class concerns.

"If we cannot measure the risk, ignore it." Dangerous. Unquantified risk is not the same as low risk. Record the uncertainty explicitly.
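
One way to record uncertainty explicitly is to let a score be missing rather than defaulting it to Low. The sketch below is an assumption about how a team might do that: the field names, the sample risks, and the policy of ranking an unknown level as High are illustrative choices, not part of any method.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

LEVEL = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Risk:
    name: str
    likelihood: Optional[str] = None  # None means "not yet estimated"
    impact: Optional[str] = None
    note: str = ""

    def is_unquantified(self) -> bool:
        return self.likelihood is None or self.impact is None

    def ranking_score(self) -> int:
        # Assumed policy: an unknown level ranks as High, so the gap
        # stays visible instead of sinking to the bottom of the list.
        l = LEVEL[self.likelihood] if self.likelihood else LEVEL["High"]
        i = LEVEL[self.impact] if self.impact else LEVEL["High"]
        return l * i


risks = [
    Risk("Cache eviction under burst traffic", "Low", "Medium"),
    Risk("Vendor rate limits at peak", None, "High", note="no usage data yet"),
]

ranked = sorted(risks, key=lambda r: r.ranking_score(), reverse=True)
for r in ranked:
    flag = " (UNQUANTIFIED: " + r.note + ")" if r.is_unquantified() else ""
    print(r.ranking_score(), r.name + flag)
```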

How To Use It

Before scheduling a review:

Write a risk list for the change. 10 minutes, freeform.
Score likelihood and impact on a 3x3 grid (Low / Medium / High each).
Pick the top 1-3 risks. For each, pick one review technique with a cost estimate.
If the total cost of the chosen techniques is well below the expected damage of the risks they target, the allocation is reasonable. If it approaches or exceeds that damage, pick cheaper techniques or accept the risk.
Document the risk list and the chosen techniques in the ADR (as a short "Review" section). Future readers get the reasoning.
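
As a sketch of the last step, the short "Review" section can be generated from the same rows used for scoring. The rows below reuse the Auth0 example from earlier in this page; the layout of the section and the function name `review_section` are assumptions, since ADR templates vary by team.

```python
# Rows from the Auth0 example: (risk, likelihood, impact, technique, cost)
ROWS = [
    ("Session store incompatible during rolling migration",
     "Medium", "High", "Prototype", "2 sprints"),
    ("Support costs spike during dual-provider window",
     "Medium", "Medium", "Spike", "1 week"),
    ("Auth0 outages correlate with our business hours",
     "Low", "High", "Desk check", "half day"),
]

# Risks deliberately left out of architecture review, with the reason.
SKIPPED = [
    ("Admin UI labels confuse users", "handled by usability review"),
]


def review_section(rows, skipped) -> str:
    """Render a plain-text 'Review' section for an ADR (assumed layout)."""
    lines = ["Review", ""]
    for risk, likelihood, impact, technique, cost in rows:
        lines.append(f"- {risk} ({likelihood}/{impact}): {technique}, est. {cost}")
    if skipped:
        lines.append("")
        lines.append("Not reviewed:")
        for risk, reason in skipped:
            lines.append(f"- {risk}: {reason}")
    return "\n".join(lines)


text = review_section(ROWS, SKIPPED)
print(text)
```

Keeping the skipped risks in the output matters as much as the reviewed ones: the reason for not reviewing something is part of the justification.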

Check Yourself

Give one example each of a technical, operational, and organizational risk from a recent project. How would your review technique differ for each?
Why is "no review" a defensible choice for some changes? Give an example.
What would the risk list look like if you skipped this step and just "felt" the risk?

Mini Drill or Application

Take an upcoming change and build its risk table:

at least 4 risks across categories
scored on the 3x3 grid
one review technique per top risk with a ballpark time cost
a one-line justification for skipping review on the rest

Compare to what your team would actually do. Usually the team is either over-reviewing low-risk items or under-reviewing one specific high-impact risk.
