Bug Triage Discipline: Sev, Impact, Root-Cause Tag
What This Concept Is
Bug triage is the first pass on every defect report that converts "there is a problem" into a decision about what to do. For a capstone it must produce three fields in under ten minutes:
- Severity (Sev) -- how urgently the defect must be handled (Sev-1 critical, Sev-2 important, Sev-3 normal, Sev-4 cosmetic);
- Impact -- who and what is affected (feature, number of users, data correctness, security, observability);
- Root-cause tag -- a category for the kind of fault it appears to be (e.g.,
validation,schema-drift,concurrency,external-api,ui-regression,env-config).
The output is not a fix. It is a written classification that makes the next decision fast and consistent. Google's SRE postmortem culture is the industrial reference: every incident gets a blameless, classified record before any remediation, because the record is how organisations notice repeating patterns that individual fixes would hide.
Why It Matters Here (In the Capstone)
In a solo or small-team capstone, triage is often skipped. The cost is invisible until a bad week arrives: three defects sit open, each half-fixed, with no record of which is urgent. Without triage you cannot:
- decide what to fix first;
- notice patterns ("three bugs this week are all
schema-drift"); - write a defensible post-mortem for M04;
- negotiate scope when the deadline tightens.
A disciplined triage log is one of the most leveraged artifacts in a capstone because it is also the raw input for your technical-debt ledger (Concept 15) and your self-review (Concept 14). Dan Luu's writing on the value of cheap classification in debugging is apt: classification does not fix the bug, but it concentrates your attention where the cost of being wrong is highest.
Concrete Example(s) -- from a real capstone
A real triage entry for the task-manager capstone:
id: BUG-2026-04-04-01
title: POST /tasks rejects valid unicode titles
reporter: author (from manual testing)
sev: Sev-2
impact: users cannot create tasks with emoji; estimated 10% of real titles;
no data loss; core feature only partially usable
root-cause: validation (pydantic regex rejects non-ASCII)
reproduction: curl -X POST /tasks -d '{"title":"ship rocket"}' -> 422
fix-path: update validator to use unicode-aware regex + add regression test
owner: me
status: open
Five such entries in a week let you see:
- 3 of 5 are
validation: fix your input-validation story as a class; - 1 is
schema-drift: add a contract test at that boundary (Concept 9); - 1 is
env-config: document env variables and add a startup check.
That pattern is invisible without the tags.
Common Confusion / Misconceptions
The most common confusion is treating severity and impact as the same thing. They are not. A Sev-1 is about how fast you must respond; impact describes what is affected. A single user hitting a bug can still be Sev-1 if the effect is data corruption.
A second confusion is choosing the root-cause tag from inside the fix. Triage names a suspected category based on the report. If it turns out to be something else during the fix, update the tag -- the change itself is useful data.
A third is over-engineering triage for a capstone. You do not need a ticketing system. A plain bugs.md or an issues board with a fixed set of labels is enough. What matters is that every bug has a row.
A fourth is severity inflation -- every bug filed as Sev-1 because "it feels urgent." Inflation kills the signal: when everything is Sev-1, nothing is. Define Sev-1 narrowly and defend the definition in weekly audits.
How To Use It (In Your Capstone)
Every time a defect is reported (by you or anyone):
- Create a triage entry within ten minutes. Do not start fixing.
- Fill in Sev, impact, and a suspected root-cause tag from a short, fixed list.
- Add reproduction steps, even if rough.
- Decide disposition: fix now, schedule for this week, schedule for later, decline.
- Link to the commit and regression test once the fix lands.
- Only then start implementation -- and start with a failing regression test (Concept 11).
- Every Friday, sweep the log for patterns and roll Sev-3/Sev-4 entries into the debt ledger (Concept 15) with triggers.
A simple Sev rubric for capstone use:
- Sev-1: data loss, data corruption, security exposure, or the system is unusable. Stop what you are doing.
- Sev-2: a primary feature is broken for a meaningful fraction of users or scenarios. Fix this sprint.
- Sev-3: a non-primary feature is degraded, or a primary feature is degraded in edge cases. Schedule it.
- Sev-4: cosmetic or minor ergonomic issues. Ledger it and consider deferring.
A Fixed Root-Cause Tag Set for the Capstone
Use a closed list. Free-form tags produce a long tail of unique labels that cannot be aggregated.
validation-- input parsing, bounds, types rejected at the boundaryschema-drift-- shape mismatch between code and database or external APIconcurrency-- race, ordering, reentrancy, shared-state issueexternal-api-- third-party provider error or behavior changeenv-config-- env vars, secrets, feature flags, deploy target differencesui-regression-- front-end behavior regressed against prior releaseobservability-gap-- bug was hard to diagnose because logging or tracing was absentperformance-- slow path, timeout, resource exhaustionauthorization-- cross-tenant, role, or permission errordata-integrity-- bad write, missing transactional boundary, orphaned rows
Ten tags is enough. Review the set quarterly; add or remove based on what the ledger shows.
Anti-Patterns to Recognize
- Triage-by-commit. Opening a PR without a triage entry to link back to.
- Unfixable without context. Triage entry that does not name reproduction steps.
- Severity inflation. Every bug filed as Sev-1. Sev-1 becomes meaningless.
- Open-ended disposition. "We will look at this" is not a disposition.
Weekly Triage Review
Once a week:
- confirm every open entry still belongs (close stale or duplicate ones);
- count entries by root-cause tag; if one tag dominates, invest in that class of defense (contract tests, startup checks, etc.);
- check that every Sev-1 and Sev-2 either has a regression test in progress or a dated disposition;
- roll aged Sev-3/Sev-4 entries into the debt ledger (Concept 15) with a trigger.
See also (integrative)
- S3 M05 Applied Design & Code Review -- labels and comments as a review-time habit
- S5 M03 Concurrency & Synchronization -- the theory behind the
concurrencyroot-cause tag - S6 M04 Transactions & Consistency -- why
schema-driftanddata-integrityneed their own tags - S10 M04 Operational Readiness -- severity categories map directly to incident severities and post-mortems
- S8 M05 Technical Leadership & Strategy -- defending severity calls to a reviewer or stakeholder
External references:
- Google SRE Book: Postmortem Culture -- Learning from Failure -- blameless classification as industrial practice
- Google Testing Blog: Flaky Tests at Google and How We Mitigate Them -- triage as input to flake remediation
- Dan Luu: Why don't schools teach debugging? -- classification as the first step in the debug loop
- Google Eng-Practices: Standard of Code Review -- reviewers value classification, not just observations
Check Yourself
- What three fields must every triage entry have before any fix begins?
- Why is choosing a root-cause tag at triage time valuable even if it might change?
- Why is the difference between severity and impact important?
- What signal does "three
schema-driftentries in one week" give you, and what is the right next action? - How would you detect severity inflation in your own triage log?
Mini Drill or Application (Capstone-scoped)
- Pick three defects in your capstone (invent them if you have none yet) and write triage entries for each using the sample format.
- For each entry, pick a Sev using the rubric, describe impact in one sentence, and choose a root-cause tag from the fixed list.
- Decide disposition for each: fix now / schedule / decline. Write the rationale in one sentence.
- Commit
bugs.mdto the repo and link every subsequent fix commit back to the triage entry. - At the end of the week, count entries by tag and pick one "dominant tag" investment (contract test, validator rewrite, startup check) to make next sprint.
Source Backbone
Capstone implementation applies earlier code-quality, testing, and refactoring material. These books are the source backbone for that practice.
- Software Engineering at Google - testing, review, and engineering-process backbone.
- Refactoring - safe change and behavior-preserving improvement.
- Good Code, Bad Code - maintainability and code-quality judgment.
- Clean Code - readability and function-level craft support.