Bug Triage Discipline: Sev, Impact, Root-Cause Tag

What This Concept Is

Bug triage is the first pass on every defect report that converts "there is a problem" into a decision about what to do. For a capstone it must produce three fields in under ten minutes:

  • Severity (Sev) -- how urgently the defect must be handled (Sev-1 critical, Sev-2 important, Sev-3 normal, Sev-4 cosmetic);
  • Impact -- who and what is affected (feature, number of users, data correctness, security, observability);
  • Root-cause tag -- a category for the kind of fault it appears to be (e.g., validation, schema-drift, concurrency, external-api, ui-regression, env-config).

The output is not a fix. It is a written classification that makes the next decision fast and consistent. Google's SRE postmortem culture is a widely cited industrial reference: significant incidents get a blameless written record, because the record is how organisations notice repeating patterns that individual fixes would hide.

Why It Matters Here (In the Capstone)

In a solo or small-team capstone, triage is often skipped. The cost is invisible until a bad week arrives: three defects sit open, each half-fixed, with no record of which is urgent. Without triage you cannot:

  • decide what to fix first;
  • notice patterns ("three bugs this week are all schema-drift");
  • write a defensible post-mortem for M04;
  • negotiate scope when the deadline tightens.

A disciplined triage log is one of the highest-leverage artifacts in a capstone because it is also the raw input for your technical-debt ledger (Concept 15) and your self-review (Concept 14). Dan Luu's writing on the value of cheap classification in debugging is apt: classification does not fix the bug, but it concentrates your attention where the cost of being wrong is highest.

Concrete Example(s) -- from a real capstone

A real triage entry for the task-manager capstone:

id:           BUG-2026-04-04-01
title:        POST /tasks rejects valid unicode titles
reporter:     author (from manual testing)
sev:          Sev-2
impact:       users cannot create tasks with emoji; estimated 10% of real titles;
              no data loss; core feature only partially usable
root-cause:   validation (pydantic regex rejects non-ASCII)
reproduction: curl -X POST /tasks -d '{"title":"ship 🚀"}' -> 422
fix-path:     update validator to use unicode-aware regex + add regression test
owner:        me
status:       open
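
The fix-path in this entry can be sketched in plain Python. The source does not show the original validator, so both checks below are assumptions chosen to illustrate the failure mode: an ASCII-only character class versus a check that accepts any printable text. The names `ASCII_ONLY`, `is_valid_title_old`, `is_valid_title`, and the 200-character limit are hypothetical.

```python
import re

# Assumed shape of the buggy validator: an ASCII-only character class.
ASCII_ONLY = re.compile(r"^[A-Za-z0-9 _.,!?-]{1,200}$")


def is_valid_title_old(title: str) -> bool:
    """Hypothetical pre-fix behaviour: rejects anything outside ASCII."""
    return ASCII_ONLY.fullmatch(title) is not None


def is_valid_title(title: str) -> bool:
    """Hypothetical post-fix behaviour: accept printable text, 1-200 chars."""
    stripped = title.strip()
    return 1 <= len(stripped) <= 200 and stripped.isprintable()


def test_unicode_title_regression() -> None:
    # Regression test for BUG-2026-04-04-01: emoji and accented titles must pass.
    assert not is_valid_title_old("ship \U0001F680")  # documents the old failure
    assert is_valid_title("ship \U0001F680")
    assert is_valid_title("café menu")
    assert not is_valid_title("   ")      # blank titles are still rejected
    assert not is_valid_title("bad\x00")  # control characters are still rejected


test_unicode_title_regression()
```

One caveat on this sketch: `str.isprintable()` rejects format characters such as the zero-width joiner, so some compound emoji would still fail. A real fix has to decide its allowed character set explicitly and encode that decision in the regression test.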

Five such entries in a week let you see:

  • 3 of 5 are validation: fix your input-validation story as a class;
  • 1 is schema-drift: add a contract test at that boundary (Concept 9);
  • 1 is env-config: document env variables and add a startup check.

That pattern is invisible without the tags.
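
Counting by tag takes only a few lines once the entries carry a tag. The sketch below uses an invented week of entries that matches the 3/1/1 split described above; the IDs and the `week` list are illustrative, not real data.

```python
from collections import Counter

# Hypothetical week of triage entries: (id, root-cause tag).
week = [
    ("BUG-01", "validation"),
    ("BUG-02", "validation"),
    ("BUG-03", "schema-drift"),
    ("BUG-04", "validation"),
    ("BUG-05", "env-config"),
]

tag_counts = Counter(tag for _, tag in week)
dominant_tag, dominant_count = tag_counts.most_common(1)[0]

# Print tags from most to least frequent, then the dominant one.
for tag, count in tag_counts.most_common():
    print(f"{tag:<14}{count}")
print(f"dominant: {dominant_tag} ({dominant_count} of {len(week)})")
```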

Common Confusion / Misconceptions

The most common confusion is treating severity and impact as the same thing. They are not. Severity is about how fast you must respond; impact describes who and what is affected. A single user hitting a bug can still be Sev-1 if the effect is data corruption.

A second confusion is choosing the root-cause tag from inside the fix. Triage names a suspected category based on the report. If it turns out to be something else during the fix, update the tag -- the change itself is useful data.

A third is over-engineering triage for a capstone. You do not need a ticketing system. A plain bugs.md or an issues board with a fixed set of labels is enough. What matters is that every bug has a row.

A fourth is severity inflation -- every bug filed as Sev-1 because "it feels urgent." Inflation kills the signal: when everything is Sev-1, nothing is. Define Sev-1 narrowly and defend the definition in weekly audits.
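
One rough way to spot inflation in your own log is to track the share of entries filed as Sev-1. The sketch below is a minimal version of that check; the function names and the 25% threshold are assumptions, not a standard, and the right threshold depends on how narrowly you defined Sev-1.

```python
from collections import Counter


def sev1_share(sevs: list[str]) -> float:
    """Fraction of entries filed as Sev-1 (0.0 for an empty log)."""
    if not sevs:
        return 0.0
    return Counter(sevs)["Sev-1"] / len(sevs)


def looks_inflated(sevs: list[str], threshold: float = 0.25) -> bool:
    # The default threshold is an arbitrary starting point; tune it per project.
    return sev1_share(sevs) > threshold


# Hypothetical log: four of five entries filed as Sev-1.
log = ["Sev-1", "Sev-1", "Sev-1", "Sev-3", "Sev-1"]
print(f"Sev-1 share: {sev1_share(log):.0%}, inflated: {looks_inflated(log)}")
```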

How To Use It (In Your Capstone)

Every time a defect is reported (by you or anyone):

  1. Create a triage entry within ten minutes. Do not start fixing.
  2. Fill in Sev, impact, and a suspected root-cause tag from a short, fixed list.
  3. Add reproduction steps, even if rough.
  4. Decide disposition: fix now, schedule for this week, schedule for later, decline.
  5. Only then start implementation -- and start with a failing regression test (Concept 11).
  6. Link the entry to the commit and regression test once the fix lands.
  7. Every Friday, sweep the log for patterns and roll Sev-3/Sev-4 entries into the debt ledger (Concept 15) with triggers.
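
To keep the first steps inside ten minutes, it can help to generate the entry skeleton rather than type it. The sketch below renders an entry in the sample format from this section; `new_entry`, its parameters, and the ID scheme inferred from the sample ID are assumptions.

```python
from datetime import date


def new_entry(title: str, sev: str, impact: str, root_cause: str,
              reproduction: str, opened: date, seq: int) -> str:
    """Render a triage entry as aligned 'key: value' lines."""
    fields = {
        # ID scheme inferred from the sample entry: BUG-YYYY-MM-DD-NN.
        "id": f"BUG-{opened.isoformat()}-{seq:02d}",
        "title": title,
        "sev": sev,
        "impact": impact,
        "root-cause": root_cause,
        "reproduction": reproduction,
        "status": "open",
    }
    return "\n".join(f"{key + ':':<14}{value}" for key, value in fields.items())


print(new_entry(
    title="POST /tasks rejects valid unicode titles",
    sev="Sev-2",
    impact="users cannot create tasks with emoji",
    root_cause="validation",
    reproduction="POST /tasks with an emoji title returns 422",
    opened=date(2026, 4, 4),
    seq=1,
))
```

Paste the output into bugs.md, then fill in fix-path and owner once the disposition is decided.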

A simple Sev rubric for capstone use:

  • Sev-1: data loss, data corruption, security exposure, or the system is unusable. Stop what you are doing.
  • Sev-2: a primary feature is broken for a meaningful fraction of users or scenarios. Fix this sprint.
  • Sev-3: a non-primary feature is degraded, or a primary feature is degraded in edge cases. Schedule it.
  • Sev-4: cosmetic or minor ergonomic issues. Ledger it and consider deferring.
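
The rubric can be written down as a small decision function, which makes the order of the checks explicit. This is a sketch under assumptions: the `Report` fields are hypothetical yes/no questions you would answer during triage, and the rubric itself leaves room for judgment that a function cannot capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    # Hypothetical yes/no answers gathered during the ten-minute triage.
    data_loss_or_corruption: bool = False
    security_exposure: bool = False
    system_unusable: bool = False
    primary_feature: bool = False
    broad_effect: bool = False  # a meaningful fraction of users or scenarios
    cosmetic_only: bool = False


def severity(r: Report) -> str:
    """Map a report to a Sev level, checking the most urgent cases first."""
    if r.data_loss_or_corruption or r.security_exposure or r.system_unusable:
        return "Sev-1"
    if r.cosmetic_only:
        return "Sev-4"
    if r.primary_feature and r.broad_effect:
        return "Sev-2"
    # Non-primary feature degraded, or primary feature degraded in edge cases.
    return "Sev-3"


print(severity(Report(primary_feature=True, broad_effect=True)))  # Sev-2
print(severity(Report(data_loss_or_corruption=True)))             # Sev-1
```

Checking the Sev-1 conditions first means breadth never lowers the severity of data corruption, which matches the point that a single affected user can still be Sev-1.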

A Fixed Root-Cause Tag Set for the Capstone

Use a closed list. Free-form tags produce a long tail of unique labels that cannot be aggregated.

  • validation -- input parsing, bounds, types rejected at the boundary
  • schema-drift -- shape mismatch between code and database or external API
  • concurrency -- race, ordering, reentrancy, shared-state issue
  • external-api -- third-party provider error or behavior change
  • env-config -- env vars, secrets, feature flags, deploy target differences
  • ui-regression -- front-end behavior regressed against prior release
  • observability-gap -- bug was hard to diagnose because logging or tracing was absent
  • performance -- slow path, timeout, resource exhaustion
  • authorization -- cross-tenant, role, or permission error
  • data-integrity -- bad write, missing transactional boundary, orphaned rows

Ten tags is enough. Review the set quarterly; add or remove based on what the ledger shows.
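
If the log is ever processed by a script, the closed list can be enforced with an enum so that an off-list tag fails loudly instead of creating a new label. A minimal sketch, assuming the ten tags above; `RootCause` and `parse_tag` are hypothetical names.

```python
from enum import Enum


class RootCause(str, Enum):
    VALIDATION = "validation"
    SCHEMA_DRIFT = "schema-drift"
    CONCURRENCY = "concurrency"
    EXTERNAL_API = "external-api"
    ENV_CONFIG = "env-config"
    UI_REGRESSION = "ui-regression"
    OBSERVABILITY_GAP = "observability-gap"
    PERFORMANCE = "performance"
    AUTHORIZATION = "authorization"
    DATA_INTEGRITY = "data-integrity"


def parse_tag(raw: str) -> RootCause:
    """Return the matching tag, or raise ValueError for anything off-list."""
    return RootCause(raw.strip().lower())


print(parse_tag(" Schema-Drift ").value)
```

Adding or removing a tag then becomes a deliberate, reviewable change to one class rather than something that happens by typo.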

Anti-Patterns to Recognize

  • Triage-by-commit. Opening a PR without a triage entry to link back to.
  • Unfixable without context. A triage entry with no reproduction steps.
  • Severity inflation. Every bug filed as Sev-1. Sev-1 becomes meaningless.
  • Open-ended disposition. "We will look at this" is not a disposition.
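
Two of these anti-patterns can be caught mechanically. The sketch below lints a single entry for missing reproduction steps and for a disposition outside the four choices listed earlier (fix now, this week, later, decline); the dictionary keys and the spelled-out disposition values are assumptions about how you store entries.

```python
# Hypothetical spellings of the four dispositions named in this section.
ALLOWED_DISPOSITIONS = {"fix-now", "this-week", "later", "decline"}


def lint_entry(entry: dict[str, str]) -> list[str]:
    """Return a list of problems found in one triage entry (empty if clean)."""
    problems = []
    if not entry.get("reproduction", "").strip():
        problems.append("missing reproduction steps")
    if entry.get("disposition", "").strip() not in ALLOWED_DISPOSITIONS:
        problems.append("open-ended or missing disposition")
    return problems


vague = {"id": "BUG-07", "reproduction": "", "disposition": "we will look at this"}
print(lint_entry(vague))
```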

Weekly Triage Review

Once a week:

  • confirm every open entry still belongs (close stale or duplicate ones);
  • count entries by root-cause tag; if one tag dominates, invest in that class of defense (contract tests, startup checks, etc.);
  • check that every Sev-1 and Sev-2 either has a regression test in progress or a dated disposition;
  • roll aged Sev-3/Sev-4 entries into the debt ledger (Concept 15) with a trigger.
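
The last two checks in this review lend themselves to a short script. The sketch below is one possible shape: the `Entry` fields, the function names, and the 14-day definition of "aged" are all assumptions to adapt to your own log.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Entry:
    id: str
    sev: str
    opened: date
    regression_test: bool = False            # test written or in progress
    disposition_date: Optional[date] = None  # date the fix is scheduled for


def needs_attention(entries: list[Entry]) -> list[str]:
    """Sev-1/Sev-2 entries with neither a regression test nor a dated disposition."""
    return [
        e.id for e in entries
        if e.sev in {"Sev-1", "Sev-2"}
        and not e.regression_test
        and e.disposition_date is None
    ]


def ready_for_ledger(entries: list[Entry], today: date,
                     max_age_days: int = 14) -> list[str]:
    """Sev-3/Sev-4 entries older than max_age_days (an assumed cutoff)."""
    return [
        e.id for e in entries
        if e.sev in {"Sev-3", "Sev-4"} and (today - e.opened).days > max_age_days
    ]


# Hypothetical open entries for one review.
open_entries = [
    Entry("BUG-01", "Sev-2", date(2026, 4, 4), regression_test=True),
    Entry("BUG-02", "Sev-1", date(2026, 4, 6)),
    Entry("BUG-03", "Sev-4", date(2026, 3, 1)),
]
print(needs_attention(open_entries))
print(ready_for_ledger(open_entries, today=date(2026, 4, 10)))
```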

Check Yourself

  1. What three fields must every triage entry have before any fix begins?
  2. Why is choosing a root-cause tag at triage time valuable even if it might change?
  3. Why is the difference between severity and impact important?
  4. What signal does "three schema-drift entries in one week" give you, and what is the right next action?
  5. How would you detect severity inflation in your own triage log?

Mini Drill or Application (Capstone-scoped)

  1. Pick three defects in your capstone (invent them if you have none yet) and write triage entries for each using the sample format.
  2. For each entry, pick a Sev using the rubric, describe impact in one sentence, and choose a root-cause tag from the fixed list.
  3. Decide disposition for each: fix now / schedule / decline. Write the rationale in one sentence.
  4. Commit bugs.md to the repo and link every subsequent fix commit back to the triage entry.
  5. At the end of the week, count entries by tag and pick one "dominant tag" investment (contract test, validator rewrite, startup check) to make next sprint.

Source Backbone

Capstone implementation applies earlier code-quality, testing, and refactoring material. These books are the source backbone for that practice.