Code Review Rubrics: Correctness, Clarity, Design, Tests

What This Concept Is

A code review is not "looking for things that feel wrong." It is walking a rubric across a change so that important categories are not missed and unimportant things do not dominate. A minimal, durable rubric has four dimensions:

Correctness -- does this code actually do the thing? Are edge cases handled? Is concurrency / error handling correct?
Clarity -- can the next reader understand it without a tour guide? Are names, structure, and comments honest?
Design -- does it fit the codebase's shape? Does it introduce or relieve smells? Does it respect boundaries?
Tests -- are there tests, at the right level, that pin down the new behavior and protect future refactors?

Google's engineering practices list a similar set (design, functionality, complexity, tests, naming, comments, style, docs) -- the four-bucket version above is the condensed shape you should carry in your head.

Why It Matters Here

Without a rubric, reviews drift toward the reviewer's pet concerns. One reviewer obsesses over naming, another over performance, another over "I would have used Strategy here." The author gets inconsistent signals, and real issues in other buckets go unchecked.

A rubric is also a humane tool. It tells the author what to expect, which prevents a review from feeling like an ambush. It also lets you triage: a PR with broken correctness should not receive fifteen nit comments about variable names.

Concrete Example

A 200-line PR adds a new InvoiceExporter. A rubric-driven review pass:

Bucket	Question	Example finding
Correctness	Does it handle the `null currency` case we saw last quarter?	"Line 42 will NPE when `currency` is null; reproduced by test I pasted below."
Clarity	Are the names and structure honest?	"`helperProcess2` does two things -- consider splitting into `formatRows` and `writeRows`."
Design	Does this belong in `InvoiceExporter` or in `Invoice`?	"Total-computing logic leaking into the exporter -- `Invoice.total()` would match the pattern used by `Order`."
Tests	Is the new behavior pinned down?	"No test for the multi-currency branch; please add one based on the fixture in `test_invoice_multicurrency`."

Four comments, one per bucket, each actionable.

Common Confusion / Misconception

"Review = nitpick." A review dominated by style nits is a failed review. Style should be automated with a formatter; what the rubric protects is correctness, clarity, design, and tests.

"Review = gatekeeping." Review is a design conversation between peers, not an exam. The author is responsible for the change; the reviewer is an additional pair of eyes, not the final authority on taste.

"Review = blame." It is easy to slide into "this is wrong" rather than "here is a concrete problem I see." The rubric keeps you on evidence, not judgment.

"One rubric for every PR." Adjust depth to risk. A one-line config change does not need a design-pass interrogation. A new subsystem absolutely does.

How To Use It

Keep a checklist visible while reviewing:

Correctness: I understand what this does; I mentally ran edge cases (empty, null, concurrency, failure).
Clarity: names, structure, and comments are honest; no "helper" / "manager" / "util" that hides a real responsibility.
Design: fits existing patterns; does not reintroduce known smells; boundaries are respected.
Tests: every new branch has a test; tests are at the right level (unit vs integration); tests would catch a regression, not just a typo.

Label each comment with its bucket and whether it is nit (optional polish) or substantive (must discuss before merge). Example:

[design, substantive] PaymentProcessor now throws RuntimeException on network errors -- this used to be PaymentError. Callers relied on the typed exception for retry. Either keep the typed exception or update the two call sites I linked.

Sample Review Artifact

A real rubric-driven pass on a 180-line PR adding InvoiceExporter:

Summary: Four comments total, one per bucket. No style nits -- those are handled by the formatter.

[correctness, substantive] exportRows iterates items but items can be null when the invoice has no lines (see the empty_invoice_test fixture). Suggested: default to an empty list at the top, or guard the loop. Happy to paste a repro if useful.

[clarity, nit] helperProcess2 does two things -- formatting and writing. Splitting into formatRows and writeRows would make the call order obvious without reading both bodies. Not a blocker.

[design, substantive] Total-computing logic now lives in InvoiceExporter, but Invoice.total() already exists and is used by Order. Pulling this back into Invoice keeps the "exporter only formats" boundary intact and avoids two sources of truth for totals.

[tests, substantive] New multi-currency branch has no test. The existing test_invoice_multicurrency fixture covers the shape you need -- adding one assertion there would pin the behavior. Without a test this will regress silently.

Notice what is missing: no "I would have done it this way," no style debate, no ambush. Four labelled, evidence-based comments, each of which the author can act on in minutes.

Check Yourself

Which bucket is most likely to be missed when a reviewer is in a hurry?
Why is it better to label a comment "nit" than to leave it unlabelled?
Give a PR size or risk where the rubric depth should decrease.
Why should style comments almost never appear in a human review?
A PR is well-tested and clearly written, but the design choice is "fine, not great." Substantive or nit?

Mini Drill or Application

Given a 150-line PR introducing a new report format, list one expected comment per bucket. Then list two cases where you would defer a bucket (and say why).

Then take a real PR from your own repo (or a public one) and re-read it with the rubric, one bucket at a time. Count how many of your instinctive comments collapse into nits and how many are genuinely substantive. Most reviewers discover that half their comments were style noise -- the formatter should have caught those, not you.

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconception​

How To Use It​

Sample Review Artifact​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​