Skip to main content

Plan Review as PR Culture

What This Concept Is

"Review the plan, not just the code." In a healthy Terraform team, the artifact under review in a pull request is not only the .tf diff -- it is also the terraform plan output the CI pipeline produced for the PR branch against the target environment.

Concretely:

  1. A PR opens with .tf changes.
  2. CI runs terraform plan -out=tfplan against dev (and often staging) and posts the human-readable plan as a PR comment.
  3. Reviewers read both the code diff and the plan output.
  4. Approval gates the merge. Merge triggers terraform apply tfplan (or a follow-up job against prod) in the corresponding environment.

The plan is the contract between "what the PR says I wrote" and "what Terraform will actually do." Reviewing only the code is like reviewing only the source of a compiler pass and never looking at the compiled output.

Why It Matters Here

Most Terraform incidents that could have been caught at review were caught by nobody reading the plan. A .tf change that looks small (renamed a tag, changed a subnet CIDR) can produce a plan that replaces five stateful resources. The code review missed it; the plan would have shown it on line one.

Plan review also surfaces:

  • drift ("wait, why is this plan also proposing a security group change I didn't write?")
  • module-version bumps that did more than you expected
  • typos in variable names that turned into null values at plan time

Concrete Example

A PR diff looks benign:

 resource "aws_db_instance" "primary" {
engine = "postgres"
- engine_version = "14.10"
+ engine_version = "15.4"
instance_class = "db.t3.medium"
}

Reviewers who only read the code say "LGTM, engine upgrade, sounds fine." The plan that CI posted tells a different story:

  # aws_db_instance.primary must be replaced
-/+ resource "aws_db_instance" "primary" {
~ engine_version = "14.10" -> "15.4" # forces replacement
id = "db-abc123" -> (known after apply)
- final_snapshot_identifier = null
...
}

Plan: 0 to add, 0 to change, 1 to destroy, 1 to replace.

A replace on the primary database is an outage. Either the code needs apply_immediately and allow_major_version_upgrade = true, or the upgrade has to be done out-of-band with snapshots and a maintenance window. Without plan review, this merges, pipeline applies, production goes down.

Anatomy of a Good Plan Review Comment

LGTM with one concern:

The plan shows `aws_db_instance.primary` as -/+ (destroy + recreate). That will
drop the production database. Engine major-version bumps typically require
`allow_major_version_upgrade = true` and `apply_immediately` set explicitly;
I'd prefer we do this change in a maintenance window with a snapshot first.

The tag updates on the S3 buckets (~) are fine, no data impact.

Can we split this PR: the tag changes merge today, the engine bump goes into
a follow-up PR that also documents the rollback plan?

What this comment is doing:

  • specific reference to plan symbols (-/+, ~)
  • named the stateful resource and its impact
  • offered a concrete path forward (split the PR)
  • asked for a rollback plan, not just "be careful"

Review Checklist (Reusable)

For any IaC PR, run through:

  1. Plan output posted? If CI did not post a plan, block the PR until it does. No plan, no merge.
  2. Any - or -/+? Investigate each one. Ask: "if this runs at 3 a.m., what breaks?"
  3. Stateful resources? Databases, persistent volumes, DNS records. Replacements here are almost always bugs.
  4. Provider / module version bumps? Read the upstream changelog. Version bumps can silently rename attributes.
  5. Expected scope match actual scope? PR title says "rename a tag"; plan touches 30 resources -> red flag.
  6. Secrets / sensitive attributes? Make sure plan output in the CI comment is configured to redact sensitive values.
  7. Drift included? If the plan shows changes you didn't write, reconcile them before merging.

Common Confusion / Misconception

"The CI posted a green check mark, so the plan is fine." Green check means "Terraform produced a plan without error." It does not mean the plan is safe. Plans are always syntactically valid; the danger is semantic.

"Reviewers can just re-run plan locally." They can, but local plans sometimes see different drift than the CI runner because of IAM scope or regional config. Use the CI-posted plan as the canonical artifact.

"Plan review slows us down." It does, a little, on small changes. It saves hours the first time it catches a production-destroying plan. The tradeoff is obvious after one outage.

"Auto-apply on merge is fine for prod if tests pass." Only if a human approved the plan in the PR. Never auto-apply a plan that no human read for production. Most teams use a two-step: auto-apply for dev, manual-approval gate for prod.

How To Use It

  1. Make "plan posted as PR comment" a CI requirement for every IaC repo.
  2. Add a PR template checklist (or paste the 7 items above) that reviewers tick.
  3. For production, split the pipeline: PR-branch plan, merge-branch plan, manual approval, apply. Never compress these steps.
  4. Budget 10-20 minutes per plan review for non-trivial PRs. That is the cost of the discipline.
  5. Escalate stateful-resource replacements. These should have at least two reviewers, one of whom owns the data.

Check Yourself

  1. Why is the plan output, not the .tf diff, the authoritative description of what the PR will do?
  2. You see Plan: 0 to add, 3 to change, 0 to destroy on a PR titled "fix typo in comment." What do you do?
  3. A PR auto-applied on merge and took down staging. At which step of the pipeline should the mistake have been caught?

Mini Drill or Application

Find a public Terraform repo (many are open-source: terraform-aws-modules/terraform-aws-vpc on GitHub, for example). Open a recent merged PR. Without looking at the description:

  1. Read the code diff. Predict what the plan should say.
  2. Find the plan in the CI logs or the PR comments.
  3. Compare your prediction against the actual plan. Where did you under- or over-estimate the scope?

Write one paragraph on what you missed. This is the core skill of plan review, and it gets better only with reps.

See also (external)


Source Backbone

Infrastructure-as-code details are tool-specific, but these local books provide the operational backbone for shell, Git, and change discipline.