Plan Review as PR Culture
What This Concept Is
"Review the plan, not just the code." In a healthy Terraform team, the artifact under review in a pull request is not only the .tf diff -- it is also the terraform plan output the CI pipeline produced for the PR branch against the target environment.
Concretely:
- A PR opens with
.tfchanges. - CI runs
terraform plan -out=tfplanagainst dev (and often staging) and posts the human-readable plan as a PR comment. - Reviewers read both the code diff and the plan output.
- Approval gates the merge. Merge triggers
terraform apply tfplan(or a follow-up job against prod) in the corresponding environment.
The plan is the contract between "what the PR says I wrote" and "what Terraform will actually do." Reviewing only the code is like reviewing only the source of a compiler pass and never looking at the compiled output.
Why It Matters Here
Most Terraform incidents that could have been caught at review were caught by nobody reading the plan. A .tf change that looks small (renamed a tag, changed a subnet CIDR) can produce a plan that replaces five stateful resources. The code review missed it; the plan would have shown it on line one.
Plan review also surfaces:
- drift ("wait, why is this plan also proposing a security group change I didn't write?")
- module-version bumps that did more than you expected
- typos in variable names that turned into null values at plan time
Concrete Example
A PR diff looks benign:
resource "aws_db_instance" "primary" {
engine = "postgres"
- engine_version = "14.10"
+ engine_version = "15.4"
instance_class = "db.t3.medium"
}
Reviewers who only read the code say "LGTM, engine upgrade, sounds fine." The plan that CI posted tells a different story:
# aws_db_instance.primary must be replaced
-/+ resource "aws_db_instance" "primary" {
~ engine_version = "14.10" -> "15.4" # forces replacement
id = "db-abc123" -> (known after apply)
- final_snapshot_identifier = null
...
}
Plan: 0 to add, 0 to change, 1 to destroy, 1 to replace.
A replace on the primary database is an outage. Either the code needs apply_immediately and allow_major_version_upgrade = true, or the upgrade has to be done out-of-band with snapshots and a maintenance window. Without plan review, this merges, pipeline applies, production goes down.
Anatomy of a Good Plan Review Comment
LGTM with one concern:
The plan shows `aws_db_instance.primary` as -/+ (destroy + recreate). That will
drop the production database. Engine major-version bumps typically require
`allow_major_version_upgrade = true` and `apply_immediately` set explicitly;
I'd prefer we do this change in a maintenance window with a snapshot first.
The tag updates on the S3 buckets (~) are fine, no data impact.
Can we split this PR: the tag changes merge today, the engine bump goes into
a follow-up PR that also documents the rollback plan?
What this comment is doing:
- specific reference to plan symbols (
-/+,~) - named the stateful resource and its impact
- offered a concrete path forward (split the PR)
- asked for a rollback plan, not just "be careful"
Review Checklist (Reusable)
For any IaC PR, run through:
- Plan output posted? If CI did not post a plan, block the PR until it does. No plan, no merge.
- Any
-or-/+? Investigate each one. Ask: "if this runs at 3 a.m., what breaks?" - Stateful resources? Databases, persistent volumes, DNS records. Replacements here are almost always bugs.
- Provider / module version bumps? Read the upstream changelog. Version bumps can silently rename attributes.
- Expected scope match actual scope? PR title says "rename a tag"; plan touches 30 resources -> red flag.
- Secrets / sensitive attributes? Make sure plan output in the CI comment is configured to redact sensitive values.
- Drift included? If the plan shows changes you didn't write, reconcile them before merging.
Common Confusion / Misconception
"The CI posted a green check mark, so the plan is fine." Green check means "Terraform produced a plan without error." It does not mean the plan is safe. Plans are always syntactically valid; the danger is semantic.
"Reviewers can just re-run plan locally." They can, but local plans sometimes see different drift than the CI runner because of IAM scope or regional config. Use the CI-posted plan as the canonical artifact.
"Plan review slows us down." It does, a little, on small changes. It saves hours the first time it catches a production-destroying plan. The tradeoff is obvious after one outage.
"Auto-apply on merge is fine for prod if tests pass." Only if a human approved the plan in the PR. Never auto-apply a plan that no human read for production. Most teams use a two-step: auto-apply for dev, manual-approval gate for prod.
How To Use It
- Make "plan posted as PR comment" a CI requirement for every IaC repo.
- Add a PR template checklist (or paste the 7 items above) that reviewers tick.
- For production, split the pipeline: PR-branch plan, merge-branch plan, manual approval, apply. Never compress these steps.
- Budget 10-20 minutes per plan review for non-trivial PRs. That is the cost of the discipline.
- Escalate stateful-resource replacements. These should have at least two reviewers, one of whom owns the data.
Check Yourself
- Why is the plan output, not the
.tfdiff, the authoritative description of what the PR will do? - You see
Plan: 0 to add, 3 to change, 0 to destroyon a PR titled "fix typo in comment." What do you do? - A PR auto-applied on merge and took down staging. At which step of the pipeline should the mistake have been caught?
Mini Drill or Application
Find a public Terraform repo (many are open-source: terraform-aws-modules/terraform-aws-vpc on GitHub, for example). Open a recent merged PR. Without looking at the description:
- Read the code diff. Predict what the plan should say.
- Find the plan in the CI logs or the PR comments.
- Compare your prediction against the actual plan. Where did you under- or over-estimate the scope?
Write one paragraph on what you missed. This is the core skill of plan review, and it gets better only with reps.
See also (external)
- Terraform CLI:
terraform plan---outflag and the apply-from-saved-plan workflow that underpins PR culture. - Terraform Best Practices: Code review and CI -- opinionated guidance on plan-in-PR and CI pipelines.
Source Backbone
Infrastructure-as-code details are tool-specific, but these local books provide the operational backbone for shell, Git, and change discipline.
- Pro Git - versioned infrastructure changes, branching, review, and rollback habits.
- Git from the Bottom Up - mental model for stateful change history.
- The Linux Command Line - shell and automation grounding for infrastructure work.