The Plan/Apply Lifecycle, Drift Detection

What This Concept Is

The core Terraform workflow has three commands that you will type thousands of times.

  • terraform init -- downloads providers, initializes the backend, and sets up the .terraform/ cache. Run once per checkout and any time required_providers or backend config changes.
  • terraform plan -- refreshes state from the real cloud, diffs desired (config) against current (state), and prints the actions it would take. Read-only against the world.
  • terraform apply -- executes the plan. Can be handed a saved plan file (terraform plan -out=tfplan; terraform apply tfplan) for "review then apply" workflows.

There is also:

  • terraform refresh (now invoked via terraform apply -refresh-only) -- update state from the cloud without proposing any config-driven changes. Used specifically to reconcile drift.

Drift detection = running plan on unchanged config and seeing changes. Drift means reality moved, not code.

Why It Matters Here

Every engineer you want to work with can read a plan. This is the single most important reading skill in IaC. A plan uses five symbols:

  • + create
  • - destroy
  • ~ in-place update
  • -/+ destroy and recreate (replace)
  • <= read (data source)

A safe reviewer scans for - and -/+ first. A stateful resource (database, persistent volume) marked -/+ is almost always a bug.
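
The "scan for destroys first" habit can also be applied to the machine-readable plan. The sketch below assumes you have the JSON form of a saved plan (for example from terraform show -json tfplan) already loaded as a Python dict, and that each entry in resource_changes carries a change.actions list. The function names are illustrative. Note that Terraform also prints +/- for a replacement that creates the new resource before destroying the old one, so the table includes it alongside the five symbols above.

```python
from typing import Dict, List, Tuple

# Action lists as they appear in the JSON plan, mapped to the plan symbols.
SYMBOLS: Dict[Tuple[str, ...], str] = {
    ("create",): "+",
    ("delete",): "-",
    ("update",): "~",
    ("delete", "create"): "-/+",
    ("create", "delete"): "+/-",  # replacement ordered create-then-destroy
    ("read",): "<=",
}

DESTRUCTIVE = {"-", "-/+", "+/-"}


def symbol_for(actions: List[str]) -> str:
    """Return the plan symbol for one resource change ("" for a no-op)."""
    return SYMBOLS.get(tuple(actions), "")


def destructive_first(plan: dict) -> List[Tuple[str, str]]:
    """List (symbol, address) pairs with destructive changes first."""
    rows = [
        (symbol_for(rc["change"]["actions"]), rc["address"])
        for rc in plan.get("resource_changes", [])
    ]
    rows = [row for row in rows if row[0]]  # drop no-ops
    # sorted() is stable, so the original order is kept within each group.
    return sorted(rows, key=lambda row: row[0] not in DESTRUCTIVE)


if __name__ == "__main__":
    sample = {  # hand-written sample, not real Terraform output
        "resource_changes": [
            {"address": "aws_s3_bucket.artifacts",
             "change": {"actions": ["update"]}},
            {"address": "aws_db_instance.primary",
             "change": {"actions": ["delete", "create"]}},
        ]
    }
    for symbol, address in destructive_first(sample):
        print(f"{symbol:>3} {address}")
```

This does not replace reading the plan; it only puts the lines that deserve the most attention at the top.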

Drift detection matters for the same reason: if you do not regularly plan against production, you do not know whether what your config claims matches what actually exists.

Concrete Example

Sample plan output:

Terraform will perform the following actions:

  # aws_s3_bucket.artifacts will be updated in-place
  ~ resource "aws_s3_bucket" "artifacts" {
        id   = "acme-artifacts-prod"
      ~ tags = {
          ~ "Owner" = "alice" -> "platform-team"
        }
    }

  # aws_db_instance.primary must be replaced
-/+ resource "aws_db_instance" "primary" {
      ~ engine_version = "14.10" -> "15.4" # forces replacement
        ...
    }

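Plan: 1 to add, 1 to change, 1 to destroy.

The summary line has no separate "replace" count: the replacement of aws_db_instance.primary is counted once under "to add" and once under "to destroy".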

Two very different changes in one plan:

  • The S3 tag edit (~) is cheap and safe.
  • The RDS replacement (-/+) will destroy the database and create a new one. If it holds data you care about, do not apply this plan. The fix depends on the cause: a moved block if the replacement comes from a renamed resource address, a maintenance window with a snapshot, or a provider feature that supports in-place version upgrades. See Cluster 4 Concept 11.

A bot that blindly merges PRs and auto-applies just because the pipeline was green will destroy that database. This is why plan review is a social practice, not just a technical one (see Cluster 4 Concept 10).
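
One way to keep an auto-apply pipeline from doing this is a gate that refuses to proceed when a stateful resource is about to be deleted or replaced. The sketch below is a minimal illustration, not a complete policy engine. It assumes the plan JSON is already loaded as a dict with resource_changes entries carrying type, address, and change.actions. The set of "stateful" types and the function name are hypothetical and would be a team decision.

```python
from typing import Iterable, List

# Assumption: which resource types count as stateful is decided by the team;
# this set is only an example.
STATEFUL_TYPES = frozenset({"aws_db_instance", "aws_ebs_volume"})


def blocked_changes(
    plan: dict, stateful_types: Iterable[str] = STATEFUL_TYPES
) -> List[str]:
    """Return addresses of stateful resources the plan would delete or replace."""
    stateful = set(stateful_types)
    blocked = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # "delete" appears in plain destroys and in both replacement orderings.
        if rc["type"] in stateful and "delete" in actions:
            blocked.append(rc["address"])
    return blocked


if __name__ == "__main__":
    sample = {  # hand-written sample shaped like the plan above
        "resource_changes": [
            {"address": "aws_s3_bucket.artifacts", "type": "aws_s3_bucket",
             "change": {"actions": ["update"]}},
            {"address": "aws_db_instance.primary", "type": "aws_db_instance",
             "change": {"actions": ["delete", "create"]}},
        ]
    }
    print(blocked_changes(sample))
```

A pipeline could fail the job whenever this list is non-empty and require an explicit human approval. The gate supports plan review; it does not replace it.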

Drift in Practice

You inherit a prod stack. Run:

terraform plan

and see:

  # aws_security_group.api will be updated in-place
      ~ ingress = [
          - { from_port = 22, to_port = 22, cidr_blocks = ["0.0.0.0/0"], ... },
            { from_port = 443, to_port = 443, cidr_blocks = ["0.0.0.0/0"], ... },
        ]

Nobody changed the config. Someone opened SSH to the world from the AWS console during a debugging session three weeks ago and forgot. Terraform will happily revert it on the next apply. Drift detection caught a security issue that code review could not.
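
The comparison behind that output can be pictured as a set difference between what the config declares and what the cloud reports. The following is a toy model for intuition only, not how Terraform is implemented. The Rule type and the drift function are made up for this illustration.

```python
from typing import Dict, List, NamedTuple, Set


class Rule(NamedTuple):
    """A simplified ingress rule (hypothetical shape for this example)."""
    from_port: int
    to_port: int
    cidr: str


def drift(configured: Set[Rule], actual: Set[Rule]) -> Dict[str, List[Rule]]:
    """Compare the rules in config with the rules found in the cloud."""
    return {
        # Present in the cloud but not in config: apply would remove these.
        "to_remove": sorted(actual - configured),
        # Present in config but missing in the cloud: apply would add these.
        "to_add": sorted(configured - actual),
    }


if __name__ == "__main__":
    configured = {Rule(443, 443, "0.0.0.0/0")}
    actual = {Rule(22, 22, "0.0.0.0/0"), Rule(443, 443, "0.0.0.0/0")}
    print(drift(configured, actual))
```

With the inputs shown, the SSH rule lands in to_remove, which matches the "-" line in the plan above.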

Common Confusion / Misconception

"plan is the dry-run of apply." Roughly, but subtle differences exist: resources with unknown-until-apply values (some AWS ARNs, random IDs) can produce an apply that has a few decisions the plan could not fully resolve. Always review carefully, and for high-stakes changes save the plan with -out and apply that exact plan.

"I can skip plan if my change is small." This is the most common route to self-inflicted outages. terraform apply (without a plan file) will still generate a plan, but you will not review it carefully because you just want to ship. The -out/apply-plan-file workflow exists to force a review step.

"Drift is someone else's fault." Drift is a system signal. Detect it, document the source, and either update the config to match (accept the drift) or revert on the next apply (reject it). Either way it should never be ignored.

How To Use It

  1. Make terraform plan -out=tfplan the default in CI. Humans review the output; a later job runs terraform apply tfplan.
  2. Read every plan top to bottom. Look for -, -/+, and unexpected ~.
  3. Schedule periodic drift detection: terraform plan -refresh-only on a nightly cron against production. Alert on any non-empty diff.
  4. When a plan proposes replacing a stateful resource, stop. Use moved, import, or create_before_destroy lifecycle where appropriate (Cluster 4).
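
For the nightly check in step 3, a scheduled job needs a machine-readable answer to "was the diff empty?". One option is the -detailed-exitcode flag of terraform plan, which exits with 0 for an empty diff, 1 for an error, and 2 for a non-empty diff. The sketch below is a minimal wrapper under these assumptions: the job runs in an initialized working directory with credentials available, and the function names and default argument list are illustrative. Whether to add -refresh-only to the command, as in step 3, is left to the caller.

```python
import subprocess
from typing import Sequence

# Assumption: run from an initialized working directory with credentials set.
DEFAULT_ARGV = (
    "terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color",
)


def classify_exit_code(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a status."""
    if code == 0:
        return "clean"    # plan succeeded, diff is empty
    if code == 2:
        return "changes"  # plan succeeded, diff is non-empty
    return "error"        # 1 (or anything else): the plan itself failed


def run_drift_check(argv: Sequence[str] = DEFAULT_ARGV) -> str:
    """Run the plan command and return "clean", "changes", or "error"."""
    result = subprocess.run(list(argv), capture_output=True, text=True)
    return classify_exit_code(result.returncode)
```

A cron job could call run_drift_check() and page or open a ticket on "changes", while treating "error" as a separate alert, since a failed plan says nothing about drift.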

Check Yourself

  1. What is the difference between ~ and -/+ in a plan?
  2. Why is terraform apply with a saved plan file safer than terraform apply without one?
  3. You run plan on unchanged code and see changes. Name three possible explanations in decreasing likelihood.

Mini Drill or Application

Using any sandbox stack (even a single S3 bucket with tags):

  1. Apply the config. Confirm a second plan is "No changes."
  2. Go to the cloud console and modify a tag manually.
  3. Run terraform plan. Identify the drift in the output.
  4. Decide: revert via apply, or update the code to match and re-apply.

Write one paragraph explaining which path you chose and why -- this is the actual skill of drift handling.

Source Backbone

Infrastructure-as-code details are tool-specific, but these local books provide the operational backbone for shell, Git, and change discipline.