
Module 2: Infrastructure as Code: Case Studies

These case studies treat IaC as a team safety system: state, plans, locks, modules, drift, policy, and refactoring.


Case Study 1: Local Terraform State Deletes Production

Scenario: Two engineers apply changes from different laptops. One laptop holds stale local state, so its plan is computed against an outdated picture of production, and a production load balancer is replaced unexpectedly.

Source anchor: Terraform's "Backends: State Storage and Locking" documentation, which covers where state is stored and how backends lock it.

Module concepts: state, remote backend, locking, plan/apply, team workflow.

Wrong Approach

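Commit local state to git, or keep it on one laptop. Neither approach stops two people from applying at once, and committed state can expose sensitive values that Terraform stores in it.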

Better Approach

Use remote state with locking:

backend:
remote object storage / Terraform Cloud

locking:
enabled where supported

workflow:
plan in CI
review plan
apply through controlled path
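The locking guarantee can be sketched as a toy model. The Python below is a hypothetical in-memory stand-in, not how Terraform or any backend actually implements locking: there is one lock record per state key, a second acquire fails while the first is held, and both normal release and break-glass unlock must name the exact lock ID, mirroring the way `terraform force-unlock` takes a lock ID argument. The class, method, and state-key names are illustrative assumptions.

```python
"""Toy model of backend state locking (hypothetical; not Terraform's implementation)."""
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class LockInfo:
    lock_id: str
    who: str        # e.g. CI job or operator identity
    operation: str  # e.g. "plan" or "apply"


class LockHeld(Exception):
    """Raised when another run already holds the lock for this state."""


class StateLockTable:
    """One lock record per state key, as a locking backend would keep."""

    def __init__(self) -> None:
        self._locks: dict[str, LockInfo] = {}

    def acquire(self, state_key: str, who: str, operation: str) -> LockInfo:
        # Conditional create: succeeds only when no lock exists for this state key.
        held = self._locks.get(state_key)
        if held is not None:
            raise LockHeld(f"{state_key} is locked by {held.who} ({held.operation}), lock ID {held.lock_id}")
        info = LockInfo(lock_id=str(uuid.uuid4()), who=who, operation=operation)
        self._locks[state_key] = info
        return info

    def _remove(self, state_key: str, lock_id: str) -> LockInfo:
        # Both normal release and break-glass unlock must name the exact lock ID.
        held = self._locks.get(state_key)
        if held is None or held.lock_id != lock_id:
            raise ValueError(f"lock ID {lock_id} does not match the current lock on {state_key}")
        return self._locks.pop(state_key)

    def release(self, state_key: str, lock_id: str) -> None:
        self._remove(state_key, lock_id)

    def force_unlock(self, state_key: str, lock_id: str, operator: str, reason: str) -> str:
        # Break-glass path: same ID check, plus an audit line for the incident record.
        held = self._remove(state_key, lock_id)
        return f"force-unlock {state_key} by {operator}, held by {held.who} ({held.operation}): {reason}"


if __name__ == "__main__":
    table = StateLockTable()
    alice = table.acquire("prod/network.tfstate", "alice-laptop", "apply")
    try:
        table.acquire("prod/network.tfstate", "bob-laptop", "apply")
    except LockHeld as err:
        print("blocked:", err)
    print(table.force_unlock("prod/network.tfstate", alice.lock_id, "oncall", "apply runner crashed"))
```

A real break-glass procedure would also confirm that the lock holder's run is actually dead before forcing the unlock; forcing a lock held by a live apply reintroduces the concurrent-write problem the lock exists to prevent.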

Tradeoff Table

Choice | Gain | Cost
local state | quick solo start | unsafe team workflow
remote state | shared source of truth | backend operations
locking | prevents concurrent applies | stuck lock handling
CI apply | auditability | pipeline setup

Required Artifact

Write a state backend design with lock behavior, access control, backups, and break-glass process.


Case Study 2: Refactor Without moved

Scenario: A VPC resource is moved into a module. Terraform plans to destroy and recreate it because the resource address changed.

Source anchor: Terraform's moved block reference, which documents how to change resource addresses without destroying the underlying objects.

Module concepts: resource address, refactoring, moved block, plan review.

Wrong Approach

Rename files/modules and trust Terraform to know it is the same cloud resource.

Better Approach

Record the move:

moved {
  from = aws_vpc.main
  to   = module.network.aws_vpc.main
}
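The "expected plan diff" for a pure refactor can be checked mechanically. A minimal sketch, assuming the refactor PR contains only address changes: the Python below reads the output of `terraform show -json <planfile>`, lists each move Terraform reports through the `previous_address` field of `resource_changes`, and flags any managed resource whose planned actions are anything other than `no-op`. The strict "no-op only" rule and the script name in the usage comment are assumptions, not Terraform behavior.

```python
"""Verify that a refactor plan moves resources without changing them (hypothetical CI helper)."""
from __future__ import annotations

import json
import sys


def refactor_problems(plan: dict) -> list[str]:
    """List managed resources whose planned actions are anything other than no-op.

    `plan` is the output of `terraform show -json <planfile>`. A pure address
    refactor should plan zero creates, updates, and deletes.
    """
    problems = []
    for rc in plan.get("resource_changes", []):
        if rc.get("mode", "managed") != "managed":
            continue  # data source reads are not refactor risk
        actions = rc["change"]["actions"]
        if actions != ["no-op"]:
            problems.append(f"{rc['address']}: {'/'.join(actions)}")
    return problems


def moved_pairs(plan: dict) -> dict[str, str]:
    """Map old -> new address for each instance Terraform reports via previous_address."""
    return {
        rc["previous_address"]: rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc.get("previous_address")
    }


if __name__ == "__main__":
    # Usage: terraform show -json refactor.tfplan > plan.json && python check_refactor.py plan.json
    with open(sys.argv[1]) as f:
        plan = json.load(f)
    for old, new in moved_pairs(plan).items():
        print(f"moved: {old} -> {new}")
    problems = refactor_problems(plan)
    for problem in problems:
        print(f"NOT A PURE REFACTOR: {problem}")
    sys.exit(1 if problems else 0)
```

Without the moved block, the same refactor would show up as a `delete` on `aws_vpc.main` plus a `create` on `module.network.aws_vpc.main`, and the check would fail the pipeline before anyone applies it.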

Tradeoff Table

Choice | Gain | Cost
rename without moved | easy edit | destroy/recreate risk
moved block | safe refactor | must know old and new addresses
terraform state mv | immediate | not recorded in code, less reviewable
import | adopts existing resource | import mapping work

Required Artifact

Write a refactor plan with old address, new address, moved blocks, expected plan diff, and rollback.


Case Study 3: Terraform Module That Hides Risk

Scenario: A reusable database module exposes only two inputs, name and size. It defaults to public access, no backups, and no deletion protection because "defaults keep it simple."

Source anchor: Terraform module documentation describes modules as containers for resources with inputs and outputs. See Terraform modules.

Module concepts: module interface, safe defaults, outputs, versioning, blast radius.

Wrong Approach

Make modules easy by hiding important decisions.

Better Approach

Expose risk-bearing choices:

variable "publicly_accessible" {
  type    = bool
  default = false
}

variable "deletion_protection" {
  type    = bool
  default = true
}
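Safe defaults stay safe when they are tested. A minimal sketch, assuming the module ships a minimal example configuration that CI plans and exports with `terraform show -json`: the Python below checks every aws_db_instance the plan would create or update and reports any that are public, lack deletion protection, or have automated backups disabled. The attribute names come from the AWS provider's aws_db_instance resource; the required values and the one-day backup floor are assumptions standing in for this module's contract.

```python
"""Contract test for the database module's safe defaults (hypothetical CI check)."""
from __future__ import annotations

import json
import sys

# aws_db_instance attributes this module promises to keep safe by default.
REQUIRED_VALUES = {"publicly_accessible": False, "deletion_protection": True}
MIN_BACKUP_DAYS = 1  # assumption: a retention of 0 would disable automated backups


def contract_violations(plan: dict) -> list[str]:
    """Check every planned create/update of aws_db_instance in `terraform show -json` output."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_db_instance":
            continue
        if not {"create", "update"} & set(rc["change"]["actions"]):
            continue
        after = rc["change"].get("after") or {}
        address = rc["address"]
        # Missing or unknown values count as violations, which keeps the test strict.
        for attr, expected in REQUIRED_VALUES.items():
            if after.get(attr) != expected:
                violations.append(f"{address}: {attr}={after.get(attr)!r}, expected {expected!r}")
        if (after.get("backup_retention_period") or 0) < MIN_BACKUP_DAYS:
            violations.append(f"{address}: automated backups disabled")
    return violations


if __name__ == "__main__":
    # Usage: plan the module's minimal example, then run this on its JSON plan.
    with open(sys.argv[1]) as f:
        found = contract_violations(json.load(f))
    for violation in found:
        print("CONTRACT:", violation)
    sys.exit(1 if found else 0)
```

Running the check against the module's own example makes a silent change to a default fail in the module's CI, before any caller upgrades to the new version.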

Tradeoff Table

Choice | Gain | Cost
tiny interface | easy call site | hidden unsafe defaults
explicit inputs | clear decisions | more configuration
safe defaults | fewer incidents | exceptions require intent
opinionated module | consistency | less flexibility

Required Artifact

Write a module contract: inputs, outputs, safe defaults, examples, and upgrade policy.


Case Study 4: Drift Ignored Until Incident

Scenario: An operator manually opens a security-group rule during an incident and forgets to revert it. Terraform code still looks secure.

Source anchor: Terraform plan refreshes its view of the real remote objects and reports where they differ from the desired configuration. See Terraform plan command.

Module concepts: drift, plan review, manual changes, reconciliation.

Wrong Approach

Assume git reflects production.

Better Approach

Run drift detection:

scheduled plan:
no intended changes
alert on diff

incident exception:
record manual change
create follow-up PR
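The scheduled plan can be sketched as a small wrapper. `terraform plan -detailed-exitcode` exits 0 when there are no changes, 1 on error, and 2 when the plan contains changes; the Python below maps those codes to an alert status and, on a diff, summarizes the saved plan's JSON from `terraform show -json`, using its `resource_drift` list (changes made outside Terraform) and `resource_changes` list (what the plan would do). The function names, status strings, and plan file name are assumptions for illustration.

```python
"""Scheduled drift check around `terraform plan -detailed-exitcode` (sketch)."""
from __future__ import annotations

import json
import subprocess

# Documented -detailed-exitcode meanings: 0 = no changes, 1 = error, 2 = changes present.
STATUS_BY_EXIT_CODE = {0: "clean", 1: "error", 2: "drift"}


def classify(exit_code: int) -> str:
    """Map the plan exit code to an alert status; unknown codes are treated as errors."""
    return STATUS_BY_EXIT_CODE.get(exit_code, "error")


def _changed(entries: list[dict]) -> list[dict]:
    return [rc for rc in entries if rc["change"]["actions"] != ["no-op"]]


def drift_report(plan: dict) -> list[str]:
    """Summarize a `terraform show -json` plan for the alert message."""
    lines = [
        f"changed outside Terraform: {rc['address']} ({'/'.join(rc['change']['actions'])})"
        for rc in _changed(plan.get("resource_drift", []))
    ]
    lines += [
        f"plan would {'/'.join(rc['change']['actions'])}: {rc['address']}"
        for rc in _changed(plan.get("resource_changes", []))
    ]
    return lines


def run_scheduled_check(workdir: str) -> tuple[str, list[str]]:
    """Plan against live infrastructure in workdir and return (status, report lines)."""
    plan = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-out=drift.tfplan"],
        cwd=workdir,
    )
    status = classify(plan.returncode)
    if status != "drift":
        return status, []
    shown = subprocess.run(
        ["terraform", "show", "-json", "drift.tfplan"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return status, drift_report(json.loads(shown.stdout))
```

A diff does not always mean a manual change: merged but unapplied code also produces one, which is why the report separates the two lists. Like any plan, this only sees what Terraform manages, so a change to something no Terraform resource tracks will not appear.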

Tradeoff Table

Choice | Gain | Cost
manual console fixes | fast incident response | drift
scheduled plans | drift visibility | alert noise
forbid console access | control | slower emergencies
break-glass + audit | emergency path | governance work

Required Artifact

Write a drift runbook: detection schedule, owners, allowed manual change process, and remediation SLA.


Case Study 5: Policy As Code Blocks Public Storage

Scenario: A PR creates an object bucket with public read access. Review misses it. Policy-as-code would have blocked it before apply.

Source anchor: Open Policy Agent provides policy as code for automated decisions. See OPA documentation.

Module concepts: policy as code, guardrails, plan scanning, compliance.

Wrong Approach

Rely only on human reviewers for repeatable safety rules.

Better Approach

Codify guardrails:

Rule:
storage buckets must not be public unless exception is approved

Inputs:
Terraform plan JSON

Outcome:
block PR/apply
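The rule above can be sketched against plan JSON. OPA would normally express it in Rego and evaluate it against the same `terraform show -json` output; the Python below is a swapped-in illustration of the logic only, not OPA. It assumes public access arrives as an S3 canned ACL, either through aws_s3_bucket_acl or the older acl argument on aws_s3_bucket, and that approved exceptions are a list of resource addresses; bucket policies and public access block settings are not covered by this sketch.

```python
"""Plan-JSON guardrail: no public bucket ACLs without an approved exception (sketch)."""
from __future__ import annotations

import json
import sys

# S3 canned ACLs that grant access to everyone.
PUBLIC_ACLS = {"public-read", "public-read-write"}
# Assumption: ACLs arrive via aws_s3_bucket_acl, or the older acl argument on aws_s3_bucket.
ACL_RESOURCE_TYPES = {"aws_s3_bucket_acl", "aws_s3_bucket"}


def public_bucket_violations(plan: dict, approved_exceptions: set[str]) -> list[str]:
    """Return one message per planned public ACL whose address has no approved exception."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") not in ACL_RESOURCE_TYPES:
            continue
        actions = rc["change"]["actions"]
        if "create" not in actions and "update" not in actions:
            continue  # only evaluate what this change introduces
        acl = (rc["change"].get("after") or {}).get("acl")
        if acl in PUBLIC_ACLS and rc["address"] not in approved_exceptions:
            violations.append(f"{rc['address']}: acl={acl} needs an approved exception")
    return violations


if __name__ == "__main__":
    # Usage: python check_public_buckets.py plan.json [comma,separated,approved,addresses]
    with open(sys.argv[1]) as f:
        plan = json.load(f)
    approved = set(sys.argv[2].split(",")) if len(sys.argv) > 2 else set()
    found = public_bucket_violations(plan, approved)
    for violation in found:
        print("DENY:", violation)
    sys.exit(1 if found else 0)
```

Keying exceptions by resource address keeps each approval narrow: approving one public website bucket does not silently approve the next bucket someone adds.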

Tradeoff Table

Choice | Gain | Cost
human review | contextual | misses routine issues
static scanning | fast feedback | false positives
policy as code | enforceable | policy maintenance
exception process | pragmatic | governance burden

Required Artifact

Write one policy rule in pseudocode plus exception workflow.


Source Map

Source | Use it for
Terraform backends | remote state and locking
Terraform moved block | safe resource-address refactoring
Terraform modules | module interface design
Terraform plan | drift and change review
Open Policy Agent | policy as code

Completion Standard

  • At least three artifacts are completed.
  • At least one artifact covers remote state and locking.
  • At least one artifact includes a safe refactor.
  • At least one artifact includes policy-as-code.