Module 2: Infrastructure as Code: Case Studies
These case studies treat IaC as a team safety system: state, plans, locks, modules, drift, policy, and refactoring.
Case Study 1: Local Terraform State Deletes Production
Scenario: Two engineers apply changes from different laptops. One has stale local state. A production load balancer is replaced unexpectedly.
Source anchor: Terraform documentation, Backends: State Storage and Locking, which explains where backends store state and how state locking prevents concurrent writes.
Module concepts: state, remote backend, locking, plan/apply, team workflow.
Wrong Approach
Commit local state or keep it on one laptop.
Better Approach
Use remote state with locking:
```text
backend:
  remote object storage / Terraform Cloud
locking:
  enabled where supported
workflow:
  plan in CI
  review plan
  apply through controlled path
```
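An automated gate can back up the "review plan" step. The sketch below is a hypothetical CI check, not part of Terraform. It reads the JSON form of a saved plan (produced by `terraform show -json tfplan`) and fails when any resource change includes a `delete` action, unless that address was explicitly approved. The `delete` test covers plain deletions and both replacement orders. The file name `plan_gate.py` and the approval-list arguments are assumptions for illustration.

```python
# plan_gate.py: hypothetical CI gate over Terraform plan JSON.
import json
import sys


def destructive_changes(plan: dict, approved: set[str]) -> list[str]:
    """Return 'address: actions' for unapproved changes that delete an object.

    Replacements appear as ["delete", "create"] or ["create", "delete"],
    so checking for "delete" catches both orders as well as plain deletions.
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if "delete" in actions and rc["address"] not in approved:
            flagged.append(f"{rc['address']}: {'/'.join(actions)}")
    return flagged


def main(argv: list[str]) -> int:
    # Usage: python plan_gate.py plan.json [approved_address ...]
    with open(argv[1]) as f:
        plan = json.load(f)
    flagged = destructive_changes(plan, set(argv[2:]))
    for item in flagged:
        print(f"BLOCKED {item}", file=sys.stderr)
    return 1 if flagged else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

In CI, the gate would run between `terraform plan -out=tfplan` and the apply job. A nonzero exit stops the pipeline, so an unexpected load balancer replacement cannot reach apply until a person approves that specific address.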
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| local state | quick solo start | unsafe team workflow |
| remote state | shared truth | backend operations |
| locking | prevents concurrent apply | stuck lock handling |
| CI apply | auditability | pipeline setup |
Required Artifact
Write a state backend design with lock behavior, access control, backups, and break-glass process.
Case Study 2: Refactor Without a moved Block
Scenario: A VPC resource is moved into a module. Terraform plans to destroy and recreate it because the resource address changed.
Source anchor: Terraform's `moved` block reference, which explains how to change a resource's address without destroying and recreating the real object.
Module concepts: resource address, refactoring, moved block, plan review.
Wrong Approach
Rename resources or move them into modules, and trust Terraform to know the new address refers to the same cloud resource.
Better Approach
Record the move:
```hcl
moved {
  from = aws_vpc.main
  to   = module.network.aws_vpc.main
}
```
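The "expected plan diff" for a refactor like this can be checked mechanically. This hypothetical helper assumes Terraform 1.1 or later. In that version, the plan JSON from `terraform show -json` lists a moved resource under its new `address` and records the old one in `previous_address`. The helper takes the expected `{old: new}` moves and reports three problems: a move that is missing, a move from the wrong address, or a moved resource that is still planned for deletion.

```python
# Hypothetical refactor check; assumes Terraform 1.1+ plan JSON with "previous_address".


def check_moves(plan: dict, expected: dict[str, str]) -> list[str]:
    """Compare a plan against expected {old_address: new_address} moves.

    Returns a list of problems. An empty list means each resource appears at its
    new address, was moved from the expected old address, and is not deleted.
    """
    changes = plan.get("resource_changes", [])
    by_address = {rc["address"]: rc for rc in changes}
    problems = []
    for old, new in expected.items():
        rc = by_address.get(new)
        if rc is None:
            problems.append(f"{new}: not in plan")
        elif rc.get("previous_address") != old:
            problems.append(f"{new}: expected move from {old}, got {rc.get('previous_address')}")
        elif "delete" in rc["change"]["actions"]:
            problems.append(f"{new}: moved but planned for {'/'.join(rc['change']['actions'])}")
    # A delete at an old address means Terraform treated the move as destroy + create.
    for rc in changes:
        if rc["address"] in expected and "delete" in rc["change"]["actions"]:
            problems.append(f"{rc['address']}: planned for deletion")
    return problems
```

An empty result is the signal to approve the refactor PR. Any other result means the `moved` block is missing or one of its addresses is wrong.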
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| rename without moved | easy edit | destroy/recreate risk |
| moved block | safe refactor | must know old/new address |
| state mv command | immediate | less reviewable |
| import | adopts existing resource | import mapping work |
Required Artifact
Write a refactor plan with old address, new address, moved blocks, expected plan diff, and rollback.
Case Study 3: Terraform Module That Hides Risk
Scenario: A reusable database module exposes only `name` and `size`. It defaults to public access, no backups, and no deletion protection because "defaults keep it simple."
Source anchor: Terraform module documentation describes modules as containers for resources with inputs and outputs. See Terraform modules.
Module concepts: module interface, safe defaults, outputs, versioning, blast radius.
Wrong Approach
Make modules easy by hiding important decisions.
Better Approach
Expose risk-bearing choices:
```hcl
variable "publicly_accessible" {
  type    = bool
  default = false
}

variable "deletion_protection" {
  type    = bool
  default = true
}
```
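A module contract is easier to keep when a test enforces it. This hypothetical check reads plan JSON and inspects the planned (`after`) values of every `aws_db_instance`. It assumes the module wraps that resource type. The check flags three things: public access, missing deletion protection, and a backup retention period of zero days (no automated backups). The one-day minimum is illustrative, not a recommendation.

```python
# Hypothetical contract test for a database module; assumes it wraps aws_db_instance.

# attribute -> (check on the planned value, failure message)
RULES = {
    "publicly_accessible": (lambda v: v is False, "must not be publicly accessible"),
    "deletion_protection": (lambda v: v is True, "must enable deletion protection"),
    # Illustrative threshold: retention of 0 days means no automated backups.
    "backup_retention_period": (lambda v: isinstance(v, int) and v >= 1, "must keep backups"),
}


def contract_violations(plan: dict) -> list[str]:
    """Return one message per aws_db_instance attribute that breaks the contract."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_db_instance":
            continue
        after = rc["change"].get("after")
        if after is None:  # the instance is being destroyed; nothing to check
            continue
        for attr, (ok, message) in RULES.items():
            value = after.get(attr)
            if not ok(value):
                violations.append(f"{rc['address']}: {message} ({attr}={value!r})")
    return violations
```

Running this against the module's example configurations in CI turns the "safe defaults" row of the tradeoff table into something a reviewer does not have to re-check by hand.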
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| tiny interface | easy call site | hidden unsafe defaults |
| explicit inputs | clear decisions | more configuration |
| safe defaults | fewer incidents | exceptions require intent |
| opinionated module | consistency | less flexibility |
Required Artifact
Write a module contract: inputs, outputs, safe defaults, examples, and upgrade policy.
Case Study 4: Drift Ignored Until Incident
Scenario: During an incident, an operator manually adds a permissive security-group rule in the console and forgets to revert it. The Terraform code still looks secure, but production no longer matches it.
Source anchor: `terraform plan` refreshes state from the real infrastructure and compares it with the configuration, so it reports changes made outside Terraform. See Terraform plan command.
Module concepts: drift, plan review, manual changes, reconciliation.
Wrong Approach
Assume git reflects production.
Better Approach
Run drift detection:
```text
scheduled plan:
  no intended changes
  alert on diff
incident exception:
  record manual change
  create follow-up PR
```
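The scheduled plan can be wrapped in a small job. `terraform plan -detailed-exitcode` exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. On a configuration that should have no intended changes, exit code 2 therefore signals drift. The wrapper below is a hypothetical sketch. The runner is injectable so the classification logic can be tested without Terraform installed.

```python
# Hypothetical drift-check wrapper around `terraform plan -detailed-exitcode`.
import subprocess
from typing import Callable

PLAN_CMD = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"]


def classify(exit_code: int) -> str:
    """Map a -detailed-exitcode result to a drift status: 0 clean, 2 drift, else error."""
    return {0: "clean", 2: "drift"}.get(exit_code, "error")


def run_drift_check(workdir: str, run: Callable = subprocess.run) -> tuple[str, str]:
    """Run the plan in `workdir` and return (status, plan output) for alerting."""
    result = run(PLAN_CMD, cwd=workdir, capture_output=True, text=True)
    return classify(result.returncode), result.stdout
```

A scheduler (cron or a scheduled CI job) would call `run_drift_check` per workspace and notify the owner on "drift" or "error". The plan output would be attached to the follow-up PR that either reverts the manual change or codifies it.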
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| manual console fixes | fast incident response | drift |
| scheduled plans | drift visibility | alert noise |
| forbid console access | control | slower emergencies |
| break-glass + audit | emergency path | governance work |
Required Artifact
Write a drift runbook: detection schedule, owners, allowed manual change process, and remediation SLA.
Case Study 5: Policy As Code Blocks Public Storage
Scenario: A PR creates an object bucket with public read access. Review misses it. Policy-as-code would have blocked it before apply.
Source anchor: Open Policy Agent provides policy as code for automated decisions. See OPA documentation.
Module concepts: policy as code, guardrails, plan scanning, compliance.
Wrong Approach
Rely only on human reviewers for repeatable safety rules.
Better Approach
Codify guardrails:
```text
Rule:
  storage buckets must not be public unless an exception is approved
Inputs:
  Terraform plan JSON
Outcome:
  block PR/apply
```
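OPA policies are normally written in Rego. To keep one language across these examples, the rule is sketched in Python instead, so treat it as a stand-in rather than OPA usage. It reads plan JSON and flags two kinds of resources whose planned ACL is `public-read` or `public-read-write`: `aws_s3_bucket_acl`, and `aws_s3_bucket` in older configurations that set `acl` inline. Addresses on an approved exception list are skipped. The check covers ACLs only; public bucket policies and public access block settings would need their own rules.

```python
# Hypothetical guardrail over Terraform plan JSON; a Python stand-in for a Rego policy.
PUBLIC_ACLS = {"public-read", "public-read-write"}


def public_bucket_violations(plan: dict, exceptions: frozenset[str] = frozenset()) -> list[str]:
    """Return a message for each bucket ACL that would be public without an approved exception."""
    violations = []
    for rc in plan.get("resource_changes", []):
        # aws_s3_bucket_acl is the AWS provider v4+ resource; older configs set acl on aws_s3_bucket.
        if rc.get("type") not in ("aws_s3_bucket_acl", "aws_s3_bucket"):
            continue
        after = rc["change"].get("after") or {}  # None when the resource is being deleted
        if after.get("acl") in PUBLIC_ACLS and rc["address"] not in exceptions:
            violations.append(f"{rc['address']}: acl={after['acl']} requires an approved exception")
    return violations
```

The exception list is where the approval workflow plugs in. An address is added only through a reviewed change, which keeps each exception auditable. CI blocks the PR or apply whenever the returned list is non-empty.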
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| human review | contextual | misses routine issues |
| static scanning | fast feedback | false positives |
| policy as code | enforceable | policy maintenance |
| exception process | pragmatic | governance burden |
Required Artifact
Write one policy rule in pseudocode plus exception workflow.
Source Map
| Source | Use it for |
|---|---|
| Terraform backends | remote state and locking |
| Terraform moved block | safe resource-address refactoring |
| Terraform modules | module interface design |
| Terraform plan | drift and change review |
| Open Policy Agent | policy as code |
Completion Standard
- At least three artifacts are completed.
- At least one artifact covers remote state and locking.
- At least one artifact includes a safe refactor.
- At least one artifact includes policy-as-code.