Module 2: Infrastructure as Code

Primary text: HashiCorp's official Terraform documentation (developer.hashicorp.com/terraform).
Selective support: Terraform: Up & Running (Brikman) for narrative framing, the local Pro Git chunks for PR/review culture, and per-provider docs for exact resource semantics.

This guide is the primary teacher. You do not need to read the full Terraform documentation tree, and you certainly do not need to finish Terraform: Up & Running before practicing. You do need to become operationally strong at writing, planning, applying, modularizing, reviewing, and refactoring real infrastructure code -- and you need to know where to read in the official docs when the guide runs out.

Scope of This Module

Infrastructure as Code (IaC) is not "a DevOps tool you install." It is an engineering discipline: describe the desired state of systems in a reviewable artifact, let a planner compute the diff, and apply it with team safety. This module teaches that discipline Terraform-first, because Terraform has the most mature mental model, but the habits transfer to OpenTofu, Pulumi, CDK, and CloudFormation.

What it covers in depth:

declarative infrastructure as a reasoning shift away from shell scripts
state as the ground truth and where it gets corrupted in practice
idempotency and convergence, and why IaC is not "just API calls"
the three Terraform building blocks: providers, resources, data sources
structuring a root module: inputs, outputs, locals, naming discipline
the plan/apply lifecycle and what drift detection actually means
writing a reusable module that you would inflict on a teammate
workspaces, environment layouts, and the monorepo vs polyrepo question
remote state, locking, and what a corrupted state file costs a team
plan review as PR culture -- the plan is the diff
refactoring with moved, import, and safe state manipulation
blast radius and safe-by-default patterns when a single apply can delete production
where IaC ends and configuration management (Ansible) begins
CDK and Pulumi as IaC in general-purpose languages
policy as code: OPA, Sentinel, tfsec / Trivy, and where each fits

What it deliberately does not try to finish here:

mastery of any specific cloud provider's resource catalog (that belongs in Module 1 and in hands-on practice)
full container orchestration IaC (Module 3: Kubernetes)
CI/CD pipeline construction (Module 4)
deep Sentinel or OPA policy authoring as a specialty

If you can copy-paste Terraform from a blog post but cannot read a plan output and predict what it will do, the module is not complete.

Before You Start (Diagnostic Prerequisites Assessment)

Time Limit: 20 minutes, closed-book.
Format: Short answers plus one multiple-choice question. Score yourself honestly.
Purpose: Confirm that Module 1 (Cloud Platform Fundamentals) and the Git/CI track from earlier semesters have stuck. This module assumes you have clicked in a cloud console at least once and have made a real pull request.

Mastery-Based Progression

This is a competency gate. If you cannot explain what a cloud resource is or how to review a pull request, finishing this module will leave you with cargo-culted Terraform and no judgment to go with it.

Diagnostic Questions

Q1. Declarative vs imperative. In one sentence, what does "declarative configuration" mean, and why is bash provision.sh not declarative even if it is in git?

Q2. Idempotency. You run the same provisioning command twice. What does "idempotent" require about the second run? Give one example of a non-idempotent API call you have seen.

Q3. State intuition. Terraform does not scan your whole cloud account before every action to rediscover what exists. Where does it look first to learn which resources it manages, and what goes wrong if that information is lost?

Q4. Module / reuse. In any language you know, what does it take to call something a "reusable" component? Name two properties.

Q5. Drift (multiple choice). An engineer clicks in the AWS console and changes a security group that Terraform manages. The next terraform plan will most likely:

a) succeed silently and keep the console change
b) show a diff that proposes to revert the console change
c) error out because state is now invalid
d) rewrite state to match the console

Answer Key and Scoring

Q1. Declarative = describe the end state, not the steps. bash provision.sh runs commands in order; rerunning it can produce different results depending on the starting state.
Q2. The second run leaves the system in the same state as the first and makes no further changes. Non-idempotent example: appending a line to a file, or any INSERT without a uniqueness constraint.
Q3. Terraform looks at its state file first. Lose it and Terraform no longer knows which real resources it manages, so the next plan proposes creating everything in the configuration again.
Q4. Expected answers include: clear inputs/outputs, no hidden dependencies on the caller's environment, versioned, documented, composable.
Q5. b). plan refreshes the last known state from the real provider, then compares the configuration (desired state) against it. Because the configuration still holds the old value, it proposes reverting the manual change.
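
To make the answers to Q1 and Q2 concrete, here is a minimal sketch of a declarative resource. The AWS provider is an assumption (chosen only because Q5 mentions the AWS console), and the resource label and bucket name are hypothetical.

```hcl
# Declarative: this block states what should exist, not the steps to create it.
# Assumes the hashicorp/aws provider is configured elsewhere in the root module.
resource "aws_s3_bucket" "build_logs" {
  bucket = "example-team-build-logs" # hypothetical name; S3 bucket names are globally unique

  tags = {
    ManagedBy = "terraform"
  }
}

# First apply: the bucket is absent from state, so the plan proposes one create.
# Second apply with no config change: the configuration matches the refreshed
# state, so the plan reports no changes. That is the idempotent behavior Q2 asks about.
```

A shell script that issues a create call on every run has to detect and handle the "already exists" case itself; here the planner makes that comparison for you.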

Scoring:

4-5 correct: Ready to proceed.
2-3 correct: Continue, but expect extra time in Cluster 1 and revisit Module 1 notes on cloud resources if Q3 or Q5 tripped you up.
0-1 correct: Pause. Return to Module 1 of this semester (Cloud Platform Fundamentals) and to the Git/PR material in Track B before starting here. IaC reasoning collapses without a cloud-resource mental model and PR discipline.

Automatic Remediation

If Q1 or Q2 missed: Read the intro at https://developer.hashicorp.com/terraform/intro and return.
If Q3 or Q5 missed: Preview Cluster 1, Concept 02 (state) before starting.
If Q4 missed: Revisit any prior module on modules/packages/functions. Reuse is a general software concept, not a Terraform feature.

What This Module Is For

Every production system you will ever touch is described, at some level, by infrastructure code or by the absence of it. Teams without IaC operate from memory and ClickOps; teams with IaC operate from a reviewable diff. The difference shows up the first time a region goes down at 3 a.m. and someone has to reproduce the stack in a fresh account.

This module builds the IaC reasoning needed for:

safely shipping production cloud changes (Module 4: CI/CD)
Kubernetes manifests and GitOps overlap (Module 3)
security posture review and least-privilege enforcement (Module 5)
architecture decision records that describe which infrastructure pattern was chosen and why (Semester 7, Module 5)
operating systems under pressure when the cloud console is not a good enough answer

You are learning to treat infrastructure like code and reviews like gates, not formalities.

Local Validation Path

Default to the local validation path before spending money or touching shared accounts:

Run terraform fmt -check and terraform validate on every root module.
Generate and review a saved plan (terraform plan -out=tfplan plus a human-readable terraform show -no-color tfplan > plan.txt) before any apply.
Run static scanning with tflint, Trivy, tfsec-compatible rules, Checkov, or OPA against configuration and plan.json fixtures (terraform show -json tfplan > plan.json produces one from a saved plan).
Draw a mock resource diagram that explains the intended VPC/network, cluster, database, IAM, and data-flow boundaries even when the resources are not created.
Use mock providers, local backends, localstack-style services, and fake queues/databases when the learning goal is review, module design, or blast-radius reasoning rather than a provider-specific API.

Cloud applies are allowed only when they add learning value that local validation cannot provide, and they must inherit the Semester 9 budget, billing-alert, and teardown guardrails.
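
As one illustration of the mock-provider item above, the sketch below uses Terraform's built-in test framework. It assumes Terraform 1.7 or later (where mock_provider is available) and a root module that already declares a resource named aws_s3_bucket.artifacts with a ManagedBy tag; the file name and resource name are hypothetical.

```hcl
# tests/tags.tftest.hcl  (hypothetical file name; run with `terraform test`)

# Replace the real AWS provider with a mock: no credentials, no real API calls.
mock_provider "aws" {}

run "bucket_is_tagged_as_managed" {
  # Plan only; nothing is created, even against the mock.
  command = plan

  assert {
    # Assumes the module under test sets this tag directly in configuration,
    # so the value is known at plan time.
    condition     = aws_s3_bucket.artifacts.tags["ManagedBy"] == "terraform"
    error_message = "The artifacts bucket must carry the ManagedBy = terraform tag."
  }
}
```

terraform init still downloads the provider plugin so that Terraform knows the resource schema, but the test itself should run without a cloud account. Check the testing documentation for your Terraform version before relying on the exact behavior.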

Concept Map

How To Use This Module

Work in order. The later clusters presume the earlier mental model is stable -- you cannot review a plan safely before you understand state, and you cannot refactor with moved before you understand modules.

Cluster 1: The IaC Mindset

Order	Concept	Type	Focus
1	Declarative vs Imperative Infrastructure	PRIMARY	Desired state vs scripts; why IaC is not "bash in git"
2	State: The Ground Truth and Its Hazards	PRIMARY	What state is, where it lives, and how it gets corrupted
3	Idempotency and Convergence in IaC	PRIMARY	Why running twice must be safe and why convergence is not the same as idempotency

Cluster mastery check: Can you explain in plain words what Terraform is doing when you type terraform apply, and where the state file fits in that picture?

Cluster 2: Terraform Core

Order	Concept	Type	Focus
4	Providers, Resources, Data Sources	PRIMARY	The three block types and when to use each
5	Variables, Outputs, Locals -- Structuring a Module	PRIMARY	Inputs, outputs, locals, naming, and file layout
6	The Plan/Apply Lifecycle, Drift Detection	PRIMARY	What `plan`, `apply`, and `refresh` actually do

Cluster mastery check: Can you write, from scratch, a 30-line Terraform root module that calls one provider, creates one resource, reads one data source, and exposes one output -- and can you explain each line?
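
Attempt the check from scratch first. For comparison afterwards, here is one possible shape of such a root module, a sketch rather than a reference answer. The AWS provider, the version constraints, the region default, and every name are assumptions made for illustration.

```hcl
terraform {
  required_version = ">= 1.5.0" # assumed floor; adjust to your installed version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint
    }
  }
}

variable "region" {
  type        = string
  description = "AWS region for all resources in this root module."
  default     = "us-east-1"
}

# Provider: how Terraform talks to one API.
provider "aws" {
  region = var.region
}

# Data source: reads something that already exists; creates nothing.
data "aws_caller_identity" "current" {}

# Resource: something this configuration owns and manages.
resource "aws_s3_bucket" "artifacts" {
  # Suffixing the account ID is one way to keep the hypothetical name unique.
  bucket = "artifacts-${data.aws_caller_identity.current.account_id}"

  tags = {
    ManagedBy = "terraform"
  }
}

# Output: the value this module exposes to callers and to the CLI.
output "artifacts_bucket_arn" {
  description = "ARN of the artifacts bucket."
  value       = aws_s3_bucket.artifacts.arn
}
```

The second half of the check still applies: being able to say why each block is there matters more than reproducing the file.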

Cluster 3: Modularity and Reuse

Order	Concept	Type	Focus
7	Writing a Reusable Terraform Module	PRIMARY	Module contract, versioning, opinionated defaults
8	Workspaces, Environments, and the Monorepo vs Polyrepo Question	PRIMARY	How teams lay out `dev`/`staging`/`prod` and why `terraform workspace` is usually not the answer
9	Remote State, Locking, and Team Safety	PRIMARY	Backends, locks, and what a corrupted state file costs

Cluster mastery check: Given an environment layout question ("one repo or three? one state or many?"), can you articulate tradeoffs in a paragraph and name at least two failure modes of each option?
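
The sketch below shows one layout among several that this cluster weighs: a small child module with an explicit contract, called from a per-environment directory that keeps its own state. All paths, names, and CIDR ranges are hypothetical, and the AWS provider is an assumption.

```hcl
# modules/vpc/variables.tf  (the module's inputs are its contract)
variable "name" {
  type        = string
  description = "Prefix applied to every resource the module creates."
}

variable "cidr_block" {
  type        = string
  description = "CIDR range for the VPC."
  default     = "10.0.0.0/16" # opinionated default; callers may override

  validation {
    condition     = can(cidrhost(var.cidr_block, 0))
    error_message = "cidr_block must be a valid CIDR range such as 10.0.0.0/16."
  }
}

# modules/vpc/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block

  tags = {
    Name = "${var.name}-vpc"
  }
}

# modules/vpc/outputs.tf
output "vpc_id" {
  description = "ID of the VPC created by this module."
  value       = aws_vpc.this.id
}

# environments/prod/main.tf  (one directory and one state per environment)
module "network" {
  source     = "../../modules/vpc"
  name       = "prod"
  cidr_block = "10.20.0.0/16"
}
```

A local path like this cannot be version-pinned. Git sources (with a ?ref= tag) and registry sources (with a version argument) can, which is one of the tradeoffs to name when you answer the monorepo vs polyrepo question.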

Cluster 4: Managing Change

Order	Concept	Type	Focus
10	Plan Review as PR Culture	PRIMARY	The plan is the diff; reviewing plans like reviewing code
11	Refactoring: `moved` Blocks, `import`, State Manipulation	PRIMARY	Renaming, moving, and adopting resources without destroying them
12	Blast Radius and Safe-by-Default Patterns	PRIMARY	`prevent_destroy`, targeted applies, dependency isolation

Cluster mastery check: Given a teammate's terraform plan output, can you separate "benign drift" from "delete-production-by-accident" and gate the apply accordingly?
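
The three tools named in this cluster can be sketched together. This assumes Terraform 1.5 or later (import blocks; moved blocks arrived earlier) and the AWS provider. Every resource label and bucket name is hypothetical.

```hcl
# Rename without destroying: the label changed from "logs" to "audit_logs".
# The moved block tells Terraform the state entry is the same real object.
moved {
  from = aws_s3_bucket.logs
  to   = aws_s3_bucket.audit_logs
}

resource "aws_s3_bucket" "audit_logs" {
  bucket = "example-team-audit-logs"

  lifecycle {
    # Any plan that would destroy this resource fails with an error instead.
    prevent_destroy = true
  }
}

# Adopt a bucket that was created by hand in the console.
import {
  to = aws_s3_bucket.legacy_assets
  id = "example-legacy-assets" # for aws_s3_bucket the import ID is the bucket name
}

resource "aws_s3_bucket" "legacy_assets" {
  bucket = "example-legacy-assets"
}
```

When reviewing the resulting plan, the move should appear as an address change and the adoption as an import, with nothing destroyed. If either shows up as a destroy plus create, stop before applying. Note that prevent_destroy only guards while the resource block remains in the configuration; deleting the block removes the guard along with it.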

Cluster 5: IaC Beyond Terraform

Order	Concept	Type	Focus
13	Configuration Management: Ansible and the Line with IaC	SUPPORTING	Where provisioning ends and configuration begins
14	CDK / Pulumi: IaC in General-Purpose Languages	SUPPORTING	When code beats DSL and when it loses the thread
15	Policy as Code: OPA, Sentinel, tfsec	PRIMARY	Gating plans with policy, and what each tool is actually good at

Cluster mastery check: Given a new infrastructure scenario (say, "we need servers configured for a legacy app"), can you pick Terraform vs Ansible vs CDK vs plain bash and justify the choice?

Then work these practice pages in order:

Order	Practice path	Focus
1	First Terraform Module Lab	Write a tiny root module end-to-end
2	Modularity and State Workshop	Split into modules, add remote state and locking
3	Refactoring and Import Clinic	Rename with `moved`; adopt existing infra via `import`
4	IaC Katas	Four repeatable drills including a VPC-module spec and a public-S3-rejection policy

Take the Module Quiz after completing the concept and practice paths. Use Reference and Selective Reading and Learning Resources for targeted reinforcement only.

Learning Objectives

By the end of this module you should be able to:

Distinguish declarative infrastructure from imperative provisioning and explain why the distinction matters for change safety.
Explain what Terraform state is, where it is stored, how it is locked, and what happens when it is lost or corrupted.
Define idempotency and convergence and tell when an IaC workflow violates each.
Author a Terraform root module that uses providers, resources, data sources, variables, outputs, and locals with deliberate naming.
Read a terraform plan output and predict the resulting real-world change, including deletions and replacements.
Factor a working root module into a reusable module with a stable contract and publish it inside a monorepo or registry.
Choose an environment layout (workspaces vs directories vs repos) and defend the choice against one concrete failure mode of each alternative.
Configure a remote backend with state locking and describe the sequence of operations during a locked apply.
Refactor infrastructure safely using moved blocks and import blocks, and articulate when raw terraform state commands are appropriate.
Identify blast radius in a plan and apply safe-by-default patterns (prevent_destroy, environment isolation, and targeted applies reserved for exceptional recovery rather than routine use) before merging.
Decide between Terraform, Ansible, CDK/Pulumi, and plain scripts for a given scenario.
Write at least one policy-as-code rule (OPA/Rego, Sentinel, or tfsec/Trivy config) that blocks a specific unsafe plan.

Outputs

one working Terraform root module checked into git with README, variables, and outputs
one reusable child module (a VPC or equivalent) with semver-tagged releases
one remote-state configuration using S3+DynamoDB, GCS, or HCP Terraform, with locking proven
at least one moved block refactor that renames a resource without destroying it
at least one import block adopting an existing cloud resource into code
one reviewed PR (your own or a teammate's) with a terraform plan comment analyzed line-by-line
one OPA (Rego), Sentinel, or tfsec policy that rejects a realistic unsafe configuration (the public-S3 kata counts)
one mistake log naming at least 10 real errors, such as "state drift ignored", "applied without reading plan", "module with eight inputs and no defaults", "hand-edited state file", "forgot lock and two applies raced", or "imported with wrong address"
one short memo comparing Terraform and one alternative (Pulumi, CDK, or Ansible) on a scenario you have actually worked on
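
For the remote-state output above, a minimal sketch of the S3+DynamoDB option follows. The bucket, key, region, and table names are hypothetical, and both the bucket and the table are assumed to exist before terraform init runs.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-org-terraform-state"     # hypothetical; create it beforehand
    key     = "module-2/lab/terraform.tfstate"  # hypothetical path of this stack's state
    region  = "us-east-1"
    encrypt = true

    # Hypothetical lock table. Its partition key must be a string attribute
    # named LockID for Terraform's locking to work.
    dynamodb_table = "example-terraform-locks"
  }
}
```

One way to prove locking: start an apply in one terminal and, while it waits at the approval prompt, run a plan in a second terminal against the same state. The second command should fail to acquire the lock. Recent Terraform releases also offer lockfile-based locking in the S3 backend itself, so check the backend documentation for your version before choosing between the two.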

Completion Standard

You have completed Module 2 when all of these are true:

you can type terraform init / plan / apply / destroy on a scratch stack without looking at notes
you can read a plan output and point at every + (create), ~ (update in place), - (destroy), and -/+ (destroy and then create a replacement) with an explanation of why it appears
you have at least one module you are willing to inflict on a teammate and defend as "small enough"
you have at least one moved refactor in your history
you treat clicking in the console as an exception that requires an import or a destroy-and-reapply, not as the default
you can explain what terraform plan does in terms a backend engineer will understand in 60 seconds

If you have terraform apply muscle memory but cannot explain why two concurrent applies are dangerous, the module is not complete.

Reading Policy

Concept pages are the main path.
Official HashiCorp docs are the selective support. Open one page per concept gap.
The "See also (external)" section on each concept page points to 1-2 curated canonical URLs. Those are escalation links, not a second curriculum.
Because this module is tool-heavy, docs-first discipline applies: prefer official docs over blog posts or Stack Overflow for exact behavior.

Suggested Weekly Flow

Day	Work
1	Concepts 1-3, install Terraform, create a sandbox cloud account if you have not already
2	Concepts 4-6, complete Practice 1 (First Terraform Module Lab)
3	Concepts 7-9, configure remote state for the lab
4	Practice 2 (Modularity and State Workshop), extract a reusable module
5	Concepts 10-12, review a real `terraform plan` with a teammate or from an open-source repo
6	Practice 3 (Refactoring and Import Clinic), do one `moved` and one `import`
7	Concepts 13-15, Practice 4 (Katas), quiz

Reference

If you need escalation into the official HashiCorp docs or the selective external stack, use Reference and Selective Reading and Learning Resources.

Rich Learning Pages

Model Artifact Calibration

For infrastructure review evidence, compare your plan notes to the Terraform/IaC change review model artifact.
