Module 2: Infrastructure as Code
Primary text: HashiCorp's official Terraform documentation (developer.hashicorp.com/terraform)
Selective support: Terraform: Up & Running (Brikman) for narrative framing, the local Pro Git chunks for PR/review culture, and per-provider docs for exact resource semantics.
This guide is the primary teacher. You do not need to read the full Terraform documentation tree, and you certainly do not need to finish Terraform: Up & Running before practicing. You do need to become operationally strong at writing, planning, applying, modularizing, reviewing, and refactoring real infrastructure code -- and you need to know where to read in the official docs when the guide runs out.
Scope of This Module
Infrastructure as Code (IaC) is not "a DevOps tool you install." It is an engineering discipline: describe the desired state of systems in a reviewable artifact, let a planner compute the diff, and apply it with team safety. This module teaches that discipline Terraform-first, because Terraform has the most mature mental model, but the habits transfer to OpenTofu, Pulumi, CDK, and CloudFormation.
What it covers in depth:
- declarative infrastructure as a reasoning shift away from shell scripts
- state as the ground truth and where it gets corrupted in practice
- idempotency and convergence, and why IaC is not "just API calls"
- the three Terraform building blocks: providers, resources, data sources
- structuring a root module: inputs, outputs, locals, naming discipline
- the `plan`/`apply` lifecycle and what drift detection actually means
- writing a reusable module that you would inflict on a teammate
- workspaces, environment layouts, and the monorepo vs polyrepo question
- remote state, locking, and what a corrupted state file costs a team
- plan review as PR culture -- the plan is the diff
- refactoring with `moved`, `import`, and safe state manipulation
- blast radius and safe-by-default patterns when a single apply can delete production
- where IaC ends and configuration management (Ansible) begins
- CDK and Pulumi as IaC in general-purpose languages
- policy as code: OPA, Sentinel, tfsec / Trivy, and where each fits
What it deliberately does not try to finish here:
- mastery of any specific cloud provider's resource catalog (that belongs in Module 1 and in hands-on practice)
- full container orchestration IaC (Module 3: Kubernetes)
- CI/CD pipeline construction (Module 4)
- deep Sentinel or OPA policy authoring as a specialty
If you can copy-paste Terraform from a blog post but cannot read a plan output and predict what it will do, the module is not complete.
Before You Start (Diagnostic Prerequisites Assessment)
Time Limit: 20 minutes, closed-book. Format: Short answers plus one multi-choice. Score yourself honestly. Purpose: Confirm that Module 1 (Cloud Platform Fundamentals) and the Git/CI track from earlier semesters have stuck. This module assumes you have clicked in a cloud console at least once and have made a real pull request.
This is a competency gate. If you cannot explain what a cloud resource is or how to review a pull request, finishing this module will leave you with cargo-culted Terraform and no judgment to go with it.
Diagnostic Questions
Q1. Declarative vs imperative. In one sentence, what does "declarative configuration" mean, and why is bash provision.sh not declarative even if it is in git?
Q2. Idempotency. You run the same provisioning command twice. What does "idempotent" require about the second run? Give one example of a non-idempotent API call you have seen.
Q3. State intuition. Terraform does not scan your whole cloud account before every action to rediscover what exists. Where does it look first to learn which resources it manages, and what goes wrong if that information is lost?
Q4. Module / reuse. In any language you know, what does it take to call something a "reusable" component? Name two properties.
Q5. Drift (multi-choice). An engineer clicks in the AWS console and changes a security group Terraform manages. The next terraform plan will most likely:
- a) succeed silently and keep the console change
- b) show a diff that proposes to revert the console change
- c) error out because state is now invalid
- d) rewrite state to match the console
Answer Key and Scoring
- Q1. Declarative = describe the end state, not the steps. `bash provision.sh` runs commands; rerunning it can produce different results depending on the starting state.
- Q2. The second run leaves the system in the same state as the first. Non-idempotent example: appending a line to a file, or any `INSERT` without a uniqueness constraint.
- Q3. Terraform looks at its state file. Lose it and Terraform no longer knows which real resources it manages.
- Q4. Expected answers include: clear inputs/outputs, no hidden dependencies on the caller's environment, versioned, documented, composable.
- Q5. b). `plan` refreshes the last known state from the real world, then compares it against the desired state (config); it will propose reverting the manual change.
Scoring:
- 4-5 correct: Ready to proceed.
- 2-3 correct: Continue, but expect extra time in Cluster 1 and revisit Module 1 notes on cloud resources if Q3 or Q5 tripped you up.
- 0-1 correct: Pause. Return to Module 1 of this semester (Cloud Platform Fundamentals) and to the Git/PR material in Track B before starting here. IaC reasoning collapses without a cloud-resource mental model and PR discipline.
Automatic Remediation
- If Q1 or Q2 missed: Read the intro at https://developer.hashicorp.com/terraform/intro and return.
- If Q3 or Q5 missed: Preview Cluster 1, Concept 02 (state) before starting.
- If Q4 missed: Revisit any prior module on modules/packages/functions. Reuse is a general software concept, not a Terraform feature.
What This Module Is For
Every production system you will ever touch is described, at some level, by infrastructure code or by the absence of it. Teams without IaC operate from memory and ClickOps; teams with IaC operate from a reviewable diff. The difference shows up the first time a region goes down at 3 a.m. and someone has to reproduce the stack in a fresh account.
This module builds the IaC reasoning needed for:
- safely shipping production cloud changes (Module 4: CI/CD)
- Kubernetes manifests and GitOps overlap (Module 3)
- security posture review and least-privilege enforcement (Module 5)
- architecture decision records that describe which infrastructure pattern was chosen and why (Semester 7, Module 5)
- operating systems under pressure when the cloud console is not a good enough answer
You are learning to treat infrastructure like code and reviews like gates, not formalities.
Local Validation Path
Default to the local validation path before spending money or touching shared accounts:
- Run `terraform fmt -check` and `terraform validate` on every root module.
- Generate and review a saved plan (`terraform plan -out=tfplan` plus a human-readable `terraform show -no-color tfplan > plan.txt`) before any apply.
- Run static scanning with `tflint`, Trivy, tfsec-compatible rules, Checkov, or OPA against configuration and `plan.json` fixtures.
- Draw a mock resource diagram that explains the intended VPC/network, cluster, database, IAM, and data-flow boundaries even when the resources are not created.
- Use mock providers, local backends, localstack-style services, and fake queues/databases when the learning goal is review, module design, or blast-radius reasoning rather than a provider-specific API.
Cloud applies are allowed only when they add learning value that local validation cannot provide, and they must inherit the Semester 9 budget, billing-alert, and teardown guardrails.
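One way to exercise the mock-provider option above is Terraform's built-in test framework. The sketch below is illustrative only: it assumes Terraform 1.7 or later, and it assumes the root module declares a hypothetical `aws_s3_bucket.lab` resource carrying a `ManagedBy` tag. Neither name comes from this guide.

```hcl
# tests/lab.tftest.hcl: run with `terraform test` (Terraform 1.7 or later).
# Assumption: the root module declares resource "aws_s3_bucket" "lab"
# with a ManagedBy tag. Both names are hypothetical.

# Replace the real AWS provider with a mock: no credentials, no API calls, no cost.
mock_provider "aws" {}

run "bucket_is_tagged" {
  # Plan only, so the assertion checks what the configuration intends to create.
  command = plan

  assert {
    condition     = aws_s3_bucket.lab.tags["ManagedBy"] == "terraform"
    error_message = "The lab bucket must carry the ManagedBy=terraform tag."
  }
}
```

Because the provider is mocked, a check like this can run on a laptop or in CI without a cloud account, which suits review and module-design practice. It does not prove the real provider API would accept the configuration.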
Concept Map
How To Use This Module
Work in order. The later clusters presume the earlier mental model is stable -- you cannot review a plan safely before you understand state, and you cannot refactor with moved before you understand modules.
Cluster 1: The IaC Mindset
| Order | Concept | Type | Focus |
|---|---|---|---|
| 1 | Declarative vs Imperative Infrastructure | PRIMARY | Desired state vs scripts; why IaC is not "bash in git" |
| 2 | State: The Ground Truth and Its Hazards | PRIMARY | What state is, where it lives, and how it gets corrupted |
| 3 | Idempotency and Convergence in IaC | PRIMARY | Why running twice must be safe and why convergence is not the same as idempotency |
Cluster mastery check: Can you explain in plain words what Terraform is doing when you type terraform apply, and where the state file fits in that picture?
Cluster 2: Terraform Core
| Order | Concept | Type | Focus |
|---|---|---|---|
| 4 | Providers, Resources, Data Sources | PRIMARY | The three block types and when to use each |
| 5 | Variables, Outputs, Locals -- Structuring a Module | PRIMARY | Inputs, outputs, locals, naming, and file layout |
| 6 | The Plan/Apply Lifecycle, Drift Detection | PRIMARY | What plan, apply, and refresh actually do |
Cluster mastery check: Can you write, from scratch, a 30-line Terraform root module that calls one provider, creates one resource, reads one data source, and exposes one output -- and can you explain each line?
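For calibration, a root module of roughly that size might look like the sketch below. It is one possible shape, not a model answer: the AWS provider, the region default, the version constraints, and every resource name are assumptions chosen for illustration.

```hcl
# main.tf: illustrative root module. Provider choice, names, and region are assumptions.
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

variable "region" {
  description = "AWS region for this stack."
  type        = string
  default     = "us-east-1"
}

provider "aws" {
  region = var.region
}

# Data source: reads something that already exists; it never creates or deletes.
data "aws_caller_identity" "current" {}

locals {
  # Including the account ID keeps the globally unique bucket name deterministic.
  bucket_name = "iac-lab-${data.aws_caller_identity.current.account_id}"
}

# Resource: an object Terraform creates, updates, and destroys.
resource "aws_s3_bucket" "lab" {
  bucket = local.bucket_name

  tags = {
    ManagedBy = "terraform"
  }
}

output "bucket_arn" {
  description = "ARN of the lab bucket."
  value       = aws_s3_bucket.lab.arn
}
```

Running `terraform init` followed by `terraform validate` checks a file like this without creating anything. The mastery check is met only when you can explain each block without the comments.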
Cluster 3: Modularity and Reuse
| Order | Concept | Type | Focus |
|---|---|---|---|
| 7 | Writing a Reusable Terraform Module | PRIMARY | Module contract, versioning, opinionated defaults |
| 8 | Workspaces, Environments, and the Monorepo vs Polyrepo Question | PRIMARY | How teams lay out dev/staging/prod and why terraform workspace is usually not the answer |
| 9 | Remote State, Locking, and Team Safety | PRIMARY | Backends, locks, and what a corrupted state file costs |
Cluster mastery check: Given an environment layout question ("one repo or three? one state or many?"), can you articulate tradeoffs in a paragraph and name at least two failure modes of each option?
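To make "module contract" concrete, here is a minimal sketch of a child module with typed, documented inputs, one validation rule, and a single output. The `modules/network` path, the variable names, and the CIDR values are hypothetical.

```hcl
# modules/network/variables.tf: hypothetical child module contract.
variable "name" {
  description = "Value of the Name tag on the VPC."
  type        = string
}

variable "cidr_block" {
  description = "CIDR range for the VPC."
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() errors on malformed input; can() turns that error into false.
    condition     = can(cidrhost(var.cidr_block, 0))
    error_message = "cidr_block must be a valid CIDR range such as 10.0.0.0/16."
  }
}

# modules/network/main.tf
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block

  tags = {
    Name = var.name
  }
}

# modules/network/outputs.tf
output "vpc_id" {
  description = "ID of the VPC, for callers that attach subnets or gateways."
  value       = aws_vpc.this.id
}
```

A root module would then call it like this:

```hcl
# Root module: call the child module from a local path.
module "network" {
  source     = "./modules/network"
  name       = "dev"
  cidr_block = "10.10.0.0/16"
}
```

The `version` argument applies only to registry sources, so a local-path call like this one is not version-pinned. For semver-tagged releases, callers typically use a registry source with `version`, or a Git source with a `ref` pointing at the tag.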
Cluster 4: Managing Change
| Order | Concept | Type | Focus |
|---|---|---|---|
| 10 | Plan Review as PR Culture | PRIMARY | The plan is the diff; reviewing plans like reviewing code |
| 11 | Refactoring: moved Blocks, import, State Manipulation | PRIMARY | Renaming, moving, and adopting resources without destroying them |
| 12 | Blast Radius and Safe-by-Default Patterns | PRIMARY | prevent_destroy, targeted applies, dependency isolation |
Cluster mastery check: Given a teammate's terraform plan output, can you separate "benign drift" from "delete-production-by-accident" and gate the apply accordingly?
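The three change-management tools named in this cluster can be sketched in a few lines of configuration. The resource addresses and bucket names below are hypothetical, and the `import` block assumes Terraform 1.5 or later.

```hcl
# Rename without destroy: the object tracked as aws_s3_bucket.logs
# is now addressed as aws_s3_bucket.audit_logs.
moved {
  from = aws_s3_bucket.logs
  to   = aws_s3_bucket.audit_logs
}

resource "aws_s3_bucket" "audit_logs" {
  bucket = "example-audit-logs" # hypothetical bucket name

  lifecycle {
    # Any plan that would destroy this bucket fails with an error instead.
    prevent_destroy = true
  }
}

# Adopt a bucket that was created by hand (Terraform 1.5 or later).
import {
  to = aws_s3_bucket.legacy
  id = "example-legacy-bucket" # hypothetical; for S3 the import ID is the bucket name
}

resource "aws_s3_bucket" "legacy" {
  bucket = "example-legacy-bucket"
}
```

When reviewing a plan for a change like this, the thing to confirm is that it reports a move and an import rather than a destroy and a create. If the plan proposes a `-` or `-/+` on either bucket, stop and find out why before applying.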
Cluster 5: IaC Beyond Terraform
| Order | Concept | Type | Focus |
|---|---|---|---|
| 13 | Configuration Management: Ansible and the Line with IaC | SUPPORTING | Where provisioning ends and configuration begins |
| 14 | CDK / Pulumi: IaC in General-Purpose Languages | SUPPORTING | When code beats DSL and when it loses the thread |
| 15 | Policy as Code: OPA, Sentinel, tfsec | PRIMARY | Gating plans with policy, and what each tool is actually good at |
Cluster mastery check: Given a new infrastructure scenario (say, "we need servers configured for a legacy app"), can you pick Terraform vs Ansible vs CDK vs plain bash and justify the choice?
Then work these practice pages in order:
| Order | Practice path | Focus |
|---|---|---|
| 1 | First Terraform Module Lab | Write a tiny root module end-to-end |
| 2 | Modularity and State Workshop | Split into modules, add remote state and locking |
| 3 | Refactoring and Import Clinic | Rename with moved; adopt existing infra via import |
| 4 | IaC Katas | Four repeatable drills including a VPC-module spec and a public-S3-rejection policy |
Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources for targeted reinforcement only.
Learning Objectives
By the end of this module you should be able to:
- Distinguish declarative infrastructure from imperative provisioning and explain why the distinction matters for change safety.
- Explain what Terraform state is, where it is stored, how it is locked, and what happens when it is lost or corrupted.
- Define idempotency and convergence and tell when an IaC workflow violates each.
- Author a Terraform root module that uses providers, resources, data sources, variables, outputs, and locals with deliberate naming.
- Read a `terraform plan` output and predict the resulting real-world change, including deletions and replacements.
- Factor a working root module into a reusable module with a stable contract and publish it inside a monorepo or registry.
- Choose an environment layout (workspaces vs directories vs repos) and defend the choice against one concrete failure mode of each alternative.
- Configure a remote backend with state locking and describe the sequence of operations during a locked apply.
- Refactor infrastructure safely using `moved` blocks and `import` blocks, and articulate when raw `terraform state` commands are appropriate.
- Identify blast radius in a plan and apply safe-by-default patterns (`prevent_destroy`, targeted applies, environment isolation) before merging.
- Decide between Terraform, Ansible, CDK/Pulumi, and plain scripts for a given scenario.
- Write at least one policy-as-code rule (OPA/Rego, Sentinel, or tfsec/Trivy config) that blocks a specific unsafe plan.
Outputs
- one working Terraform root module checked into git with README, variables, and outputs
- one reusable child module (a VPC or equivalent) with semver-tagged releases
- one remote-state configuration using S3+DynamoDB, GCS, or HCP Terraform, with locking proven
- at least one `moved` block refactor that renames a resource without destroying it
- at least one `import` block adopting an existing cloud resource into code
- one reviewed PR (your own or a teammate's) with a `terraform plan` comment analyzed line-by-line
- one OPA (Rego), Sentinel, or tfsec policy that rejects a realistic unsafe configuration (the public-S3 kata counts)
- one mistake log naming at least 10 real errors such as `state drift ignored`, `applied without reading plan`, `module with eight inputs and no defaults`, `hand-edited state file`, `forgot lock and two applies raced`, `imported with wrong address`
- one short memo comparing Terraform and one alternative (Pulumi, CDK, or Ansible) on a scenario you have actually worked on
Completion Standard
You have completed Module 2 when all of these are true:
- you can type `terraform init / plan / apply / destroy` on a scratch stack without looking at notes
- you can read a plan output and point at every `+`, `~`, `-`, and `-/+` with an explanation of what it means
- you have at least one module you are willing to inflict on a teammate and defend as "small enough"
- you have at least one `moved` refactor in your history
- you treat clicking in the console as an exception that requires an `import` or a destroy-and-reapply, not as the default
- you can explain what `terraform plan` does in terms a backend engineer will understand in 60 seconds
If you have terraform apply muscle memory but cannot explain why two concurrent applies are dangerous, the module is not complete.
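Two concurrent applies are dangerous because both read the same state, both compute a plan from it, and the later write silently overwrites the earlier one. A locking backend serializes them. The sketch below shows an S3 backend with a DynamoDB lock table; the bucket, key, region, and table names are hypothetical, and both the bucket and the table are assumed to exist already.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-tfstate" # hypothetical, pre-existing state bucket
    key     = "lab/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # Hypothetical, pre-existing lock table. The S3 backend expects the table
    # to have a string partition key named LockID.
    dynamodb_table = "example-tf-locks"
  }
}
```

With this in place, a second `terraform apply` started while the first holds the lock fails with a lock error instead of racing. Recent Terraform releases also offer S3-native locking as an alternative to the DynamoDB table, so check the S3 backend documentation for the version you are running before copying this shape.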
Reading Policy
- Concept pages are the main path.
- Official HashiCorp docs are the selective support. Open one page per concept gap.
- "See also (external)" on each concept page points to 1-2 curated canonical URLs. Those are escalation links, not a second curriculum.
- Because this module is tool-heavy, docs-first discipline applies: prefer official docs over blog posts or Stack Overflow for exact behavior.
Suggested Weekly Flow
| Day | Work |
|---|---|
| 1 | Concepts 1-3, install Terraform, create a sandbox cloud account if you have not already |
| 2 | Concepts 4-6, complete Practice 1 (First Terraform Module Lab) |
| 3 | Concepts 7-9, configure remote state for the lab |
| 4 | Practice 2 (Modularity and State Workshop), extract a reusable module |
| 5 | Concepts 10-12, review a real terraform plan with a teammate or from an open-source repo |
| 6 | Practice 3 (Refactoring and Import Clinic), do one moved and one import |
| 7 | Concepts 13-15, Practice 4 (Katas), quiz |
Reference
If you need escalation into the official HashiCorp docs or the selective external stack, use Reference and Selective Reading and Learning Resources.
Rich Learning Pages
Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread
Model Artifact Calibration
For infrastructure review evidence, compare your plan notes to the Terraform/IaC change review model artifact.