Module 2: Infrastructure as Code

Primary text: HashiCorp's official Terraform documentation (developer.hashicorp.com/terraform).
Selective support: Terraform: Up & Running (Brikman) for narrative framing, the local Pro Git chunks for PR/review culture, and per-provider docs for exact resource semantics.

This guide is the primary teacher. You do not need to read the full Terraform documentation tree, and you certainly do not need to finish Terraform: Up & Running before practicing. You do need to become operationally strong at writing, planning, applying, modularizing, reviewing, and refactoring real infrastructure code -- and you need to know where to read in the official docs when the guide runs out.


Scope of This Module

Infrastructure as Code (IaC) is not "a DevOps tool you install." It is an engineering discipline: describe the desired state of systems in a reviewable artifact, let a planner compute the diff, and apply it with team safety. This module teaches that discipline Terraform-first, because Terraform has the most mature mental model, but the habits transfer to OpenTofu, Pulumi, CDK, and CloudFormation.
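
A minimal sketch of what "describe the desired state" can look like in Terraform. The variable name queue_names and the queue names are hypothetical, and the hashicorp/aws provider is assumed to be configured elsewhere; aws_sqs_queue is used only because it needs very few arguments.

```hcl
# Illustrative only: names are hypothetical, AWS provider config is assumed.
variable "queue_names" {
  type        = set(string)
  description = "The queues that should exist. This set is the desired state."
  default     = ["orders", "billing"]
}

# One queue per name. There is no "create" or "delete" command here:
# the planner compares this declaration against state and computes the diff.
resource "aws_sqs_queue" "this" {
  for_each = var.queue_names
  name     = each.value
}
```

Removing "billing" from the set and running terraform plan should produce a proposal to destroy aws_sqs_queue.this["billing"] and leave the other queue untouched. A shell script would need its own logic to notice and remove the extra queue.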

What it covers in depth:

  • declarative infrastructure as a reasoning shift away from shell scripts
  • state as the ground truth and where it gets corrupted in practice
  • idempotency and convergence, and why IaC is not "just API calls"
  • the three Terraform building blocks: providers, resources, data sources
  • structuring a root module: inputs, outputs, locals, naming discipline
  • the plan/apply lifecycle and what drift detection actually means
  • writing a reusable module that you would inflict on a teammate
  • workspaces, environment layouts, and the monorepo vs polyrepo question
  • remote state, locking, and what a corrupted state file costs a team
  • plan review as PR culture -- the plan is the diff
  • refactoring with moved, import, and safe state manipulation
  • blast radius and safe-by-default patterns when a single apply can delete production
  • where IaC ends and configuration management (Ansible) begins
  • CDK and Pulumi as IaC in general-purpose languages
  • policy as code: OPA, Sentinel, tfsec / Trivy, and where each fits

What it deliberately does not try to finish here:

  • mastery of any specific cloud provider's resource catalog (that belongs in Module 1 and in hands-on practice)
  • full container orchestration IaC (Module 3: Kubernetes)
  • CI/CD pipeline construction (Module 4)
  • deep Sentinel or OPA policy authoring as a specialty

If you can copy-paste Terraform from a blog post but cannot read a plan output and predict what it will do, the module is not complete.


Before You Start (Diagnostic Prerequisites Assessment)

Time Limit: 20 minutes, closed-book.
Format: Short answers plus one multi-choice. Score yourself honestly.
Purpose: Confirm that Module 1 (Cloud Platform Fundamentals) and the Git/CI track from earlier semesters have stuck. This module assumes you have clicked in a cloud console at least once and have made a real pull request.

Mastery-Based Progression

This is a competency gate. If you cannot explain what a cloud resource is or how to review a pull request, finishing this module will leave you with cargo-culted Terraform and no judgment to go with it.

Diagnostic Questions

Q1. Declarative vs imperative. In one sentence, what does "declarative configuration" mean, and why is bash provision.sh not declarative even if it is in git?

Q2. Idempotency. You run the same provisioning command twice. What does "idempotent" require about the second run? Give one example of a non-idempotent API call you have seen.

Q3. State intuition. Terraform does not scan your cloud account to rediscover what exists; it refreshes only the resources it already tracks. Where does it look first to find out which resources those are, and what goes wrong if that information is lost?

Q4. Module / reuse. In any language you know, what does it take to call something a "reusable" component? Name two properties.

Q5. Drift (multi-choice). An engineer clicks in the AWS console and changes a security group Terraform manages. The next terraform plan will most likely:

  • a) succeed silently and keep the console change
  • b) show a diff that proposes to revert the console change
  • c) error out because state is now invalid
  • d) rewrite state to match the console

Answer Key and Scoring

  1. Declarative = describe the end state, not the steps. bash provision.sh runs commands; rerunning it can produce different results depending on the starting state.
  2. Second run leaves the system in the same state as the first. Non-idempotent example: appending a line to a file, or any INSERT without a uniqueness constraint.
  3. Terraform looks at its state file. Lose it and Terraform no longer knows which real resources it manages.
  4. Expected answers include: clear inputs/outputs, no hidden dependencies on caller's environment, versioned, documented, composable.
  5. b). terraform plan refreshes its last known state from the real infrastructure, then compares the result against the configuration; the console change shows up as a diff, and the plan proposes reverting it.

Scoring:

  • 4-5 correct: Ready to proceed.
  • 2-3 correct: Continue, but expect extra time in Cluster 1 and revisit Module 1 notes on cloud resources if Q3 or Q5 tripped you up.
  • 0-1 correct: Pause. Return to Module 1 of this semester (Cloud Platform Fundamentals) and to the Git/PR material in Track B before starting here. IaC reasoning collapses without a cloud-resource mental model and PR discipline.

Automatic Remediation

  • If Q1 or Q2 missed: Read the intro at https://developer.hashicorp.com/terraform/intro and return.
  • If Q3 or Q5 missed: Preview Cluster 1, Concept 02 (state) before starting.
  • If Q4 missed: Revisit any prior module on modules/packages/functions. Reuse is a general software concept, not a Terraform feature.

What This Module Is For

Every production system you will ever touch is described, at some level, by infrastructure code or by the absence of it. Teams without IaC operate from memory and ClickOps; teams with IaC operate from a reviewable diff. The difference shows up the first time a region goes down at 3 a.m. and someone has to reproduce the stack in a fresh account.

This module builds the IaC reasoning needed for:

  • safely shipping production cloud changes (Module 4: CI/CD)
  • Kubernetes manifests and GitOps overlap (Module 3)
  • security posture review and least-privilege enforcement (Module 5)
  • architecture decision records that describe which infrastructure pattern was chosen and why (Semester 7, Module 5)
  • operating systems under pressure when the cloud console is not a good enough answer

You are learning to treat infrastructure like code and reviews like gates, not formalities.

Local Validation Path

Default to the local validation path before spending money or touching shared accounts:

  1. Run terraform fmt -check and terraform validate on every root module.
  2. Generate and review a saved plan (terraform plan -out=tfplan plus a human-readable terraform show -no-color tfplan > plan.txt) before any apply.
  3. Run static scanning with tflint, Trivy, tfsec-compatible rules, Checkov, or OPA against configuration and plan.json fixtures.
  4. Draw a mock resource diagram that explains the intended VPC/network, cluster, database, IAM, and data-flow boundaries even when the resources are not created.
  5. Use mock providers, local backends, localstack-style services, and fake queues/databases when the learning goal is review, module design, or blast-radius reasoning rather than a provider-specific API.

Cloud applies are allowed only when they add learning value that local validation cannot provide, and they must inherit the Semester 9 budget, billing-alert, and teardown guardrails.
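
One possible shape for a zero-cost practice root module, assuming Terraform 1.4 or newer so that the built-in terraform_data resource is available. The variable, resource, and output names are hypothetical. Nothing here talks to a cloud API, so terraform init, terraform validate, and terraform plan -out=tfplan should run without credentials.

```hcl
terraform {
  required_version = ">= 1.4.0"

  # State stays on the local disk; fine for solo practice, not for teams.
  backend "local" {
    path = "terraform.tfstate"
  }
}

variable "environment" {
  type        = string
  description = "Target environment name (hypothetical input for practice)."
  default     = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

# terraform_data is built into Terraform and creates no real infrastructure.
# It stands in for a real resource so plans and reviews can be rehearsed.
resource "terraform_data" "placeholder" {
  input = {
    environment = var.environment
    component   = "network"
  }
}

output "placeholder" {
  description = "Echo of the placeholder input, for inspecting plan output."
  value       = terraform_data.placeholder.output
}
```

Changing the default of environment and re-running the plan gives a small, safe diff to practice reading before any provider-specific work.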


How To Use This Module

Work in order. The later clusters presume the earlier mental model is stable -- you cannot review a plan safely before you understand state, and you cannot refactor with moved before you understand modules.

Cluster 1: The IaC Mindset

Order | Concept | Type | Focus
1 | Declarative vs Imperative Infrastructure | PRIMARY | Desired state vs scripts; why IaC is not "bash in git"
2 | State: The Ground Truth and Its Hazards | PRIMARY | What state is, where it lives, and how it gets corrupted
3 | Idempotency and Convergence in IaC | PRIMARY | Why running twice must be safe and why convergence is not the same as idempotency

Cluster mastery check: Can you explain in plain words what Terraform is doing when you type terraform apply, and where the state file fits in that picture?

Cluster 2: Terraform Core

Order | Concept | Type | Focus
4 | Providers, Resources, Data Sources | PRIMARY | The three block types and when to use each
5 | Variables, Outputs, Locals -- Structuring a Module | PRIMARY | Inputs, outputs, locals, naming, and file layout
6 | The Plan/Apply Lifecycle, Drift Detection | PRIMARY | What plan, apply, and refresh actually do

Cluster mastery check: Can you write, from scratch, a 30-line Terraform root module that calls one provider, creates one resource, reads one data source, and exposes one output -- and can you explain each line?
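
For calibration, here is one sketch of a root module of roughly that size. It assumes the hashicorp/aws provider and working AWS credentials; the bucket name prefix, resource labels, and version constraints are illustrative choices, not requirements.

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

variable "region" {
  type        = string
  description = "AWS region to deploy into."
  default     = "us-east-1"
}

# Provider: configures how Terraform talks to one API.
provider "aws" {
  region = var.region
}

# Data source: reads existing information and creates nothing.
data "aws_caller_identity" "current" {}

# Resource: an object Terraform creates, updates, and destroys.
resource "aws_s3_bucket" "logs" {
  # The account ID suffix is one way to keep the name globally unique.
  bucket = "example-logs-${data.aws_caller_identity.current.account_id}"

  tags = {
    ManagedBy = "terraform"
  }
}

# Output: the value this module exposes to callers and to the CLI.
output "logs_bucket_arn" {
  description = "ARN of the logs bucket."
  value       = aws_s3_bucket.logs.arn
}
```

A useful self-test is to explain, for each block, whether it would appear in terraform plan output and why.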

Cluster 3: Modularity and Reuse

Order | Concept | Type | Focus
7 | Writing a Reusable Terraform Module | PRIMARY | Module contract, versioning, opinionated defaults
8 | Workspaces, Environments, and the Monorepo vs Polyrepo Question | PRIMARY | How teams lay out dev/staging/prod and why terraform workspace is usually not the answer
9 | Remote State, Locking, and Team Safety | PRIMARY | Backends, locks, and what a corrupted state file costs

Cluster mastery check: Given an environment layout question ("one repo or three? one state or many?"), can you articulate tradeoffs in a paragraph and name at least two failure modes of each option?
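
To make the layout question concrete, here is a sketch of what a single environment directory (for example a hypothetical envs/dev/network/) might contain: a remote backend with locking and a call to a versioned child module. The bucket, table, repository URL, and module inputs are all hypothetical. The S3 backend arguments shown are the long-standing S3+DynamoDB form; newer Terraform releases also offer S3-native lock files, so check the backend documentation for the version you run.

```hcl
terraform {
  # Backend blocks cannot reference variables, so each environment
  # directory carries its own literal key.
  backend "s3" {
    bucket         = "example-tfstate-bucket" # hypothetical bucket
    key            = "envs/dev/network/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "example-tf-locks" # table with a string partition key named LockID
  }
}

# Child module pinned to a tag, so dev and prod can upgrade independently.
module "vpc" {
  # Hypothetical repository and tag; the "//vpc" part selects a subdirectory.
  source = "git::https://example.com/org/terraform-modules.git//vpc?ref=v1.2.0"

  # Hypothetical inputs; the real contract is whatever the module declares.
  name       = "dev"
  cidr_block = "10.0.0.0/16"
}
```

With one state key per environment and component, a bad apply in dev cannot touch the prod state file, at the cost of more directories to keep consistent.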

Cluster 4: Managing Change

Order | Concept | Type | Focus
10 | Plan Review as PR Culture | PRIMARY | The plan is the diff; reviewing plans like reviewing code
11 | Refactoring: moved Blocks, import, State Manipulation | PRIMARY | Renaming, moving, and adopting resources without destroying them
12 | Blast Radius and Safe-by-Default Patterns | PRIMARY | prevent_destroy, targeted applies, dependency isolation

Cluster mastery check: Given a teammate's terraform plan output, can you separate "benign drift" from "delete-production-by-accident" and gate the apply accordingly?
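
The three mechanisms this cluster leans on can be sketched together. This assumes Terraform 1.5 or newer (import blocks) and the hashicorp/aws provider; the resource labels and bucket names are hypothetical, and the example supposes a bucket was previously declared as aws_s3_bucket.logs.

```hcl
# Rename: the object tracked at the old address now lives at the new one.
# The plan should report a move rather than a destroy and a create.
moved {
  from = aws_s3_bucket.logs
  to   = aws_s3_bucket.access_logs
}

resource "aws_s3_bucket" "access_logs" {
  bucket = "example-access-logs"

  lifecycle {
    # Any plan that would destroy this bucket fails with an error instead.
    prevent_destroy = true
  }
}

# Adopt: bring a bucket that was created by hand under Terraform management.
# For aws_s3_bucket the import ID is the bucket name.
import {
  to = aws_s3_bucket.legacy
  id = "example-legacy-bucket"
}

resource "aws_s3_bucket" "legacy" {
  bucket = "example-legacy-bucket"
}
```

When reviewing a plan produced from a change like this, the things to look for are a "has moved to" note for the rename, a "will be imported" note for the adoption, and no "-" or "-/+" lines against either bucket.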

Cluster 5: IaC Beyond Terraform

Order | Concept | Type | Focus
13 | Configuration Management: Ansible and the Line with IaC | SUPPORTING | Where provisioning ends and configuration begins
14 | CDK / Pulumi: IaC in General-Purpose Languages | SUPPORTING | When code beats DSL and when it loses the thread
15 | Policy as Code: OPA, Sentinel, tfsec | PRIMARY | Gating plans with policy, and what each tool is actually good at

Cluster mastery check: Given a new infrastructure scenario (say, "we need servers configured for a legacy app"), can you pick Terraform vs Ansible vs CDK vs plain bash and justify the choice?

Then work these practice pages in order:

Order | Practice path | Focus
1 | First Terraform Module Lab | Write a tiny root module end-to-end
2 | Modularity and State Workshop | Split into modules, add remote state and locking
3 | Refactoring and Import Clinic | Rename with moved; adopt existing infra via import
4 | IaC Katas | Four repeatable drills including a VPC-module spec and a public-S3-rejection policy

Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources for targeted reinforcement only.


Learning Objectives

By the end of this module you should be able to:

  1. Distinguish declarative infrastructure from imperative provisioning and explain why the distinction matters for change safety.
  2. Explain what Terraform state is, where it is stored, how it is locked, and what happens when it is lost or corrupted.
  3. Define idempotency and convergence and tell when an IaC workflow violates each.
  4. Author a Terraform root module that uses providers, resources, data sources, variables, outputs, and locals with deliberate naming.
  5. Read a terraform plan output and predict the resulting real-world change, including deletions and replacements.
  6. Factor a working root module into a reusable module with a stable contract and publish it inside a monorepo or registry.
  7. Choose an environment layout (workspaces vs directories vs repos) and defend the choice against one concrete failure mode of each alternative.
  8. Configure a remote backend with state locking and describe the sequence of operations during a locked apply.
  9. Refactor infrastructure safely using moved blocks and import blocks, and articulate when raw terraform state commands are appropriate.
  10. Identify blast radius in a plan and apply safe-by-default patterns (prevent_destroy, targeted applies, environment isolation) before merging.
  11. Decide between Terraform, Ansible, CDK/Pulumi, and plain scripts for a given scenario.
  12. Write at least one policy-as-code rule (OPA/Rego, Sentinel, or tfsec/Trivy config) that blocks a specific unsafe plan.

Outputs

  • one working Terraform root module checked into git with README, variables, and outputs
  • one reusable child module (a VPC or equivalent) with semver-tagged releases
  • one remote-state configuration using S3+DynamoDB, GCS, or HCP Terraform, with locking proven
  • at least one moved block refactor that renames a resource without destroying it
  • at least one import block adopting an existing cloud resource into code
  • one reviewed PR (your own or a teammate's) with a terraform plan comment analyzed line-by-line
  • one OPA (Rego), Sentinel, or tfsec policy that rejects a realistic unsafe configuration (the public-S3 kata counts)
  • one mistake log naming at least 10 real errors such as state drift ignored, applied without reading plan, module with eight inputs and no defaults, hand-edited state file, forgot lock and two applies raced, imported with wrong address
  • one short memo comparing Terraform and one alternative (Pulumi, CDK, or Ansible) on a scenario you have actually worked on

Completion Standard

You have completed Module 2 when all of these are true:

  • you can type terraform init / plan / apply / destroy on a scratch stack without looking at notes
  • you can read a plan output and point at every +, ~, -, and -/+ with an explanation of what it means
  • you have at least one module you are willing to inflict on a teammate and defend as "small enough"
  • you have at least one moved refactor in your history
  • you treat clicking in the console as an exception that requires an import or a destroy-and-reapply, not as the default
  • you can explain what terraform plan does in terms a backend engineer will understand in 60 seconds

If you have terraform apply muscle memory but cannot explain why two concurrent applies are dangerous, the module is not complete.


Reading Policy

  • Concept pages are the main path.
  • Official HashiCorp docs are the selective support. Open one page per concept gap.
  • See also (external) on each concept page points to 1-2 curated canonical URLs. Those are escalation links, not a second curriculum.
  • Because this module is tool-heavy, docs-first discipline applies: prefer official docs over blog posts or Stack Overflow for exact behavior.

Suggested Weekly Flow

Day | Work
1 | Concepts 1-3, install Terraform, create a sandbox cloud account if you have not already
2 | Concepts 4-6, complete Practice 1 (First Terraform Module Lab)
3 | Concepts 7-9, configure remote state for the lab
4 | Practice 2 (Modularity and State Workshop), extract a reusable module
5 | Concepts 10-12, review a real terraform plan with a teammate or from an open-source repo
6 | Practice 3 (Refactoring and Import Clinic), do one moved and one import
7 | Concepts 13-15, Practice 4 (Katas), quiz

Reference

If you need escalation into the official HashiCorp docs or the selective external stack, use Reference and Selective Reading and Learning Resources.


Rich Learning Pages

Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread


Model Artifact Calibration

For infrastructure review evidence, compare your plan notes to the Terraform/IaC change review model artifact.