Terraform for the Capstone: State, Backend, Scope
What This Concept Is
For the capstone you need one Terraform root, with one remote state backend, for one clearly scoped slice of infrastructure. No elaborate multi-root stack, no cross-account federation, no handwritten import chains. The capstone rewards legibility over cleverness.
Three things must be explicit and survive a hostile reviewer reading your repo cold:
- State: the file Terraform uses to map resources it manages to real cloud objects. If it is wrong, apply will try to recreate what already exists, or destroy what you thought you were creating.
- Backend: where that state lives (S3 + DynamoDB, GCS, Azure Storage, Terraform Cloud/HCP Terraform). For a capstone, a free-tier bucket with locking is enough.
- Scope: what the root does and does not manage. Anything outside the scope is documented as "manual" or "managed elsewhere" with a named owner.
CDK and Pulumi users: the same three ideas exist under different names (stack, synth output, bootstrap stack). The concept is provider-agnostic.
Why It Matters Here (In the Capstone)
Bad state corrupts faster than bad code. If two laptops -- or a laptop and a CI runner -- run terraform apply against the same workspace without a lock, the state file will lie about what exists in the cloud, and your next apply will try to recreate or destroy things. A locking remote backend is not optional for anything you care about, and recovering from corruption takes far longer than the 30 minutes of setup.
Scope matters because capstone Terraform drifts. Every resource you put inside the root, you must keep there; every resource you forgot is a future terraform destroy surprise ("I thought that database was in Terraform…") or a future terraform apply surprise ("why is my manually-configured DNS record gone?"). A written scope statement prevents both failure modes.
Concrete Example(s)
A minimal, self-documenting root for a Cloud Run + Cloud SQL capstone:
terraform {
  required_version = ">= 1.7.0"

  required_providers {
    google = { source = "hashicorp/google", version = "~> 5.0" }
  }

  # Remote state in GCS; the bucket must exist before terraform init.
  backend "gcs" {
    bucket = "capstone-tfstate-prod"
    prefix = "root"
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

module "network" {
  source = "./modules/network"
}

module "db" {
  source     = "./modules/db"
  network_id = module.network.id
}

module "api" {
  source = "./modules/api"
  db_url = module.db.conn_url
}
The AWS equivalent backend, for comparison:
terraform {
  backend "s3" {
    bucket         = "capstone-tfstate-prod"
    key            = "root/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "capstone-tfstate-locks"
    encrypt        = true
  }
}
Scope declaration in the repo's INFRASTRUCTURE.md:
Managed by Terraform: VPC, Cloud SQL, Cloud Run service, IAM role bindings for the CI deploy role, Secret Manager secret containers (not values).
Not managed by Terraform: DNS (bought through registrar UI; documented in library/raw/dns.md), billing alerts (configured once in console; screenshot archived), secret values (rotated out of band in Secret Manager).
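One way to keep the scope statement honest is to compare it against what Terraform actually tracks. The sketch below assumes the three module names from the example root (network, db, api); the helper name unscoped_addresses is hypothetical. It reads the output of terraform state list on stdin and prints any address that does not sit under one of the allowed prefixes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every state address that is NOT under one of the
# allowed prefixes. Addresses arrive on stdin, one per line; prefixes are args.
#
# Usage:
#   terraform state list | unscoped_addresses module.network module.db module.api
unscoped_addresses() {
  local addr prefix matched
  while IFS= read -r addr; do
    # Skip blank lines.
    if [ -z "$addr" ]; then
      continue
    fi
    matched=0
    for prefix in "$@"; do
      case "$addr" in
        # The prefix is quoted (literal); ".*" requires a dot and then anything,
        # so module.db does not accidentally match module.dbx.
        "$prefix".*) matched=1; break ;;
      esac
    done
    if [ "$matched" -eq 0 ]; then
      printf '%s\n' "$addr"
    fi
  done
}
```

Empty output means everything in state falls under a declared module. This only checks what Terraform knows about; resources created by hand in the console never appear in state, so the "not managed" list still has to be reviewed against the console.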
A one-shot bootstrap script that creates the backend before the first terraform init:
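#!/usr/bin/env bash
set -euo pipefail
# PROJECT must be set by the caller; fail early with a clear message if it is not.
: "${PROJECT:?set PROJECT to your GCP project ID}"
# Create the state bucket, then enable versioning so earlier state can be restored.
gsutil mb -p "$PROJECT" -l us-central1 "gs://capstone-tfstate-prod"
gsutil versioning set on "gs://capstone-tfstate-prod"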
Keep this script in scripts/bootstrap-backend.sh; run it once per project, then commit your backend.tf.
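For the AWS backend shown earlier, a comparable bootstrap might look like the following. This is a sketch, not a vetted script: the function name bootstrap_s3_backend is hypothetical, the default names are copied from the s3 backend block above, and it assumes the AWS CLI is installed and authenticated.

```shell
#!/usr/bin/env bash
# Sketch: create the S3 state bucket and the DynamoDB lock table that the
# "s3" backend block refers to. Defaults mirror that block (assumption).
bootstrap_s3_backend() {
  local bucket="${1:-capstone-tfstate-prod}"
  local table="${2:-capstone-tfstate-locks}"
  local region="${3:-us-east-1}"

  # us-east-1 rejects an explicit LocationConstraint; other regions need one.
  if [ "$region" = "us-east-1" ]; then
    aws s3api create-bucket --bucket "$bucket" --region "$region" || return 1
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region" || return 1
  fi

  # Versioning lets a corrupted state be rolled back to an earlier version.
  aws s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled || return 1

  # The S3 backend's DynamoDB locking expects a string partition key "LockID".
  aws dynamodb create-table --table-name "$table" --region "$region" \
    --attribute-definitions AttributeName=LockID,AttributeType=S \
    --key-schema AttributeName=LockID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST || return 1
}
```

Call it once as bootstrap_s3_backend, or pass bucket, table, and region explicitly. Newer Terraform releases also offer S3-native locking through the use_lockfile backend argument; check the backend documentation for the version you pin before choosing between the two.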
Common Confusion / Misconceptions
- "I should manage everything in Terraform from day one." No. Managing billing alerts or DNS zones in Terraform for a capstone usually costs more complexity than it buys safety. The rule: anything that would lose user data, expose secrets, or cost real money if recreated goes in Terraform. Everything else can be manual and documented.
- "State is just a cache; I can delete and re-import if it breaks." Re-importing a real production database or IAM role is a careful, hours-long operation. State is the source of truth for what Terraform believes it owns; treat it accordingly.
- "Local state is fine for a solo project." Until your laptop dies, or CI runs a second apply, or you want to deploy from two machines. Start with remote state with locking; the 30-minute setup is cheaper than the 3-hour recovery.
- "The state file is not sensitive because it doesn't contain secrets." It often does -- any sensitive = true output (the flag redacts CLI output, not the stored value), some provider attributes, and any module that receives a secret as input will persist it into state. Treat the state bucket itself as a secret store.
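The last point is easy to verify on your own project. The sketch below assumes you already hold a known secret value in a shell variable; the helper name state_contains is hypothetical. It reads a state document on stdin and reports whether that literal value appears in it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the literal string given as $1
# appears anywhere in the text read from stdin, fail (exit 1) otherwise.
#
# Usage (DB_PASSWORD is an assumed variable holding a secret you know):
#   terraform state pull | state_contains "$DB_PASSWORD" && echo "secret is in state"
state_contains() {
  local needle="$1" state
  if [ -z "$needle" ]; then
    echo "usage: state_contains <literal-value>" >&2
    return 2
  fi
  # Read the whole document; matching inside the shell keeps the needle
  # out of any child process's argument list.
  state="$(cat)"
  case "$state" in
    *"$needle"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A match means anyone who can read the state bucket can read that secret, which is the practical reason to lock the bucket down to the deploy role.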
How To Use It (In Your Capstone)
- Create the remote state bucket (with versioning and locking) before the first terraform init; the backend must exist before Terraform can initialize against it. Scriptable in 10 lines.
- Document the exact backend initialization command in your repo's README -- terraform init -backend-config=backend-prod.hcl.
- Document scope in INFRASTRUCTURE.md with two lists: managed and deliberately not-managed.
- Never commit terraform.tfstate*; add it to .gitignore and protect the backend with IAM.
- Run terraform plan in CI on every PR; run terraform apply only from a protected main branch with OIDC credentials scoped to the deploy role.
- Turn on object versioning in the backend bucket so a corrupted state can be rolled back by promoting the previous version.
- Write a two-paragraph state-recovery runbook: what to do if the state bucket is deleted or corrupted.
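The .gitignore step in the list above is worth scripting so it cannot be forgotten. A minimal sketch, assuming the usual Terraform ignore patterns; the function name ensure_tf_ignored is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append Terraform state patterns to a .gitignore file
# if they are not already present. Safe to run more than once.
ensure_tf_ignored() {
  local file="${1:-.gitignore}" pattern
  touch "$file"
  # If the file is non-empty and lacks a trailing newline, add one so the
  # first appended pattern does not fuse with the existing last line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  for pattern in '*.tfstate' '*.tfstate.*' '.terraform/'; do
    # -x matches whole lines, -F treats the pattern as a literal string.
    grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
  done
}
```

Run ensure_tf_ignored from the repo root before the first commit. Ignoring the files only prevents new commits; a state file that was already committed still has to be removed from history.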
State File Hygiene
The state file contains enough information to reconstruct every managed resource -- including, sometimes, secret values written by providers. Treat the state file itself as a secret:
- encrypted at rest in the backend bucket (bucket-default CMEK or SSE-KMS)
- access-controlled to the deploy role only (not to every developer)
- versioned, so a bad apply can be rolled back by promoting the previous state version
- never committed, never emailed, never pasted into chat
Recovering from a bad state is a completely separate runbook from recovering from a bad deploy. Write it now, not at 3 a.m.
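The "promote the previous version" step can be sketched for the GCS backend as follows. Assumptions, all worth checking against your own bucket: the state object is gs://capstone-tfstate-prod/root/default.tfstate (the example's prefix plus the default workspace), versioning is enabled, and gsutil is installed. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed location of the state object: <bucket>/<prefix>/<workspace>.tfstate.
STATE_OBJECT="${STATE_OBJECT:-gs://capstone-tfstate-prod/root/default.tfstate}"

# List every stored version; each line ends in "#<generation>".
list_state_versions() {
  gsutil ls -a "$STATE_OBJECT"
}

# Copy an older generation over the live object, making it current again.
restore_state_version() {
  local generation="$1"
  case "$generation" in
    ''|*[!0-9]*)
      echo "usage: restore_state_version <numeric generation>" >&2
      return 2
      ;;
  esac
  gsutil cp "${STATE_OBJECT}#${generation}" "$STATE_OBJECT"
}
```

A rehearsal would run list_state_versions, pick the generation just before the bad apply, call restore_state_version with that number, and then run terraform plan to see how far the restored state is from reality. Because versioning is on, the corrupted object is kept as a noncurrent version rather than lost.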
See also (integrative)
- S9 M02 Cluster 1: State -- ground truth and its hazards -- the general treatment; capstone is the minimal instance
- S9 M02 Cluster 3: Remote state, locking, team safety -- why "solo" still needs locking (your CI runner counts)
- S9 M02 Cluster 2: plan/apply lifecycle, drift detection -- why plan on PRs + apply on main is the default pattern
- S9 M05 Cluster 1: Identity-centric security (least privilege) -- scoping the CI deploy role
- S9 M02 Cluster 4: Blast radius, safe-by-default patterns -- why one scoped root beats one monster root
- Terraform: State -- official definition of what state is and why it matters
- Terraform: Remote state -- supported backends and their locking semantics
- Terraform: Backend block configuration -- the terraform { backend "..." {} } block and partial configuration
- HashiCorp: State and locking best practices -- why S3+DynamoDB or GCS with native locking is the default
Check Yourself
- What is the exact path to your state file (bucket + prefix/key)?
- What prevents two apply runs from clobbering state at once?
- Which three resources are deliberately not in your Terraform scope, and where are they documented?
- If the state bucket were deleted tomorrow, what is your recovery plan in one paragraph?
- Is versioning enabled on the state bucket, and have you tested restoring a previous version?
- Who (what IAM principal) can write to the state bucket, and is that role's trust condition scoped to your repo?
Mini Drill or Application (Capstone-scoped)
- Bootstrap in 30 minutes. Create the bucket (+ locking), write backend.tf, run terraform init successfully, and commit an empty root main.tf. If terraform init fails, stop until it works.
- Scope statement. Write the two-list INFRASTRUCTURE.md for your capstone. Read it back and check that every resource in your cloud console appears in exactly one list.
- Recovery rehearsal (on a junk project). Intentionally delete a resource from the cloud without Terraform's knowledge, then run terraform plan and confirm Terraform proposes to recreate it. Record what the output looked like -- this is your drift-recovery muscle memory.
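The recovery rehearsal can be made repeatable with terraform plan's -detailed-exitcode flag, which exits 0 when there is nothing to change, 2 when the plan contains changes, and 1 on error. A minimal sketch; the function name check_drift and the default output file name are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a plan, save its text for your notes, and report
# whether Terraform sees a difference between state and the real resources.
check_drift() {
  local out="${1:-drift-plan.txt}" rc=0
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
  terraform plan -detailed-exitcode -input=false -no-color > "$out" || rc=$?
  case "$rc" in
    0) echo "no drift" ;;
    2) echo "drift detected" ;;
    *) echo "plan failed" >&2; return 1 ;;
  esac
}
```

After deleting the resource by hand on the junk project, check_drift should print "drift detected", and the saved file holds the plan output the drill asks you to record.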
Source Backbone
Capstone deployment applies cloud, delivery, and operations material. These books are the source backbone for the delivery decisions.
- Building Secure and Reliable Systems - secure/reliable deployment posture.
- GitHub Actions in Action - workflow automation support.
- Pro Git - release history, tags, and branch discipline.
- The Linux Command Line - shell and deployment automation support.