Terraform for the Capstone: State, Backend, Scope

What This Concept Is

For the capstone you need one Terraform root, with one remote state backend, for one clearly scoped slice of infrastructure. No elaborate multi-root stack, no cross-account federation, no handwritten import chains. The capstone rewards legibility over cleverness.

Three things must be explicit and survive a hostile reviewer reading your repo cold:

State: the file Terraform uses to map resources it manages to real cloud objects. If it is wrong, apply will try to recreate what already exists, or destroy what you thought you were creating.
Backend: where that state lives (S3 + DynamoDB, GCS, Azure Storage, Terraform Cloud/HCP Terraform). For a capstone, a free-tier bucket with locking is enough.
Scope: what the root does and does not manage. Anything outside the scope is documented as "manual" or "managed elsewhere" with a named owner.

CDK and Pulumi users: the same three ideas exist under different names (stack, synth output, bootstrap stack). The concept is tool-agnostic.

Why It Matters Here (In the Capstone)

Bad state corrupts faster than bad code. If two laptops (or a laptop and a CI runner) run terraform apply against the same state without a lock, one run can overwrite the other's writes, the state file stops matching what exists in the cloud, and your next apply will try to recreate or destroy things. A locking remote backend is not optional for anything you care about, and recovering from corruption takes much longer than the 30 minutes of setup.

Scope matters because capstone Terraform drifts. Every resource you put inside the root, you must keep there; every resource you forgot is a future terraform destroy surprise ("I thought that database was in Terraform…") or a future terraform apply surprise ("why is my manually-configured DNS record gone?"). A written scope statement prevents both failure modes.

Concrete Example(s)

A minimal, self-documenting root for a Cloud Run + Cloud SQL capstone:

terraform {
  required_version = ">= 1.7.0"
  required_providers {
    google = { source = "hashicorp/google", version = "~> 5.0" }
  }
  backend "gcs" {
    bucket = "capstone-tfstate-prod"
    prefix = "root"
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

module "network" { source = "./modules/network" }

# HCL does not accept ";" as a separator; multiple arguments need one line each.
module "db" {
  source     = "./modules/db"
  network_id = module.network.id
}

module "api" {
  source = "./modules/api"
  db_url = module.db.conn_url
}

The AWS equivalent backend, for comparison:

terraform {
  backend "s3" {
    bucket         = "capstone-tfstate-prod"
    key            = "root/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "capstone-tfstate-locks"
    encrypt        = true
  }
}

Scope declaration in the repo's INFRASTRUCTURE.md:

Managed by Terraform: VPC, Cloud SQL, Cloud Run service, IAM role bindings for the CI deploy role, Secret Manager secret containers (not values).
Not managed by Terraform: DNS (bought through registrar UI; documented in library/raw/dns.md), billing alerts (configured once in console; screenshot archived), secret values (rotated out of band in Secret Manager).
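One way to keep the two lists honest is to check mechanically that nothing appears in both. The sketch below assumes you also keep each list as a plain text file with one resource name per line; the file names and the scope_overlap function are hypothetical, not part of any Terraform tooling.

```bash
#!/usr/bin/env bash
set -eu

# Print every resource name that appears in BOTH list files.
# Returns 0 when the lists are disjoint, 1 when something is double-listed.
scope_overlap() {
  local managed_file="$1"
  local unmanaged_file="$2"
  local tmpdir overlap
  tmpdir="$(mktemp -d)"
  # comm needs sorted input; -u also drops accidental duplicates within a list.
  sort -u "$managed_file"   >"$tmpdir/managed"
  sort -u "$unmanaged_file" >"$tmpdir/unmanaged"
  # -12 suppresses lines unique to either file, leaving only the shared ones.
  overlap="$(comm -12 "$tmpdir/managed" "$tmpdir/unmanaged")"
  rm -rf "$tmpdir"
  if [ -n "$overlap" ]; then
    printf '%s\n' "$overlap"
    return 1
  fi
  return 0
}
```

Called as scope_overlap managed.txt unmanaged.txt, it prints nothing when the scope is clean. It cannot tell you about resources missing from both lists; that still needs a manual pass through the cloud console.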

A one-shot bootstrap script that creates the backend before the first terraform init:

#!/usr/bin/env bash
set -euo pipefail
gsutil mb -p "$PROJECT" -l us-central1 "gs://capstone-tfstate-prod"
gsutil versioning set on "gs://capstone-tfstate-prod"

Keep this script in scripts/bootstrap-backend.sh; run it once per project, then commit your backend.tf.
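If the bootstrap might be run more than once, a slightly more defensive variant can check for the bucket first. This is a minimal sketch, assuming gsutil is installed and authenticated; the bootstrap_backend function name and its argument order are illustrative choices.

```bash
#!/usr/bin/env bash
set -eu

# Create the state bucket only if it is missing, then make sure versioning is on.
# Arguments: project id, bucket name, optional location (defaults to us-central1).
bootstrap_backend() {
  local project="$1"
  local bucket="$2"
  local location="${3:-us-central1}"

  # "gsutil ls -b" lists the bucket itself and fails if it cannot be found.
  if gsutil ls -b "gs://$bucket" >/dev/null 2>&1; then
    echo "gs://$bucket already exists; leaving it in place"
  else
    gsutil mb -p "$project" -l "$location" "gs://$bucket"
  fi

  # Safe to repeat: turning versioning on twice has no further effect.
  gsutil versioning set on "gs://$bucket"
}
```

A call such as bootstrap_backend "$PROJECT" capstone-tfstate-prod at the end of the script would run it. Note that the existence check also fails when you lack permission to see the bucket, in which case the create step will report the real error.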

Common Confusion / Misconceptions

"I should manage everything in Terraform from day one." No. Managing billing alerts or DNS zones in Terraform for a capstone usually costs more complexity than it buys safety. The rule: anything that would lose user data, expose secrets, or cost real money if recreated goes in Terraform. Everything else can be manual and documented.
"State is just a cache; I can delete and re-import if it breaks." Re-importing a real production database or IAM role is a careful, hours-long operation. State is the source of truth for what Terraform believes it owns; treat it accordingly.
"Local state is fine for a solo project." Until your laptop dies, or CI runs a second apply, or you want to deploy from two machines. Start with remote state with locking; the 30-minute setup is cheaper than the 3-hour recovery.
"The state file is not sensitive because it doesn't contain secrets." It often does -- any sensitive = true output, some provider attributes, and any module that receives a secret as input will persist it into state. Treat the state bucket itself as a secret store.
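To see the last point for yourself, you can scan a copy of the state for attribute names that look secret-bearing. The sketch below is a rough heuristic, not an audit tool: the count_secret_like_keys name and the list of key patterns are assumptions, and a clean result does not prove the state is free of secrets.

```bash
#!/usr/bin/env bash
set -eu

# Count the lines of a state file that contain a secret-looking attribute name.
# The patterns are a guess at common names, not a complete list.
count_secret_like_keys() {
  local state_file="$1"
  # grep -c prints the count but exits 1 on zero matches, hence "|| true".
  grep -E -c \
    '"[A-Za-z_]*(password|secret|token|private_key)[A-Za-z_]*"[[:space:]]*:' \
    "$state_file" || true
}
```

One way to use it is to run terraform state pull > /tmp/state.json, call count_secret_like_keys /tmp/state.json, and then delete the local copy immediately, since that file is exactly as sensitive as the bucket it came from.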

How To Use It (In Your Capstone)

Create the remote state bucket (with versioning and locking) before the first terraform init. Scriptable in 10 lines.
Document the exact init command in your repo's README, for example terraform init -backend-config=backend-prod.hcl, so the backend is always initialized explicitly.
Document scope in INFRASTRUCTURE.md with two lists: managed and deliberately not-managed.
Never commit terraform.tfstate*; add the pattern to .gitignore and protect the backend with IAM.
Run terraform plan in CI on every PR; run terraform apply only from a protected main branch with OIDC credentials scoped to the deploy role.
Turn on object versioning in the backend bucket so a corrupted state can be rolled back by promoting the previous version.
Write a two-paragraph state-recovery runbook: what to do if the state bucket is deleted or corrupted.
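Two of the steps above (the backend config file and the ignore rules) are easy to script. The sketch below assumes the GCS bucket and prefix from the example root; the function names are illustrative, and ignoring the .terraform/ directory is an extra convention added here, not something the list requires.

```bash
#!/usr/bin/env bash
set -eu

# Write the settings file that "terraform init -backend-config=<file>" reads.
# The values mirror the example root; change them for your own project.
write_backend_config() {
  cat >"$1" <<'EOF'
bucket = "capstone-tfstate-prod"
prefix = "root"
EOF
}

# Append an ignore pattern only if that exact line is not already present.
ignore_once() {
  local file="$1"
  local pattern="$2"
  touch "$file"
  grep -qxF -e "$pattern" "$file" || printf '%s\n' "$pattern" >>"$file"
}

# Keep local state files and the provider cache out of version control.
ignore_state_files() {
  ignore_once "$1" '.terraform/'
  ignore_once "$1" '*.tfstate'
  ignore_once "$1" '*.tfstate.*'
}
```

Running write_backend_config backend-prod.hcl and ignore_state_files .gitignore from the repo root prepares both files, and ignore_state_files is safe to run repeatedly. If you pass settings this way, the backend "gcs" block in backend.tf can omit them, which Terraform calls partial configuration.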

State File Hygiene

The state file contains enough information to reconstruct every managed resource -- including, sometimes, secret values written by providers. Treat the state file itself as a secret:

encrypted at rest in the backend bucket (bucket-default CMEK or SSE-KMS)
access-controlled to the deploy role only (not to every developer)
versioned, so a bad apply can be rolled back by promoting the previous state version
never committed, never emailed, never pasted into chat

Recovering from a bad state is a completely separate runbook from recovering from a bad deploy. Write it now, not at 3 a.m.
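The rollback step of that runbook could look something like the sketch below for a versioned GCS bucket. It assumes the bucket and prefix from the example root and the default workspace, which the GCS backend stores as <prefix>/default.tfstate. The function names and the DRY_RUN switch are illustrative; by default the script only prints the commands it would run.

```bash
#!/usr/bin/env bash
set -eu

# Assumed defaults: example bucket and prefix, default workspace.
STATE_OBJECT="${STATE_OBJECT:-gs://capstone-tfstate-prod/root/default.tfstate}"
DRY_RUN="${DRY_RUN:-1}"   # set to 0 to actually execute the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# List every stored generation of the state object (needs bucket versioning).
list_state_versions() {
  run gsutil ls -a "$STATE_OBJECT"
}

# Copy one older generation over the live object, making it current again.
restore_state_version() {
  local generation="$1"
  run gsutil cp "${STATE_OBJECT}#${generation}" "$STATE_OBJECT"
}
```

A cautious sequence is to stop all CI applies, list the versions, restore the generation from just before the bad apply, and then run terraform plan to confirm that the proposed changes look sane before anyone applies again.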

Check Yourself

What is the exact path to your state file (bucket + prefix/key)?
What prevents two apply runs from clobbering state at once?
Which three resources are deliberately not in your Terraform scope, and where are they documented?
If the state bucket were deleted tomorrow, what is your recovery plan in one paragraph?
Is versioning enabled on the state bucket, and have you tested restoring a previous version?
Who (what IAM principal) can write to the state bucket, and is that role's trust condition scoped to your repo?

Mini Drill or Application (Capstone-scoped)

Bootstrap in 30 minutes. Create the bucket (+ locking), write backend.tf, run terraform init successfully, and commit an empty root main.tf. If terraform init fails, stop until it works.
Scope statement. Write the two-list INFRASTRUCTURE.md for your capstone. Read it back and check that every resource in your cloud console appears in exactly one list.
Recovery rehearsal (on a junk project). Intentionally delete a resource from the cloud without Terraform's knowledge, then run terraform plan and confirm Terraform proposes to recreate it. Record what the output looked like -- this is your drift-recovery muscle memory.
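For the recovery rehearsal, terraform plan -detailed-exitcode gives a machine-readable signal: exit code 0 means no changes, 2 means changes are proposed, and 1 means the plan itself failed. A minimal sketch of a wrapper follows; the function names and the 60-second lock timeout are illustrative choices.

```bash
#!/usr/bin/env bash
set -eu

# Translate the exit code of "terraform plan -detailed-exitcode" into words.
describe_plan_exit() {
  case "$1" in
    0) echo "no changes: state matches the cloud" ;;
    2) echo "changes proposed: drift or pending edits" ;;
    *) echo "plan failed: fix the error before trusting any result" ;;
  esac
}

# Run a plan, report the outcome, and pass the exit code on to the caller.
check_drift() {
  local code=0
  # -lock-timeout waits for a held state lock instead of failing at once.
  terraform plan -detailed-exitcode -input=false -lock-timeout=60s || code=$?
  describe_plan_exit "$code"
  return "$code"
}
```

After deleting the resource by hand in the junk project, check_drift should end with the "changes proposed" line and the plan output above it should show the deleted resource being created again.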

Source Backbone

Capstone deployment applies cloud, delivery, and operations material. These books are the source backbone for the delivery decisions.

Building Secure and Reliable Systems - secure/reliable deployment posture.
GitHub Actions in Action - workflow automation support.
Pro Git - release history, tags, and branch discipline.
The Linux Command Line - shell and deployment automation support.
