IaC Katas
Four timed katas. Each should take 30-60 minutes. Do them spaced out, not in one sitting. They exercise design taste, not just syntax, and they loop through all five clusters.
Kata 1 -- Design a Reusable VPC Module Spec
Task. Without writing any HCL, write the spec for a reusable VPC module your team would actually use. Deliver:
- Inputs table: name, type, default, required, description. Include at least name, cidr, azs, enable_nat, tags, and one opinionated toggle.
- Outputs table: vpc_id, public_subnet_ids, private_subnet_ids, and two others you defend.
- Opinions section (bulleted). For each opinion, state the scenario it rules out. Example: "Always enables VPC flow logs to an S3 bucket. Rules out silent network debugging sessions that last days."
- Non-goals section (bulleted). What the module explicitly does not handle (e.g., peering, transit gateways, IPv6).
- Versioning rule. When is a change major, minor, or patch? Give an example of each.
Graded against Cluster 3 concepts. This is the hardest kata because it is almost entirely design.
Kata 2 -- Refactor a Monolith Root Module
Task. You inherit monolith.tf -- 600 lines, one root module, VPC + 3 services + RDS + S3 + CloudFront. Plan output is 400 lines long; nobody reads it.
Deliver a refactor plan, in writing:
- the module boundaries you would cut (at least three), each justified with "X rate of change vs Y rate of change" logic
- the exact moved blocks you would write to accomplish the refactor without any destroy
- the order of PRs (first split, second split, etc.) such that every intermediate state is applyable and reviewable
- what you would deliberately not touch in this refactor, and why
No HCL required; draw your boundaries with prose and pseudo-addresses (e.g., module.network, module.web_service, module.data). You are answering: what does this codebase look like after a calm, safe refactor?
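For reference when writing the moved-block deliverable, here is a minimal sketch of how resources could cross a boundary into module.network without a destroy. The resource names aws_vpc.main and aws_subnet.public are hypothetical stand-ins, since the real addresses in monolith.tf are not given here. The sketch assumes Terraform 1.1 or later and that module.network already declares resources with the same types and names.

```hcl
# Hypothetical addresses: the real names in monolith.tf will differ.
# Declared in the root module, alongside the module "network" call.

moved {
  from = aws_vpc.main
  to   = module.network.aws_vpc.main
}

# One moved block per resource that crosses the boundary.
moved {
  from = aws_subnet.public
  to   = module.network.aws_subnet.public
}
```

If the plan for a PR like this shows anything other than moves, that usually means the module's resource arguments have drifted from the originals, which is worth resolving before the next split.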
Kata 3 -- Plan a Safe Rename with moved
Task. A resource currently addressed as aws_db_instance.primary must be renamed to aws_db_instance.platform_primary and also moved into module.data. It has prevent_destroy = true and is in production.
Deliver:
- the exact moved block(s) you would write, with commentary on each
- the expected plan output (describe it in words: "no change to the instance; two address updates in state")
- the PR description you would send, including the three things a reviewer should check
- the rollback procedure if apply fails halfway
- the reason you would not bundle any other change in this PR
Then, in a scratch environment, actually execute the plan on a stand-in resource (any renamed resource with prevent_destroy). Capture the before/after terraform state list.
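For the scratch run, a stand-in might look like the sketch below. It assumes Terraform 1.4 or later (for the built-in terraform_data resource) and that an earlier apply already created terraform_data.primary under its old name. The resource names simply mirror the kata, and the input value is arbitrary. The sketch covers only the rename; moving into module.data additionally requires the to address to carry the module path.

```hcl
# Stand-in for the production database. After a first apply created
# terraform_data.primary, the resource block is renamed in place:
resource "terraform_data" "platform_primary" {
  input = "stand-in for aws_db_instance"

  lifecycle {
    prevent_destroy = true
  }
}

# Records the rename so Terraform updates the state address
# instead of planning a destroy and a create.
moved {
  from = terraform_data.primary
  to   = terraform_data.platform_primary
}
```

Running terraform state list before and after the apply should show the old address replaced by the new one, and the plan should report nothing to add, change, or destroy.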
Kata 4 -- Write an OPA Policy Rejecting Public S3 Buckets
Task. Write a Rego policy under policies/s3.rego that denies any Terraform plan containing an S3 bucket with a public ACL or with the public-access block disabled.
Deliver:
- policies/s3.rego -- at minimum, rules that catch:
  - aws_s3_bucket_acl.acl in {"public-read", "public-read-write", "authenticated-read"}
  - aws_s3_bucket_public_access_block with any of the four flags set to false
- a policies/s3_test.rego file with positive and negative test cases, using test_-prefixed rules
- a small fixtures/ directory with two plan.json files: one that should pass, one that should fail
- a short README.md showing the exact commands to run:
    opa test ./policies
    opa eval -d policies/s3.rego -i fixtures/bad_plan.json "data.terraform.s3.deny"
- one paragraph in the README on what this policy cannot catch (static website hosting exceptions, CloudFront-fronted buckets, etc.) and how you would extend it.
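One way to produce the failing fixture is to plan a deliberately non-compliant configuration and export the plan as JSON. The sketch below is an assumption about how you might build it: the resource labels and bucket prefix are hypothetical, and planning it requires a configured AWS provider.

```hcl
# Deliberately non-compliant configuration for fixtures/bad_plan.json.
# The "fixture" labels and the bucket prefix are hypothetical.
resource "aws_s3_bucket" "fixture" {
  bucket_prefix = "kata4-fixture-"
}

resource "aws_s3_bucket_public_access_block" "fixture" {
  bucket = aws_s3_bucket.fixture.id

  block_public_acls       = false # one flag off should be enough to trip the policy
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

After terraform plan -out=tfplan, running terraform show -json tfplan > fixtures/bad_plan.json writes the plan in the JSON form that opa eval reads. Setting all four flags to true should give the passing fixture.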
Evidence Check
You have completed all four katas when you can:
- hand Kata 1's spec to a peer and have them build the module without a meeting
- write a monolith refactor plan (Kata 2) that a senior engineer nods through
- execute Kata 3's rename without a destroy and explain every line of the PR
- produce Kata 4's OPA policy and its tests -- both passing and failing -- from memory
- explain which kata each of Clusters 1-5 is primarily testing, and why