Environment Strategy on a Budget

What This Concept Is

An environment is a full, running copy of the system, isolated from the others. The canonical trio is dev, staging, prod. The canonical fantasy is that a capstone needs all three at equal size, all the time.

For a solo-operated capstone on your own credit card, the real question is not "how many environments?" It is: "which environments do I pay to keep running continuously, and which are ephemeral?" Environments cost money and attention every month, forever. More environments means more Terraform complexity, more secrets, more deploy paths, more ways for reality to drift.

Three legitimate patterns for a capstone:

  • Full trio: dev on your laptop, staging always on (small), prod always on (real traffic or demo).
  • Collapsed + preview: only prod, with an ephemeral preview environment per pull request (auto-destroyed on close).
  • Two-env: staging and prod; no dev environment at all -- developers run locally with Docker Compose.

Each pattern has a defensible story. An indefensible pattern is "we have three envs but staging has not been deployed to in five weeks and dev is broken."

Why It Matters Here (In the Capstone)

For a capstone budget of ~$30-80/month, paying for three always-on envs can eat half the budget or more before you ship a single feature. Worse, the wrong number of environments creates a false sense of safety: a stale staging catches almost nothing, but you tell yourself it does, and when prod breaks you discover the lie at a bad time.

Environments also ripple into the pipeline. Each environment needs its own OIDC role, its own secret store, its own smoke target, and its own entry in the runbook. Adding an environment is not free; it is a recurring tax on every downstream concept in this module.

Concrete Example(s)

A real capstone budget sketch on Cloud Run + Cloud SQL:

Pattern                      dev     staging   prod   Preview   Monthly
Full trio (always on)        $15     $15       $35    --        ~$65
Collapsed (preview + prod)   local   --        $35    ~$5 avg   ~$40
Two-env                      local   $15       $35    --        ~$50
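
The arithmetic behind the table can be checked with a few lines of Python. This is a minimal sketch that reuses the figures from the table above; the names PATTERNS, monthly_total, and patterns_within_cap are illustrative and not part of any tool.

```python
# Monthly cost sketch per pattern, using the figures from the table above.
# "local" and "--" entries cost nothing in the cloud, so they are omitted.
PATTERNS = {
    "Full trio (always on)": {"dev": 15, "staging": 15, "prod": 35},
    "Collapsed (preview + prod)": {"prod": 35, "preview": 5},
    "Two-env": {"staging": 15, "prod": 35},
}


def monthly_total(envs):
    """Sum the per-environment monthly costs (in dollars) for one pattern."""
    return sum(envs.values())


def patterns_within_cap(cap_dollars):
    """Return the pattern names whose monthly total fits under the cap."""
    return [
        name
        for name, envs in PATTERNS.items()
        if monthly_total(envs) <= cap_dollars
    ]


if __name__ == "__main__":
    for name, envs in PATTERNS.items():
        print(f"{name}: ~${monthly_total(envs)}/month")
    print("Fits a $50 cap:", patterns_within_cap(50))
```

Swap in your own vendor-calculator estimates before relying on the result; the dollar amounts here are only the sketch figures.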

One Terraform .tfvars file per env, showing the knobs that legitimately differ:

# staging.tfvars
environment = "staging"
min_instances = 0
max_instances = 2
db_tier = "db-f1-micro"
deletion_protection = false

# prod.tfvars
environment = "prod"
min_instances = 1
max_instances = 10
db_tier = "db-g1-small"
deletion_protection = true

A CI drift-detection job that runs nightly and flags any non-empty plan:

drift-check:
  schedule: ["0 3 * * *"]   # nightly
  steps:
    - run: terraform init -backend-config=backend-prod.hcl
    - run: terraform plan -var-file=prod.tfvars -detailed-exitcode
      # exit 0 = no changes, exit 1 = error, exit 2 = drift
      # on exit 2, fail the job and open an issue
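
If the CI system lets you run a script instead of raw shell steps, the exit-code handling might look like the sketch below. It assumes the terraform binary is on PATH and that terraform init has already run; the function names classify_plan_exit and run_drift_check are hypothetical. With -detailed-exitcode, terraform plan returns 0 for an empty plan, 1 for an error, and 2 for a non-empty plan.

```python
import subprocess


def classify_plan_exit(code):
    """Map a `terraform plan -detailed-exitcode` exit code to a status."""
    if code == 0:
        return "clean"   # plan succeeded, no changes
    if code == 2:
        return "drift"   # plan succeeded, changes are pending
    return "error"       # exit 1 (or anything unexpected)


def run_drift_check(var_file, cwd="."):
    """Run the plan and return (status, plan output).

    Assumes `terraform init` has already been run in `cwd`.
    """
    result = subprocess.run(
        [
            "terraform", "plan",
            f"-var-file={var_file}",
            "-detailed-exitcode",
            "-input=false",
            "-no-color",
        ],
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode), result.stdout


if __name__ == "__main__":
    status, output = run_drift_check("prod.tfvars")
    print(status)
    if status != "clean":
        print(output)
        raise SystemExit(1)  # fail the job so the drift is visible
```

Treating "error" as a failure too is a deliberate choice here: a drift check that cannot run tells you nothing about drift.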

Common Confusion / Misconceptions

  • "Skipping staging is unprofessional." Skipping a redundant staging is professional. An unused staging drifts from prod and becomes a liability. A preview-per-PR environment catches more bugs than a stale staging, and costs less.
  • "Preview environments are only a big-team luxury." They are a solo-engineer luxury too -- as long as the platform destroys them cleanly on PR close. The danger is leaking previews; add a scheduled cleanup job as a belt-and-suspenders safety net.
  • "Same config in every env, different values." Seductive but wrong. Some knobs must differ (deletion_protection, min_instances, DB size). Some knobs must not (log format, app version schema). Be explicit about which is which in the Terraform inputs.
  • "Prod and staging use the same DB snapshot." A tempting shortcut that leaks real user data into a less-protected environment. Use synthetic or anonymized data in staging, or skip staging entirely.
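
One way to make "which knobs may differ" explicit is a small check over the parsed variable values. The sketch below assumes the .tfvars values have already been loaded into plain dicts (parsing is out of scope here); ALLOWED_TO_DIFFER mirrors the knobs in the tfvars example above, and log_format is a hypothetical variable name used only for illustration.

```python
# Knobs that are allowed (and expected) to differ between envs.
ALLOWED_TO_DIFFER = {
    "environment",
    "min_instances",
    "max_instances",
    "db_tier",
    "deletion_protection",
}

_MISSING = object()  # sentinel for a key present in only one env


def unexpected_differences(staging, prod, allowed=ALLOWED_TO_DIFFER):
    """Return sorted variable names that differ but are not allowed to."""
    keys = set(staging) | set(prod)
    return sorted(
        key
        for key in keys
        if key not in allowed
        and staging.get(key, _MISSING) != prod.get(key, _MISSING)
    )


if __name__ == "__main__":
    staging = {"environment": "staging", "min_instances": 0, "log_format": "text"}
    prod = {"environment": "prod", "min_instances": 1, "log_format": "json"}
    print(unexpected_differences(staging, prod))
```

A non-empty result means a knob that should be identical everywhere has quietly diverged.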

How To Use It (In Your Capstone)

  1. Write your monthly cap in dollars before you choose a pattern.
  2. Estimate continuous cost of each env at your expected traffic using the vendor's calculator.
  3. Map each env to a unique reason it exists ("catch schema drift," "demo for reviewer," "production traffic"). If an env has no unique reason, cut it.
  4. Define the promotion rule in one sentence. Examples: "PR merged to main -> auto-deploy to staging; staging green for 30 minutes -> manual approval to prod" or "PR merged -> auto-deploy to prod; preview must be green on the PR first."
  5. Add a nightly terraform plan drift-detection job per env.
  6. Write the teardown rule: how is each environment destroyed, and when?
  7. Commit all of the above to library/raw/decisions/003-environments.md.
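
Step 3 above can be checked mechanically. The following is a minimal sketch, assuming you keep the env-to-reason mapping as a simple dict; the function name envs_to_cut is hypothetical.

```python
from collections import Counter


def envs_to_cut(reasons):
    """Return envs with no reason, or with a reason shared by another env.

    `reasons` maps env name -> one-line reason (empty string if none).
    """
    normalized = {env: reason.strip().lower() for env, reason in reasons.items()}
    counts = Counter(reason for reason in normalized.values() if reason)
    return sorted(
        env
        for env, reason in normalized.items()
        if not reason or counts[reason] > 1
    )


if __name__ == "__main__":
    print(envs_to_cut({
        "dev": "",
        "staging": "catch schema drift",
        "prod": "production traffic",
    }))
```

When two envs share a reason, both are flagged; deciding which one to keep is still a human call.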

Environment Drift Is the Silent Killer

Environments drift when:

  • a manual change is made in the cloud console on one env but not applied to the others
  • a secret is rotated on prod but not on staging
  • a migration is run on prod ahead of a code deploy that would have applied it to staging first
  • a Terraform module is updated for one env and forgotten for the others

Drift invalidates the promise that "staging looks like prod." Catch it with periodic terraform plan runs against each env in CI -- even when you are not deploying -- and flag any non-empty plan as drift. The S9 M02 plan/apply lifecycle concept is the canonical reference; your capstone job is the minimal form.

Promotion Rules, in Writing

Every environment pair needs a written promotion rule. Examples:

  • "PR merged to main -> auto-deploy to staging; staging green for 30 minutes -> manual approval to prod."
  • "PR merged to main -> auto-deploy to prod; preview environment must be green on the PR first."
  • "Tagged release (v*.*.*) -> auto-deploy to prod; untagged merges stay on staging."

Pick one, write it into library/raw/decisions/003-environments.md, and configure the pipeline to enforce it. A promotion rule that lives only in a human's head is a promotion rule that gets skipped at midnight.
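
To illustrate what "configure the pipeline to enforce it" could mean, here is a sketch of the first rule (staging green for 30 minutes, then manual approval) expressed as data plus a gate function. PromotionRule and may_promote are hypothetical names; a real pipeline would more likely express this through its own approval and environment features.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class PromotionRule:
    source: str             # env being promoted from
    target: str             # env being promoted to
    min_green: timedelta    # how long the source must have been healthy
    manual_approval: bool   # whether a human must approve


def may_promote(rule, green_since: Optional[datetime], now: datetime, approved: bool):
    """Return True only if the written rule is fully satisfied."""
    if green_since is None:              # source env is not green at all
        return False
    if now - green_since < rule.min_green:
        return False                     # not green for long enough yet
    return approved or not rule.manual_approval


if __name__ == "__main__":
    rule = PromotionRule("staging", "prod", timedelta(minutes=30), True)
    green_since = datetime(2024, 1, 1, 12, 0)
    print(may_promote(rule, green_since, datetime(2024, 1, 1, 12, 45), approved=True))
```

Keeping the rule as data means the same sentence you wrote in the decision record is the thing the gate actually evaluates.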

Sleep-At-Night Test

For each environment, answer: "if I got paged at 2 a.m. that this env is broken, do I care?" If the answer is "no," the env is probably low-value and a candidate for deletion. If the answer is "yes," the env deserves real alerts and a runbook entry -- and budget for paging.

Check Yourself

  1. Which environment does main deploy to automatically, and what is the exact trigger?
  2. Which environment can you destroy without losing any user data?
  3. What happens if prod breaks and staging has not been updated in six weeks?
  4. How would you detect drift today, and how long since you last did?
  5. Which env has a paging alert attached, and which does not?
  6. What is the written promotion rule from your lowest env to your highest?

Mini Drill or Application (Capstone-scoped)

  1. 003-environments.md. In 15 minutes write: monthly cap, env table (name, always-on?, cost, unique reason), promotion rule, teardown rule.
  2. Drift-check CI. Add a scheduled nightly terraform plan job in your workflow; on a non-empty plan, open an issue automatically. Let it run for one night and review the output.
  3. Cost tag audit. Add an env = "prod" / env = "staging" label or tag to every Terraform resource; check the cloud bill the next day and confirm costs split cleanly by env. Fix any untagged resources you find.
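
For drill 3, a small script can list the resources that are missing the env label before you wait for the bill. This is a sketch under a strong assumption: resources are given as dicts with an "address" and a "labels" mapping, a simplified shape you would have to extract from your own Terraform state or plan output. The names untagged_resources, REQUIRED_LABEL, and VALID_ENVS are hypothetical, as are the resource addresses in the example.

```python
REQUIRED_LABEL = "env"
VALID_ENVS = {"prod", "staging"}


def untagged_resources(resources):
    """Return sorted addresses of resources lacking a valid env label.

    Each resource is a dict with "address" and an optional "labels" dict
    (a simplified, assumed shape; adapt it to your own state export).
    """
    missing = []
    for resource in resources:
        labels = resource.get("labels") or {}
        if labels.get(REQUIRED_LABEL) not in VALID_ENVS:
            missing.append(resource["address"])
    return sorted(missing)


if __name__ == "__main__":
    print(untagged_resources([
        {"address": "example_service.api", "labels": {"env": "prod"}},
        {"address": "example_database.main", "labels": {}},
        {"address": "example_bucket.assets"},
    ]))
```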

Source Backbone

Capstone deployment applies cloud, delivery, and operations material. These books are the source backbone for the delivery decisions.