Semester 9 Project

Required Output Classification

Required output	Classification	Public/private guidance
Runnable project implementation and repository structure	`Portfolio candidate`	Polish the public repo only after tests pass, secrets are removed, and setup steps work from a clean checkout.
README with setup, inspection, verification instructions, and known limitations	`Portfolio candidate`	Make this public-facing if the project is safe to share; keep internal coursework notes in a private evidence folder.
Tests, traces, proofs, diagrams, benchmark outputs, or review notes required by the brief	`Checkpoint evidence`	Keep raw logs, benchmark runs, and reviewer comments private by default; publish summarized or reproducible versions when useful.
ADRs, design memos, runbooks, benchmark reports, and other high-effort engineering writeups	`Portfolio candidate`	These are worth polishing publicly when they tell a clear tradeoff story; otherwise keep them as private coursework evidence.
Final reflection, retrospective, or carry-forward notes	`Checkpoint evidence`	Keep candid self-assessment private unless rewritten as a concise public learning note.

Deploy a three-service web application on one supported Semester 9 track: a local-first production-shaped stack or a tightly bounded cloud sandbox. The cloud sandbox version targets AWS with Terraform, GitHub Actions, Amazon EKS, CloudWatch + Grafana, OIDC-based keyless deploys, a documented rollback runbook, and a written threat model and cost analysis; GCP (GKE) or Azure (AKS) are acceptable substitutes with equivalent services. The local-first version uses Docker Compose, kind/minikube/k3d, local Terraform validation, mock services, and local observability tooling to prove the same operating model without paid resources.

Objective

Take an existing (or freshly scaffolded) three-service API -- an HTTP gateway, a background worker, and a relational database -- and stand it up on the selected track in a way that another engineer could review, extend, and operate. No click-ops. No static cloud credentials in CI. No dashboards that do not map to a real user-visible health signal. The goal is not "it runs in the cloud" -- it is "I can defend every choice under a senior-engineer review against cost, failure, and security pressure."

System Under Deployment

api -- a small HTTP service (pick your language; Go, Python, or Node recommended) exposing a handful of endpoints plus health-check and /metrics routes.
worker -- a background job consumer triggered by a queue (local fake queue, SQS, or equivalent), doing a non-trivial unit of work per message.
db -- a relational database (local Postgres for local-first; RDS Postgres, Cloud SQL, or Azure Database for PostgreSQL for cloud sandbox), with one read path the api uses and one write path the worker uses.
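The api's health and metrics surface can be sketched minimally as follows. This assumes Python and only the standard library (`wsgiref`). The `/healthz` and `/orders` paths, the `http_requests_total` metric name, and the in-memory counter dict are hypothetical. A real service would more likely use a web framework and an official Prometheus client library.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical in-memory request counters keyed by (path, status code).
# A real service would more likely use an official Prometheus client library.
REQUEST_COUNTS = {}


def render_metrics():
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled by the api.",
        "# TYPE http_requests_total counter",
    ]
    for (path, code), value in sorted(REQUEST_COUNTS.items()):
        lines.append(f'http_requests_total{{path="{path}",code="{code}"}} {value}')
    return "\n".join(lines) + "\n"


def app(environ, start_response):
    """WSGI app exposing a health check, metrics, and one placeholder read path."""
    path = environ.get("PATH_INFO", "/")
    if path == "/healthz":
        status, ctype, body = "200 OK", "application/json", b'{"status": "ok"}'
    elif path == "/metrics":
        status, ctype, body = "200 OK", "text/plain; version=0.0.4", render_metrics().encode()
    elif path == "/orders":
        # Placeholder for the api's read path against db.
        status, ctype, body = "200 OK", "application/json", json.dumps({"orders": []}).encode()
    else:
        status, ctype, body = "404 Not Found", "application/json", b'{"error": "not found"}'
    if path != "/metrics":  # do not count Prometheus scrapes as user traffic
        key = (path, status.split()[0])
        REQUEST_COUNTS[key] = REQUEST_COUNTS.get(key, 0) + 1
    start_response(status, [("Content-Type", ctype), ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    with make_server("", 8080, app) as server:
        server.serve_forever()
```

Keeping the health check separate from the business endpoint gives Kubernetes probes and the load balancer a cheap target. Excluding `/metrics` from the counters keeps scrapes out of the success-rate SLI.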

You are free to reuse a service from an earlier semester (for example, a slimmed-down OrderFlow payments path from S7). It must actually run; "hello world" inside a container does not exercise the infrastructure story enough.

Deliverables

Deployment track evidence -- for local-first, the three-service application runs in Docker Compose and kind/minikube/k3d with Kubernetes manifests, mock queue/storage services, local Postgres, and Terraform validation artifacts. For cloud sandbox, the three-service application runs on Amazon EKS (or GKE/AKS) in a multi-AZ VPC, with the api behind an L7 load balancer, the worker as a separate Kubernetes Deployment, and the managed database in a private subnet with no public endpoint. Everything is provisioned or validated via Terraform; no console-only resources.
CI/CD pipeline -- GitHub Actions workflow with build, test, scan, plan, and deploy stages matched to the selected track. Local-first CI must run container builds, unit/contract tests, secret scans, terraform fmt/terraform validate, static scans, local manifest validation, and a local deployment smoke test against Docker Compose or kind/minikube/k3d; it does not need cloud OIDC, terraform apply, or managed-cluster deploys. Cloud sandbox CI authenticates to the chosen provider via OIDC (no long-lived access keys), runs terraform plan on PRs, and runs terraform apply + image deploy on merge with a manual approval gate and documented rollback command.
Observability dashboard -- either a local Grafana/Prometheus/Jaeger stack or a Grafana (or CloudWatch) dashboard, tied to two SLIs (request success rate and p95 latency), with one SLO per service, at least three alerts with runbook links, and structured logs plus distributed traces wired through OpenTelemetry so that a single request ID ties log lines and spans together.
Security review -- written threat model using STRIDE for at least three components (ingress path, CI/CD pipeline, database), a least-privilege access review matched to the track, a secrets-handling note (how DB credentials flow without sitting in git or env), and a short supply-chain review (base image provenance, dependency scanning, image signing approach). Local-first access review covers Kubernetes RBAC, service accounts, local secret injection, network policies, and mock-service permissions; cloud sandbox access review covers IAM/workload identity policies for each service with no *:* on sensitive resources.
Cost analysis and teardown evidence -- for local-first, document why no paid resources are required and list any optional cloud costs avoided. For cloud sandbox, include a monthly cost estimate with the AWS Pricing Calculator (or equivalent), top three cost drivers named, one optimization experiment actually run (e.g., right-sized node group, switched NAT to single-AZ in dev, or moved logs to lifecycle-rule bucket), before/after numbers, screenshots or exports showing billing alerts, and a completed teardown checklist.
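The "one request ID ties log lines and spans together" requirement starts with getting the ID onto every log line. Below is a minimal sketch using only the Python standard library (`contextvars`, `logging`, `json`, `uuid`). The JSON field names and the reuse of an upstream ID (for example from an `X-Request-ID` header) are assumptions. In the project itself, the same ID would also be attached to the active OpenTelemetry span, for example as a span attribute, so that logs and traces can be joined.

```python
import contextvars
import json
import logging
import uuid

# Hypothetical context variable carrying the current request ID through one handler.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, including the request ID."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        })


def handle_request(logger, incoming_id=None):
    # Reuse an upstream ID if one was passed in; otherwise mint a new one.
    rid = incoming_id or uuid.uuid4().hex
    token = request_id_var.set(rid)
    try:
        logger.info("order lookup started")
        logger.info("order lookup finished")
    finally:
        # Restore the previous value so the ID never leaks into the next request.
        request_id_var.reset(token)
    return rid
```

Using a context variable instead of a global means concurrent requests (threads or asyncio tasks) each see their own ID.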

Cost Guardrails

These are hard gates for the project and checkpoint:

Budget ceiling: keep total cloud spend at or below $50 for the six-week semester project unless an instructor sets a lower ceiling. Local-first work should remain at $0 cloud spend.
Billing alert requirement: configure budget and billing alerts before any real cloud deployment; capture evidence in the repo.
Required teardown step: every cloud lab, deploy, and project milestone must end with a documented teardown or an explicit "kept alive until date because reason" note.
No long-lived paid resources: NAT gateways, managed clusters, databases, load balancers, public IPs, large log indexes, and unattached volumes must not survive overnight without owner, expiration date, and alert coverage.
Review evidence: PRs that add or change paid resources must include a cost-impact note, terraform plan summary, and rollback/teardown command.
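The overnight rule for paid resources can be checked mechanically. Below is a minimal Python sketch, assuming you already have an inventory of paid resources with their tags. How you build that inventory is left open; for example, it could be derived from `terraform show -json` output or a provider CLI. The tag keys `owner`, `expires`, and `alert` are hypothetical; substitute your own tagging policy.

```python
from datetime import date

# Hypothetical tag keys; use whatever names your tagging policy defines.
REQUIRED_TAGS = ("owner", "expires", "alert")


def guardrail_violations(resources, today):
    """Return (resource_id, reason) pairs for paid resources that break the overnight rule.

    resources: dicts like {"id": ..., "tags": {...}}, with "expires" as an ISO date string.
    """
    violations = []
    for res in resources:
        tags = res.get("tags", {})
        missing = [key for key in REQUIRED_TAGS if not tags.get(key)]
        if missing:
            violations.append((res["id"], "missing tags: " + ", ".join(missing)))
        elif date.fromisoformat(tags["expires"]) < today:
            violations.append((res["id"], "expired on " + tags["expires"]))
    return violations
```

A scheduled CI job could run this every evening and fail, or page the owner, whenever the returned list is non-empty.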

Constraints & Rubric

Keep the stack realistic but cheap: the budget ceiling is ≤ $50 for the full 6 weeks, enforced by a cloud budget alarm configured before any real cloud deployment. Cloud sandbox learners choose a single provider (AWS, GCP, or Azure) in week 84 and do not switch providers afterwards; local-first learners instead choose a local runtime pair (Docker Compose plus kind/minikube/k3d) and do not need a provider account. "Production-like" means: IaC-managed or locally validated IaC, no static cloud credentials, no secrets in git, a real rollback path, and at least one working alert that would wake you up if the api dropped below its SLO. Cloud sandbox submissions also require a multi-AZ design where feasible, no long-lived paid resources, and a teardown step after every apply or deploy session.
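An alert that "would wake you up" encodes a check like the one below. This is a minimal Python sketch that assumes a window of (status code, latency in ms) samples and counts only 5xx responses as failures. It uses a nearest-rank p95, and the thresholds of 99% success and 300 ms are hypothetical. In the real stack, the same logic would live in a Prometheus, Grafana, or CloudWatch alert rule rather than in application code.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def evaluate_slo(requests, min_success_rate=0.99, max_p95_ms=300.0):
    """requests: list of (status_code, latency_ms) tuples from one evaluation window."""
    if not requests:
        # No traffic: report no data rather than a breach.
        return {"success_rate": None, "p95_ms": None, "breached": False}
    ok = sum(1 for status, _ in requests if status < 500)  # assumption: only 5xx are failures
    success_rate = ok / len(requests)
    latency = p95([lat for _, lat in requests])
    breached = success_rate < min_success_rate or latency > max_p95_ms
    return {"success_rate": success_rate, "p95_ms": latency, "breached": breached}
```

The same function is handy in week 89 for turning raw load-test output into a pass/fail verdict against the api's SLO.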

Deliverable	Done when...	Self-score (1--5)
Deployment	Local-first: all three services run through Docker Compose and kind/minikube/k3d with reproducible commands and Terraform validation evidence. Cloud sandbox: all three services run on managed Kubernetes in a multi-AZ VPC, DB is private, and a clean-room `terraform apply` from an empty state recreates the environment with no manual clicks.
CI/CD	Local-first: PRs trigger tests, scans, Terraform validation, manifest validation, and a local smoke deploy with no cloud credentials. Cloud sandbox: PRs trigger plan + tests; merge to `main` deploys via OIDC with no stored cloud keys. Both tracks include a documented, tested rollback command (image tag pin or `kubectl rollout undo`).
Dashboard	Two SLIs, one SLO per service, at least three alerts each linking to a runbook entry; one request can be traced end-to-end from log line to span to metric.
Security review	STRIDE applied to ≥3 components; local-first justifies Kubernetes RBAC, service accounts, network policies, local secret injection, and mock-service permissions, while cloud sandbox justifies every IAM/workload-identity role and avoids wildcard actions on sensitive resources; secrets flow is documented from source to pod.
Cost analysis	Budget ceiling, billing-alert proof before cloud deploy, written monthly estimate or local-first cost-avoidance note, top three cost drivers or avoided drivers, one optimization experiment with before/after numbers, and teardown evidence.
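The arithmetic behind the cost-analysis row (projected spend, top three drivers, ceiling check, and a before/after delta) can be kept in a small script so reviewers can rerun it. Below is a minimal Python sketch. The resource names, hourly rates, and hours are placeholders, not real provider prices. Take real figures from the AWS Pricing Calculator (or equivalent) and actual billing exports.

```python
def cost_report(line_items, ceiling_usd=50.0):
    """Summarize projected spend for the project window.

    line_items: (name, hourly_usd, hours_running) tuples, where hours_running is
    the total number of hours the resource is expected to exist during the project.
    """
    costs = {name: rate * hours for name, rate, hours in line_items}
    total = round(sum(costs.values()), 2)
    top_drivers = sorted(costs, key=costs.get, reverse=True)[:3]
    return {"total_usd": total, "top_drivers": top_drivers, "within_ceiling": total <= ceiling_usd}


def optimization_delta(before_items, after_items):
    """Before/after spend for one optimization experiment, in USD."""
    before = cost_report(before_items)["total_usd"]
    after = cost_report(after_items)["total_usd"]
    return {"before_usd": before, "after_usd": after, "saved_usd": round(before - after, 2)}


if __name__ == "__main__":
    # Placeholder rates and hours for illustration only.
    items = [("cluster", 0.10, 40), ("nat", 0.05, 40), ("db", 0.02, 40), ("lb", 0.03, 40)]
    print(cost_report(items))
```

Modeling spend as hours running makes the value of teardown explicit: halving the hours a resource exists halves its line item.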

Suggested Timeline (6 weeks)

Week	Focus	Work
84	Platform foundations	Choose local-first or cloud sandbox. Local-first: create Docker Compose skeleton and local Terraform backend. Cloud sandbox: create workload account/project, budget ceiling, billing alerts, least-privilege bootstrap role, and teardown checklist before bootstrapping remote state.
85	IaC baseline	Run `terraform fmt`, `terraform validate`, static scans, plan review, and mock resource diagrams. Local-first: model network/DB/queue boundaries without paid resources. Cloud sandbox: apply only after budget alerts exist and finish with teardown or an expiration note.
86	Kubernetes	Prove manifests and HPA behavior on kind/minikube/k3d first. Cloud sandbox may then provision EKS/GKE/AKS with the smallest viable node pool, short TTL, and teardown command captured.
87	Delivery pipeline	Local-first: GitHub Actions workflow runs tests, scans, Terraform validation, manifest validation, local image build, and Docker Compose or kind smoke deploy with documented rollback. Cloud sandbox: GitHub Actions workflow uses provider OIDC, `terraform plan` on PRs, `terraform apply` + image push + `kubectl set image` on merge, manual prod gate, documented rollback.
88	Security + observability	STRIDE threat model; local-first tightens Kubernetes RBAC/service accounts, network policies, local secret injection, and mock-service permissions; cloud sandbox tightens IAM/workload identity and rotates/eliminates any keys; wire OpenTelemetry locally through Collector + Prometheus/Grafana/Jaeger or logs; use paid observability only after budget alerts and retention limits are in place.
89	Polish, review, gate	Cost experiment or local-first cost-avoidance note with before/after; completed teardown checklist; README + architecture diagram + runbook index; load-test the `api` against its SLO; checkpoint gate and semester exam.
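For week 86, it helps to predict replica counts before load-testing. The Kubernetes Horizontal Pod Autoscaler documentation gives the core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). Scaling is skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The Python sketch below implements only that rule plus min/max clamping. It deliberately ignores stabilization windows, pod readiness, and missing-metric handling, so treat its output as an expectation to check against rather than a simulation of the controller.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA scaling rule: ratio-based scale with tolerance and clamping."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: the HPA skips scaling.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, two pods averaging 90% CPU against a 50% target should scale to four. If kind/minikube/k3d settles somewhere else, that gap is worth a debugging note.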

Cross-Track Integration

Testing (L5): unit tests in CI, terraform validate and tflint/Trivy-style static scans on PRs, a smoke test against Docker Compose/kind for local-first or a non-prod environment after every cloud apply, and one contract test between api and worker (schema of the queue message) that fails the pipeline if the shape changes.
Git (L5): trunk-based workflow with short-lived branches, required reviewers on infra/ changes, required status checks (CI green plus local Terraform validation output for local-first or terraform plan posted as a PR comment for cloud sandbox), and a CODEOWNERS file so platform changes cannot merge without a second look.
Security (L5): Local-first: no cloud credentials are required; document CI permissions, Kubernetes RBAC/service accounts, network policies, local secret injection, TLS where feasible, and mock-service access boundaries. Cloud sandbox: use OIDC federation for CI (no static cloud keys), per-service IAM/workload-identity roles with scoped policies, secrets in the provider secret manager or parameter store (never in git), encryption at rest on DB and object stores, TLS everywhere including inside the cluster where feasible, and documented least-privilege review for every role.
Observability (L4): metrics (local Prometheus/Grafana or CloudWatch + Prometheus), structured logs with request IDs, traces via OpenTelemetry; one Grafana dashboard per service mapped to its SLIs; alerts wired to a notification channel and each alert linked to a runbook entry describing the first three diagnostic steps.
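The api/worker contract test can be as small as a shared schema plus one assertion. Below is a minimal Python sketch in pytest style. The `ORDER_MESSAGE_V1` shape, its field names, and the helper functions are hypothetical, loosely modeled on an order-processing job. The strict type check (exact type match, unknown fields rejected) is one design choice; a looser contract might allow extra fields.

```python
import json

# Hypothetical v1 shape of the queue message the api produces and the worker consumes.
ORDER_MESSAGE_V1 = {"order_id": str, "amount_cents": int, "currency": str, "requested_at": str}


def build_order_message(order_id, amount_cents, currency, requested_at):
    """Producer side (api): serialize one job for the worker."""
    return json.dumps({
        "order_id": order_id,
        "amount_cents": amount_cents,
        "currency": currency,
        "requested_at": requested_at,
    })


def contract_errors(raw, schema=ORDER_MESSAGE_V1):
    """Consumer side (worker): list every way a raw message breaks the contract."""
    msg = json.loads(raw)
    errors = []
    for field, expected in schema.items():
        if field not in msg:
            errors.append(f"missing field: {field}")
        elif type(msg[field]) is not expected:  # strict: rejects bool or float where int is expected
            errors.append(
                f"wrong type for {field}: expected {expected.__name__}, got {type(msg[field]).__name__}"
            )
    for field in sorted(msg.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


def test_api_message_matches_worker_contract():
    # Run under pytest in CI; a shape change on either side fails the pipeline.
    raw = build_order_message("ord-1", 1250, "USD", "2024-01-01T00:00:00Z")
    assert contract_errors(raw) == []
```

Returning a list of errors instead of raising on the first one makes the CI failure message show every broken field at once.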

Submission Checklist

Production-Style Project Brief

Use this project as a reviewable engineering brief, not only a completion exercise.

Problem statement

Write a one-paragraph statement covering the user, the problem, the constraint, and the outcome this project is meant to produce.

Required evidence

working artifact or reproducible deliverable
README with setup, inspection, and verification instructions
tests, traces, proofs, diagrams, benchmark output, or review notes appropriate to the semester
decision log with at least three meaningful tradeoffs
known limitations section with explicit scope cuts

Review questions

What is the smallest vertical slice that proves the project works?
Which requirement is most likely to be misunderstood by a reviewer?
What did you deliberately not build, and why?
What evidence would convince someone else that the result is correct?

Done means

The project is done only when another technical reader can inspect the artifact, run or review the verification evidence, and understand the tradeoffs without a live explanation.

Weekly Project Milestones

Use these milestones to keep the project from becoming a last-week scramble.

Milestone	Focus	Evidence
Start	Scope the smallest useful slice	problem statement, non-goals, first task list
Early build	Produce a walking version	runnable skeleton, first test, first committed artifact
Middle	Add the hard part	implementation note, trace/proof/benchmark/design decision
Review	Stress the weak point	failure case, debugging note, peer/self review, correction commit
Finish	Package for inspection	README, verification instructions, known limitations, reflection

Answer-Quality Examples

Quality	What it sounds like
Weak	"I built it because the module asked for it."
Acceptable	"It works for the required examples and I can explain the main idea."
Strong	"Here is the tradeoff I chose, the evidence that supports it, and the case where it would fail."
Portfolio-ready	"A reviewer can inspect the artifact, rerun the checks, and understand why this solution fits this semester's goals."

Future Capstone Connection

Before closing this project, write two sentences on how it could help the final capstone: one reusable technical skill and one artifact habit to preserve.

Calibration Materials

Use these learner-visible calibration materials before self-grading or requesting review:
