Semester 9 Project
Required Output Classification
| Required output | Classification | Public/private guidance |
|---|---|---|
| Runnable project implementation and repository structure | Portfolio candidate | Polish the public repo only after tests pass, secrets are removed, and setup steps work from a clean checkout. |
| README with setup, inspection, verification instructions, and known limitations | Portfolio candidate | Make this public-facing if the project is safe to share; keep internal coursework notes in a private evidence folder. |
| Tests, traces, proofs, diagrams, benchmark outputs, or review notes required by the brief | Checkpoint evidence | Keep raw logs, benchmark runs, and reviewer comments private by default; publish summarized or reproducible versions when useful. |
| ADRs, design memos, runbooks, benchmark reports, and other high-effort engineering writeups | Portfolio candidate | These are worth polishing publicly when they tell a clear tradeoff story; otherwise keep them as private coursework evidence. |
| Final reflection, retrospective, or carry-forward notes | Checkpoint evidence | Keep candid self-assessment private unless rewritten as a concise public learning note. |
Deploy a three-service web application on one supported Semester 9 track: a local-first production-shaped stack or a tightly bounded cloud sandbox. The cloud sandbox version targets AWS with Terraform, GitHub Actions, Amazon EKS, CloudWatch + Grafana, OIDC-based keyless deploys, a documented rollback runbook, and a written threat model and cost analysis; GCP (GKE) or Azure (AKS) are acceptable substitutes with equivalent services. The local-first version uses Docker Compose, kind/minikube/k3d, local Terraform validation, mock services, and local observability tooling to prove the same operating model without paid resources.
Objective
Take an existing (or freshly scaffolded) three-service API -- an HTTP gateway, a background worker, and a relational database -- and stand it up on the selected track in a way that another engineer could review, extend, and operate. No click-ops. No static cloud credentials in CI. No dashboards that do not map to a real user-visible health signal. The goal is not "it runs in the cloud" -- it is "I can defend every choice under a senior-engineer review against cost, failure, and security pressure."
System Under Deployment
- api -- a small HTTP service (pick your language; Go, Python, or Node recommended) exposing a handful of endpoints with health and /metrics routes.
- worker -- a background job consumer triggered by a queue (local fake queue, SQS, or equivalent), doing a non-trivial unit of work per message.
- db -- a relational database (local Postgres for local-first; RDS Postgres, Cloud SQL, or Azure Database for PostgreSQL for cloud sandbox), with one read path the api uses and one write path the worker uses.
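As a rough illustration of the health and /metrics routes described above, the following is a minimal sketch using only the Python standard library. The `/healthz` path, the `api_requests_total` metric name, and port 8080 are assumptions made for the example, not requirements of the brief; a real service would more likely use a web framework and a metrics client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory counter for the sketch; a real service would use a metrics library.
REQUEST_COUNT = {"total": 0}


def route(path: str) -> tuple[int, str, str]:
    """Map a request path to (status code, content type, body)."""
    REQUEST_COUNT["total"] += 1
    if path == "/healthz":  # hypothetical health path
        return 200, "application/json", '{"status": "ok"}'
    if path == "/metrics":
        # Prometheus text exposition format: "name value" samples, one per line.
        body = (
            "# TYPE api_requests_total counter\n"
            f"api_requests_total {REQUEST_COUNT['total']}\n"
        )
        return 200, "text/plain; version=0.0.4", body
    return 404, "application/json", '{"error": "not found"}'


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, content_type, body = route(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Blocking entry point; call serve() from a container command."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Keeping the routing logic in a plain function (`route`) separate from the socket handling makes it easy to unit-test in CI without opening a port.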
You are free to reuse a service from an earlier semester (for example, a slimmed-down OrderFlow payments path from S7). It must actually run; "hello world" inside a container does not exercise the infrastructure story enough.
Deliverables
- Deployment track evidence -- for local-first, the three-service application runs in Docker Compose and kind/minikube/k3d with Kubernetes manifests, mock queue/storage services, local Postgres, and Terraform validation artifacts. For cloud sandbox, the three-service application runs on Amazon EKS (or GKE/AKS) in a multi-AZ VPC, with the api behind an L7 load balancer, the worker as a separate Kubernetes Deployment, and the managed database in a private subnet with no public endpoint. Everything is provisioned or validated via Terraform; no console-only resources.
- CI/CD pipeline -- GitHub Actions workflow with build, test, scan, plan, and deploy stages matched to the selected track. Local-first CI must run container builds, unit/contract tests, secret scans, terraform fmt/terraform validate, static scans, local manifest validation, and a local deployment smoke test against Docker Compose or kind/minikube/k3d; it does not need cloud OIDC, terraform apply, or managed-cluster deploys. Cloud sandbox CI authenticates to the chosen provider via OIDC (no long-lived access keys), runs terraform plan on PRs, and runs terraform apply + image deploy on merge with a manual approval gate and documented rollback command.
- Observability dashboard -- one local Grafana/Prometheus/Jaeger stack or Grafana (or CloudWatch) dashboard tied to two SLIs (request success rate, p95 latency), one SLO per service, at least three alerts with runbook links, and structured logs + distributed traces wired through OpenTelemetry so that one request ID ties log lines and spans together.
- Security review -- written threat model using STRIDE for at least three components (ingress path, CI/CD pipeline, database), a least-privilege access review matched to the track, a secrets-handling note (how DB credentials flow without sitting in git or env), and a short supply-chain review (base image provenance, dependency scanning, image signing approach). Local-first access review covers Kubernetes RBAC, service accounts, local secret injection, network policies, and mock-service permissions; cloud sandbox access review covers IAM/workload identity policies for each service with no *:* on sensitive resources.
- Cost analysis and teardown evidence -- for local-first, document why no paid resources are required and list any optional cloud costs avoided. For cloud sandbox, include a monthly cost estimate with the AWS Pricing Calculator (or equivalent), top three cost drivers named, one optimization experiment actually run (e.g., right-sized node group, switched NAT to single-AZ in dev, or moved logs to lifecycle-rule bucket), before/after numbers, screenshots or exports showing billing alerts, and a completed teardown checklist.
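The observability deliverable asks for one request ID that ties log lines and spans together. The sketch below shows one possible shape for that using only the standard library: each log line is a JSON object carrying a `request_id` and a `trace_id`. It is a stand-in that does not use the OpenTelemetry SDK; the field names and the `new_request_context` helper are hypothetical, and in a real setup the trace ID would come from the active span rather than being generated here.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so log tooling can index fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "request_id": getattr(record, "request_id", None),
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def new_request_context(service: str) -> dict:
    """Create the IDs that every log line for one request should carry."""
    return {
        "service": service,
        "request_id": str(uuid.uuid4()),
        # 32 hex characters, the same width as a W3C trace ID.
        "trace_id": uuid.uuid4().hex,
    }


def build_logger(name: str) -> logging.Logger:
    """Attach the JSON formatter to a stream handler for the named logger."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A handler would create the context once per request and pass it on every call, for example `logger.info("order accepted", extra=ctx)`, and forward the same `request_id` in the queue message so the worker logs it too.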
Cost Guardrails
These are hard gates for the project and checkpoint:
- Budget ceiling: keep total cloud spend at or below $50 for the six-week semester project unless an instructor sets a lower ceiling. Local-first work should remain at $0 cloud spend.
- Billing alert requirement: configure budget and billing alerts before any real cloud deployment; capture evidence in the repo.
- Required teardown step: every cloud lab, deploy, and project milestone must end with a documented teardown or an explicit "kept alive until date because reason" note.
- No long-lived paid resources: NAT gateways, managed clusters, databases, load balancers, public IPs, large log indexes, and unattached volumes must not survive overnight without owner, expiration date, and alert coverage.
- Review evidence: PRs that add or change paid resources must include a cost-impact note, terraform plan summary, and rollback/teardown command.
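The "no long-lived paid resources" guardrail can be checked mechanically if each paid resource is recorded with its owner, expiration date, and alert status. The following is a small sketch of such a check; the `PaidResource` record and its field names are assumptions for illustration, and in practice the inventory would be derived from resource tags or Terraform state rather than typed by hand.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PaidResource:
    name: str
    kind: str  # e.g. "nat_gateway", "managed_cluster", "database"
    owner: Optional[str] = None
    expires_on: Optional[date] = None
    has_alert: bool = False


def guardrail_violations(resources: list[PaidResource], today: date) -> list[str]:
    """Return one message per resource that breaks the keep-alive rule."""
    problems = []
    for r in resources:
        missing = []
        if not r.owner:
            missing.append("owner")
        if r.expires_on is None:
            missing.append("expiration date")
        elif r.expires_on < today:
            # An expiry in the past means the resource should already be gone.
            missing.append("unexpired expiration date")
        if not r.has_alert:
            missing.append("alert coverage")
        if missing:
            problems.append(f"{r.name} ({r.kind}): missing " + ", ".join(missing))
    return problems
```

Run as part of an end-of-session teardown script, a non-empty result is a signal to either destroy the resource or write the explicit "kept alive until date because reason" note.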
Constraints & Rubric
Keep the stack realistic but cheap: budget ceiling ≤ $50 for the full 6 weeks, enforced by a cloud budget alarm before any real cloud deployment. Single provider (AWS, GCP, or Azure) chosen in week 84 and not swapped when using the cloud sandbox track; local-first learners instead choose a local runtime pair (Docker Compose plus kind/minikube/k3d) and do not need a provider account. "Production-like" means: IaC-managed or locally validated IaC, no static cloud credentials, no secrets in git, a real rollback path, and at least one working alert that would wake you up if the api dropped below its SLO. Cloud sandbox submissions also require multi-AZ design where feasible, no long-lived paid resources, and a required teardown step after every apply or deploy session.
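To make "dropped below its SLO" concrete, the two SLIs named in this brief (request success rate and p95 latency) can be computed from a window of request records and compared with targets. This is a sketch under stated assumptions: the 99% success target and 300 ms latency target are placeholder values, "success" is taken to mean any non-5xx response, and p95 uses the nearest-rank method. A real dashboard would compute these from Prometheus or CloudWatch data rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int
    latency_ms: float


def success_rate(requests: list[Request]) -> float:
    """Fraction of requests that did not end in a 5xx response."""
    if not requests:
        return 1.0
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def p95_latency_ms(requests: list[Request]) -> float:
    """Nearest-rank 95th percentile of latency."""
    if not requests:
        return 0.0
    ordered = sorted(r.latency_ms for r in requests)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_alert(
    requests: list[Request],
    slo_success: float = 0.99,   # placeholder target
    slo_p95_ms: float = 300.0,   # placeholder target
) -> bool:
    """True when either SLI is outside its target for this window."""
    return (
        success_rate(requests) < slo_success
        or p95_latency_ms(requests) > slo_p95_ms
    )
```

Writing the SLI definitions down this explicitly also helps the runbook: the alert text can state which of the two conditions fired.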
| Deliverable | Done when... | Self-score (1--5) |
|---|---|---|
| Deployment | Local-first: all three services run through Docker Compose and kind/minikube/k3d with reproducible commands and Terraform validation evidence. Cloud sandbox: all three services run on managed Kubernetes in a multi-AZ VPC, DB is private, and a clean-room terraform apply from an empty state recreates the environment with no manual clicks. | |
| CI/CD | Local-first: PRs trigger tests, scans, Terraform validation, manifest validation, and a local smoke deploy with no cloud credentials. Cloud sandbox: PRs trigger plan + tests; merge to main deploys via OIDC with no stored cloud keys. Both tracks include a documented, tested rollback command (image tag pin or kubectl rollout undo). | |
| Dashboard | Two SLIs, one SLO per service, at least three alerts each linking to a runbook entry; one request can be traced end-to-end from log line to span to metric. | |
| Security review | STRIDE applied to ≥3 components; local-first justifies Kubernetes RBAC, service accounts, network policies, local secret injection, and mock-service permissions, while cloud sandbox justifies every IAM/workload-identity role and avoids wildcard actions on sensitive resources; secrets flow is documented from source to pod. | |
| Cost analysis | Budget ceiling, billing-alert proof before cloud deploy, written monthly estimate or local-first cost-avoidance note, top three cost drivers or avoided drivers, one optimization experiment with before/after numbers, and teardown evidence. | |
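The cost-analysis row combines three small calculations: a monthly total, the top three cost drivers, and a before/after comparison for the optimization experiment. The sketch below shows that arithmetic. The `LineItem` structure is hypothetical, and the rates and hours must come from the provider's pricing calculator; no real prices are implied here.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str
    hourly_usd: float       # rate taken from the pricing calculator
    hours_per_month: float  # hours the resource is actually kept running

    @property
    def monthly_usd(self) -> float:
        return self.hourly_usd * self.hours_per_month


def monthly_total(items: list[LineItem]) -> float:
    """Sum of all line items for one month."""
    return sum(item.monthly_usd for item in items)


def top_drivers(items: list[LineItem], n: int = 3) -> list[LineItem]:
    """The n most expensive line items, largest first."""
    return sorted(items, key=lambda item: item.monthly_usd, reverse=True)[:n]


def savings(before: list[LineItem], after: list[LineItem]) -> float:
    """Monthly difference produced by an optimization experiment."""
    return monthly_total(before) - monthly_total(after)


def within_budget(items: list[LineItem], ceiling_usd: float = 50.0) -> bool:
    """Check an estimate against the project budget ceiling."""
    return monthly_total(items) <= ceiling_usd
```

Because sandbox resources are torn down after each session, `hours_per_month` is usually the most important input: the same resource costs very different amounts at 20 hours versus a full month.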
Suggested Timeline (6 weeks)
| Week | Focus | Work |
|---|---|---|
| 84 | Platform foundations | Choose local-first or cloud sandbox. Local-first: create Docker Compose skeleton and local Terraform backend. Cloud sandbox: create workload account/project, budget ceiling, billing alerts, least-privilege bootstrap role, and teardown checklist before bootstrapping remote state. |
| 85 | IaC baseline | Run terraform fmt, terraform validate, static scans, plan review, and mock resource diagrams. Local-first: model network/DB/queue boundaries without paid resources. Cloud sandbox: apply only after budget alerts exist and finish with teardown or an expiration note. |
| 86 | Kubernetes | Prove manifests and HPA behavior on kind/minikube/k3d first. Cloud sandbox may then provision EKS/GKE/AKS with the smallest viable node pool, short TTL, and teardown command captured. |
| 87 | Delivery pipeline | Local-first: GitHub Actions workflow runs tests, scans, Terraform validation, manifest validation, local image build, and Docker Compose or kind smoke deploy with documented rollback. Cloud sandbox: GitHub Actions workflow uses provider OIDC, terraform plan on PRs, terraform apply + image push + kubectl set image on merge, manual prod gate, documented rollback. |
| 88 | Security + observability | STRIDE threat model; local-first tightens Kubernetes RBAC/service accounts, network policies, local secret injection, and mock-service permissions; cloud sandbox tightens IAM/workload identity and rotates/eliminates any keys; wire OpenTelemetry locally through Collector + Prometheus/Grafana/Jaeger or logs; use paid observability only after budget alerts and retention limits are in place. |
| 89 | Polish, review, gate | Cost experiment or local-first cost-avoidance note with before/after; completed teardown checklist; README + architecture diagram + runbook index; load-test the api against its SLO; checkpoint gate and semester exam. |
Cross-Track Integration
- Testing (L5): unit tests in CI, terraform validate and tflint/Trivy-style static scans on PRs, a smoke test against Docker Compose/kind for local-first or a non-prod environment after every cloud apply, and one contract test between api and worker (schema of the queue message) that fails the pipeline if the shape changes.
- Git (L5): trunk-based workflow with short-lived branches, required reviewers on infra/ changes, required status checks (CI green plus local Terraform validation output for local-first or terraform plan posted as a PR comment for cloud sandbox), and a CODEOWNERS file so platform changes cannot merge without a second look.
- Security (L5): Local-first: no cloud credentials are required; document CI permissions, Kubernetes RBAC/service accounts, network policies, local secret injection, TLS where feasible, and mock-service access boundaries. Cloud sandbox: use OIDC federation for CI (no static cloud keys), per-service IAM/workload-identity roles with scoped policies, secrets in the provider secret manager or parameter store (never in git), encryption at rest on DB and object stores, TLS everywhere including inside the cluster where feasible, and documented least-privilege review for every role.
- Observability (L4): metrics (local Prometheus/Grafana or CloudWatch + Prometheus), structured logs with request IDs, traces via OpenTelemetry; one Grafana dashboard per service mapped to its SLIs; alerts wired to a notification channel and each alert linked to a runbook entry describing the first three diagnostic steps.
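The contract test between api and worker mentioned in the testing item can be as small as a shared description of the queue message plus a check that both sides run in CI. The following sketch assumes a hypothetical order message with four fields; the field names, the `ORDER_MESSAGE_CONTRACT` table, and the helper functions are illustrative only and should be replaced by the project's real message shape.

```python
import json
import uuid

# Hypothetical contract: field name -> required JSON-decoded Python type.
ORDER_MESSAGE_CONTRACT = {
    "message_id": str,
    "order_id": str,
    "amount_cents": int,
    "request_id": str,
}


def build_order_message(order_id: str, amount_cents: int, request_id: str) -> str:
    """Producer side (api): serialize one queue message."""
    return json.dumps(
        {
            "message_id": str(uuid.uuid4()),
            "order_id": order_id,
            "amount_cents": amount_cents,
            "request_id": request_id,
        }
    )


def contract_errors(raw: str, contract: dict = ORDER_MESSAGE_CONTRACT) -> list[str]:
    """Consumer side (worker): list every way a message deviates from the contract."""
    message = json.loads(raw)
    errors = []
    for field, expected_type in contract.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif type(message[field]) is not expected_type:
            # Exact type match, so a bool is not accepted where an int is required.
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    for field in sorted(set(message) - set(contract)):
        errors.append(f"unexpected field: {field}")
    return errors
```

A CI test then asserts that `contract_errors(build_order_message(...))` is empty, so renaming or retyping a field on the producer side fails the pipeline before the worker sees the change.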
Submission Checklist
- README.md at the repo root explaining the selected track, architecture, how to deploy, how to tear down, and linking to the runbook, threat model, and cost analysis.
- Infrastructure or infrastructure model checked into version control. Local-first includes Terraform modules/configuration that pass terraform fmt/terraform validate plus a saved plan or mock-resource diagram; cloud sandbox includes a clean terraform plan/apply path from a fresh checkout.
- No secrets, access keys, or .tfvars with credentials in git; verified with a pre-commit secret scanner.
- Architecture diagram (Mermaid or image). Local-first shows Docker network, local Kubernetes cluster, services, local Postgres, mock queue/storage, and ingress path; cloud sandbox shows VPC, subnets, managed cluster, services, managed DB, and ingress path.
- GitHub Actions workflow file matched to the track. Local-first shows tests/scans/Terraform validation/local smoke deploy output; cloud sandbox shows OIDC-based provider auth and a PR run showing plan posted as a comment.
- Documented rollback command, tested at least once against a deliberately-broken deploy.
- STRIDE threat model covering at least ingress, CI/CD, and database, with local RBAC/mock-service controls or cloud IAM/workload-identity controls according to the selected track.
- Grafana or CloudWatch dashboard screenshot plus JSON export in the repo.
- At least one alert fired on purpose and routed to a runbook entry; the incident narrative written up.
- Budget ceiling, billing-alert evidence before any cloud deployment, monthly cost estimate or local-first $0-cloud rationale, top three cost drivers/avoided drivers, one optimization experiment with before/after numbers, and a completed teardown checklist.
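To show what the secret-scanner item in this checklist is guarding against, here is a deliberately small sketch of pattern-based scanning. The three rules are examples chosen for illustration (an AWS access key ID prefix, a PEM private-key header, and a quoted password assignment); this is not a substitute for a dedicated scanner, which covers far more formats and also checks git history.

```python
import re

# Illustrative rules only; a dedicated scanner ships a much larger rule set.
RULES = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)password\s*=\s*[\"'][^\"']+[\"']"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every rule that matches a line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


def scan_file(path: str) -> list[tuple[int, str]]:
    """Scan one file on disk; unreadable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_text(handle.read())
```

A pre-commit hook would run `scan_file` over the staged files and exit non-zero when any finding is returned, which is the behavior to demonstrate as checklist evidence.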
Production-Style Project Brief
Use this project as a reviewable engineering brief, not only a completion exercise.
Problem statement
Write a one-paragraph statement covering the user, the problem, the constraint, and the outcome this project is meant to produce.
Required evidence
- working artifact or reproducible deliverable
- README with setup, inspection, and verification instructions
- tests, traces, proofs, diagrams, benchmark output, or review notes appropriate to the semester
- decision log with at least three meaningful tradeoffs
- known limitations section with explicit scope cuts
Review questions
- What is the smallest vertical slice that proves the project works?
- Which requirement is most likely to be misunderstood by a reviewer?
- What did you deliberately not build, and why?
- What evidence would convince someone else that the result is correct?
Done means
The project is done only when another technical reader can inspect the artifact, run or review the verification evidence, and understand the tradeoffs without a live explanation.
Weekly Project Milestones
Use these milestones to keep the project from becoming a last-week scramble.
| Milestone | Focus | Evidence |
|---|---|---|
| Start | Scope the smallest useful slice | problem statement, non-goals, first task list |
| Early build | Produce a walking version | runnable skeleton, first test, first committed artifact |
| Middle | Add the hard part | implementation note, trace/proof/benchmark/design decision |
| Review | Stress the weak point | failure case, debugging note, peer/self review, correction commit |
| Finish | Package for inspection | README, verification instructions, known limitations, reflection |
Answer-Quality Examples
| Quality | What it sounds like |
|---|---|
| Weak | "I built it because the module asked for it." |
| Acceptable | "It works for the required examples and I can explain the main idea." |
| Strong | "Here is the tradeoff I chose, the evidence that supports it, and the case where it would fail." |
| Portfolio-ready | "A reviewer can inspect the artifact, rerun the checks, and understand why this solution fits this semester's goals." |
Future Capstone Connection
Before closing this project, write two sentences on how it could help the final capstone: one reusable technical skill and one artifact habit to preserve.
Calibration Materials
Use these learner-visible calibration materials before self-grading or requesting review: