Module 1: Cloud Platform Fundamentals: Case Studies
These case studies turn cloud primitives into design judgment: responsibility boundaries, failure domains, managed-service tradeoffs, network shape, identity, and cost.
Case Study 1: Shared Responsibility Misread
Scenario: A team runs a web app on managed compute and assumes the provider handles all security. A public storage bucket and an overbroad instance role expose customer exports.
Source anchor: AWS's Shared Responsibility Model, which splits security and compliance between AWS (security of the cloud) and the customer (security in the cloud).
Module concepts: shared responsibility, IAM, storage policy, managed service limits.
Wrong Approach
"Managed service means managed security."
Better Approach
Write responsibility per layer:
- Provider: facility, hardware, managed service control plane
- Customer: IAM policy, data classification, bucket policy, network exposure, app code
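The layer split above can be captured as data so that gaps become visible. The following is a minimal sketch, not a prescribed format: the `Layer` type, the `MATRIX` entries, and the control descriptions are all hypothetical, and only the layer names come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    owner: str    # "provider" or "customer"
    control: str  # how the owner discharges the responsibility


# Hypothetical workload. Layer names follow the list above;
# the control text is illustrative only.
MATRIX = [
    Layer("facility", "provider", "provider compliance reports"),
    Layer("hardware", "provider", "provider compliance reports"),
    Layer("managed service control plane", "provider", "provider SLA"),
    Layer("IAM policy", "customer", "least-privilege role review"),
    Layer("data classification", "customer", "tag customer exports"),
    Layer("bucket policy", "customer", "block public access"),
    Layer("network exposure", "customer", ""),  # deliberately left blank
    Layer("app code", "customer", "code review"),
]


def unowned_controls(matrix):
    """Return customer-owned layers that have no control written down."""
    return [
        layer.name
        for layer in matrix
        if layer.owner == "customer" and not layer.control.strip()
    ]


print(unowned_controls(MATRIX))  # ['network exposure']
```

An empty control on a customer-owned row is the "managed means secured" assumption in concrete form: nobody has written down who handles it.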
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| managed platform | less infrastructure ops | still own configuration and data |
| custom VMs | control | patching/hardening burden |
| broad IAM | fewer permission errors | larger blast radius in a breach |
| least privilege | reduced blast radius | policy design work |
Required Artifact
Create a shared-responsibility matrix for one workload.
Case Study 2: Single-AZ Architecture Called Highly Available
Scenario: A product deploys its web servers, database, and NAT gateway in one availability zone. The design doc says "high availability" because there are two app instances.
Source anchor: AWS Well-Architected Availability, which frames availability as a measurable resiliency objective.
Module concepts: region, availability zone, failure domain, multi-AZ, dependency blast radius.
Wrong Approach
Counting instances instead of failure domains: two app instances in the same AZ still fail together.
Better Approach
Map failure domains:
- Web tier: at least two AZs
- Database: multi-AZ or explicit recovery objective
- NAT/load balancer: no single-zone choke point
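The mapping above can be checked mechanically against an inventory. This is a small sketch under stated assumptions: the `INVENTORY` tuples, tier names, and zone labels are hypothetical and mirror the scenario, not any provider's API.

```python
from collections import defaultdict

# Hypothetical inventory matching the scenario: (tier, resource id, zone).
INVENTORY = [
    ("web", "web-1", "az-a"),
    ("web", "web-2", "az-a"),
    ("database", "db-primary", "az-a"),
    ("nat", "nat-1", "az-a"),
]


def zones_by_tier(inventory):
    """Group the availability zones each tier is deployed in."""
    zones = defaultdict(set)
    for tier, _resource, az in inventory:
        zones[tier].add(az)
    return dict(zones)


def single_zone_tiers(inventory):
    """Tiers that live in fewer than two zones."""
    return sorted(t for t, z in zones_by_tier(inventory).items() if len(z) < 2)


def surviving_tiers(inventory, failed_az):
    """Tiers that keep at least one resource when failed_az is lost."""
    return sorted(
        t for t, z in zones_by_tier(inventory).items() if z - {failed_az}
    )


print(single_zone_tiers(INVENTORY))        # ['database', 'nat', 'web']
print(surviving_tiers(INVENTORY, "az-a"))  # []
```

Two web instances do not change the result: the check counts zones, not instances, which is the point of the case study.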
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| single AZ | cheap/simple | full outage if the AZ fails |
| multi-AZ app | survives compute/AZ loss | cross-AZ cost/complexity |
| multi-AZ database | stronger availability | higher cost; failover behavior must be tested |
| multi-region | regional resilience | much higher complexity |
Required Artifact
Draw a failure-domain diagram with RTO/RPO and what survives one AZ loss.
Case Study 3: Public Subnet By Accident
Scenario: A database is launched with a public IP because the default VPC made networking easy. Security groups restrict access today, but the exposure is unnecessary.
Source anchor: AWS VPC docs describe public and private subnets, route tables, internet gateways, and NAT gateways. See AWS VPC route tables.
Module concepts: VPC, subnet, route table, public IP, NAT, defense in depth.
Wrong Approach
"The security group blocks access, so public placement is fine."
Better Approach
Use network layers intentionally:
- Public subnet: load balancer / bastion only if needed
- Private app subnet: application tasks
- Private data subnet: database, no route to internet gateway
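A review of this layering can start from the route tables. The sketch below uses a simplified, hypothetical model (`SUBNETS`, `ALLOWED_PUBLIC`, and the resource labels are invented for illustration and do not match the shape of any AWS API response). It treats a subnet as public when any of its routes targets an internet gateway.

```python
# Hypothetical, simplified model: each subnet has a destination -> target
# route map and a list of resource labels placed in it.
SUBNETS = {
    "public-a": {"routes": {"0.0.0.0/0": "igw-123"}, "resources": ["load-balancer"]},
    "app-a": {"routes": {"0.0.0.0/0": "nat-123"}, "resources": ["app-task"]},
    "data-a": {"routes": {"0.0.0.0/0": "igw-123"}, "resources": ["database"]},
}

# Resource kinds this review accepts in a public subnet (an assumption).
ALLOWED_PUBLIC = {"load-balancer", "bastion"}


def is_public(subnet):
    """A subnet is public if any route targets an internet gateway."""
    return any(target.startswith("igw-") for target in subnet["routes"].values())


def unjustified_public(subnets):
    """List (subnet, resource) pairs that sit in a public subnet without a reason."""
    findings = []
    for name, subnet in sorted(subnets.items()):
        if not is_public(subnet):
            continue
        for resource in subnet["resources"]:
            if resource not in ALLOWED_PUBLIC:
                findings.append((name, resource))
    return findings


print(unjustified_public(SUBNETS))  # [('data-a', 'database')]
```

The check ignores security groups on purpose: it flags the placement itself, so the database is reported even while its security group is still restrictive.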
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| public DB | easy admin | unnecessary exposure |
| private DB | smaller attack surface | access path required |
| NAT egress | outbound updates | cost and dependency |
| VPC endpoints | private service access | endpoint setup |
Required Artifact
Create a subnet/route-table review with every public route justified.
Case Study 4: Serverless Billing Surprise
Scenario: A serverless image-processing function looks cheap at launch. A marketing campaign triggers millions of invocations, high memory use, and expensive data egress.
Source anchor: AWS Lambda pricing, like other providers' serverless pricing pages, ties cost to request count and to duration billed in GB-seconds (configured memory multiplied by execution time), with data transfer charged separately. See AWS Lambda pricing.
Module concepts: serverless, unit economics, egress, cost model, scaling.
Wrong Approach
"Serverless is cheaper."
Better Approach
Model cost per operation:
- invocations per month
- average duration
- memory
- storage reads/writes
- egress
- retry rate
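The inputs above combine into a monthly figure with simple arithmetic. The sketch below is one possible model, not a billing calculator: the `Workload` and `Prices` types are hypothetical, the unit prices are placeholders rather than real rates, and it assumes retries re-bill requests, compute, and storage operations but not egress.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    invocations_per_month: int
    avg_duration_s: float
    memory_gb: float
    storage_ops_per_invocation: float
    egress_gb_per_invocation: float
    retry_rate: float  # fraction of invocations that run one extra time


@dataclass
class Prices:
    # Placeholder unit prices. Replace with the provider's published rates.
    per_million_requests: float
    per_gb_second: float
    per_thousand_storage_ops: float
    per_gb_egress: float


def monthly_cost(w: Workload, p: Prices) -> dict:
    """Break the monthly bill into request, compute, storage, and egress parts."""
    billed = w.invocations_per_month * (1 + w.retry_rate)
    requests = billed / 1_000_000 * p.per_million_requests
    compute = billed * w.avg_duration_s * w.memory_gb * p.per_gb_second
    storage = billed * w.storage_ops_per_invocation / 1_000 * p.per_thousand_storage_ops
    # Assumption: a retried invocation fails before sending data out.
    egress = w.invocations_per_month * w.egress_gb_per_invocation * p.per_gb_egress
    total = requests + compute + storage + egress
    return {"requests": requests, "compute": compute,
            "storage": storage, "egress": egress, "total": total}


baseline = Workload(1_000_000, 1.0, 1.0, 2, 0.001, 0.0)
placeholder = Prices(1.0, 0.00001, 0.01, 0.1)
cost = monthly_cost(baseline, placeholder)
print({k: round(v, 2) for k, v in cost.items()})
```

Because every term is linear in invocations, a campaign that multiplies traffic by ten multiplies the bill by ten. With these placeholder numbers egress is the largest term, which is the kind of finding the model exists to surface before launch.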
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| serverless | scales to zero, quick to get started | cost spikes with volume/duration |
| containers | predictable baseline | pay for idle capacity |
| batch workers | throughput control | latency |
| CDN/cache | lower compute/egress | invalidation complexity |
Required Artifact
Write a monthly cost model and alert threshold for one cloud workload.
Case Study 5: IAM User In Production Automation
Scenario: A CI job deploys using a long-lived IAM user access key stored as a secret. The key leaks through logs.
Source anchor: AWS IAM docs recommend roles and temporary credentials for workloads. See AWS IAM roles.
Module concepts: IAM role, temporary credentials, workload identity, least privilege.
Wrong Approach
Using a long-lived IAM user access key because it is easy to paste into CI secrets.
Better Approach
Use workload identity:
- CI identity: OIDC trust to cloud role
- Role policy: only deploy target resources
- Controls: short-lived credentials, environment scoping, audit logs
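As an illustration of the OIDC trust and environment scoping above, the sketch below builds an AWS role trust policy as a Python dictionary. It assumes the CI system is GitHub Actions, which the case study does not specify. The account ID, organization, repository, and environment are hypothetical, and `is_scoped` is an invented review helper, not part of any SDK.

```python
import json

# Assumption: the CI provider is GitHub Actions, whose OIDC issuer host is:
GITHUB_OIDC = "token.actions.githubusercontent.com"


def trust_policy(account_id, org, repo, environment):
    """Build a trust policy that lets one repo environment assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{GITHUB_OIDC}:aud": "sts.amazonaws.com",
                f"{GITHUB_OIDC}:sub": f"repo:{org}/{repo}:environment:{environment}",
            }},
        }],
    }


def is_scoped(policy):
    """True if every statement pins the subject claim to an exact value."""
    for statement in policy["Statement"]:
        exact = statement.get("Condition", {}).get("StringEquals", {})
        subject = exact.get(f"{GITHUB_OIDC}:sub", "")
        if not subject or "*" in subject:
            return False
    return True


# Hypothetical identifiers for illustration only.
policy = trust_policy("123456789012", "example-org", "example-repo", "production")
print(json.dumps(policy, indent=2))
print(is_scoped(policy))  # True
```

No access key appears anywhere in this setup: the CI job presents a short-lived OIDC token and receives temporary credentials, so there is no static secret to leak through logs.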
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| IAM user key | simple | secret leakage and rotation burden |
| role + temporary creds | safer | trust-policy setup |
| broad deploy role | fewer failures | high blast radius |
| scoped role per env | lower blast radius | more policy work |
Required Artifact
Write an IAM deployment-role policy review: principal, trust policy, allowed actions, denied actions, and audit trail.
Source Map
| Source | Use it for |
|---|---|
| AWS Shared Responsibility Model | provider/customer responsibility boundaries |
| AWS Well-Architected Availability | availability and resiliency objectives |
| AWS VPC route tables | public/private routing and subnet design |
| AWS Lambda pricing | serverless unit economics |
| AWS IAM roles | temporary credentials and workload roles |
Completion Standard
- At least three artifacts are completed.
- At least one artifact maps shared responsibility.
- At least one artifact maps failure domains.
- At least one artifact includes cost math.