
Module 1: Cloud Platform Fundamentals: Case Studies

These case studies turn cloud primitives into design judgment: responsibility boundaries, failure domains, managed-service tradeoffs, network shape, identity, and cost.


Case Study 1: Shared Responsibility Misread

Scenario: A team runs a web app on managed compute and assumes the provider handles all security. A public storage bucket and an overbroad instance role expose customer exports.

Source anchor: AWS's Shared Responsibility Model, which explains security and compliance as shared between AWS and the customer.

Module concepts: shared responsibility, IAM, storage policy, managed service limits.

Wrong Approach

"Managed service means managed security."

Better Approach

Write responsibility per layer:

Provider:
facility, hardware, managed service control plane

Customer:
IAM policy, data classification, bucket policy, network exposure, app code
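
One customer-side item above, bucket policy, can be turned into a repeatable check. The sketch below is a minimal illustration, assuming an S3-style JSON policy already loaded as a Python dict. The function name `public_statements` and the sample policy are hypothetical, and the check deliberately ignores `Condition` blocks, ACLs, and account-level public access settings, so it may over-report.

```python
def _as_list(value):
    """Policy fields may hold one value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def public_statements(policy):
    """Return Allow statements whose principal is the wildcard '*'."""
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            flagged.append(stmt)
        elif isinstance(principal, dict):
            # e.g. {"AWS": "*"} or {"AWS": ["*", "arn:..."]}
            if any("*" in _as_list(v) for v in principal.values()):
                flagged.append(stmt)
    return flagged


# Hypothetical policy for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-exports/*",
        },
        {
            "Sid": "AppRoleWrite",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/app"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-exports/*",
        },
    ],
}

if __name__ == "__main__":
    for stmt in public_statements(policy):
        print("review:", stmt["Sid"])
```

A flagged statement is a prompt for review, not proof of exposure; the responsibility matrix should record who owns that review.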

Tradeoff Table

Choice | Gain | Cost
managed platform | less infrastructure ops | still own configuration and data
custom VMs | control | patching/hardening burden
broad IAM | fewer permission errors | breach blast radius
least privilege | reduced blast radius | policy design work

Required Artifact

Create a shared-responsibility matrix for one workload.


Case Study 2: Single-AZ Architecture Called Highly Available

Scenario: A product deploys web servers, a database, and NAT in one availability zone. The design doc says "high availability" because there are two app instances.

Source anchor: AWS Well-Architected Availability, which frames availability as a measurable resiliency objective.

Module concepts: region, availability zone, failure domain, multi-AZ, dependency blast radius.

Wrong Approach

Counting instances instead of failure domains. Two app instances in the same AZ still fail together.

Better Approach

Map failure domains:

Web tier:
at least two AZs

Database:
multi-AZ or explicit recovery objective

NAT/load balancer:
no single-zone choke point
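
The failure-domain map above can be checked mechanically. The following is a minimal sketch, assuming each component is described by the list of AZs its instances run in; the placement data, AZ labels, and function names are hypothetical.

```python
def single_zone_components(placement):
    """Return components whose instances all sit in one AZ."""
    return sorted(c for c, azs in placement.items() if len(set(azs)) < 2)


def survivors_after_az_loss(placement, lost_az):
    """Return {component: AZs still serving} after one AZ fails."""
    return {c: sorted(set(azs) - {lost_az}) for c, azs in placement.items()}


# Hypothetical layout matching the scenario: two web instances, one AZ.
placement = {
    "web": ["az-a", "az-a"],
    "database": ["az-a"],
    "nat": ["az-a"],
}

if __name__ == "__main__":
    print("single-zone:", single_zone_components(placement))
    print("after losing az-a:", survivors_after_az_loss(placement, "az-a"))
```

Counting distinct AZs rather than instances is the point: the web tier has two instances but only one failure domain.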

Tradeoff Table

Choice | Gain | Cost
single AZ | cheap/simple | AZ failure outage
multi-AZ app | survives compute/AZ loss | cross-AZ cost/complexity
multi-AZ database | stronger availability | cost and failover behavior
multi-region | regional resilience | much higher complexity

Required Artifact

Draw a failure-domain diagram with RTO/RPO and what survives one AZ loss.


Case Study 3: Public Subnet By Accident

Scenario: A database is launched with a public IP because the default VPC made networking easy. Security groups restrict access today, but the exposure is unnecessary.

Source anchor: AWS VPC docs describe public and private subnets, route tables, internet gateways, and NAT gateways. See AWS VPC route tables.

Module concepts: VPC, subnet, route table, public IP, NAT, defense in depth.

Wrong Approach

"The security group blocks access, so public placement is fine."

Better Approach

Use network layers intentionally:

Public subnet:
load balancer / bastion only if needed

Private app subnet:
application tasks

Private data subnet:
database, no route to internet gateway
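
The layering above implies a simple review rule: a subnet is public if its route table sends a default route to an internet gateway. The sketch below applies that rule to a simplified, hypothetical data shape (not the output of any real API); the only AWS-specific assumption is that internet gateway IDs start with `igw-`.

```python
DEFAULT_ROUTES = {"0.0.0.0/0", "::/0"}


def public_subnets(route_tables):
    """Return subnets whose route table has a default route to an internet gateway."""
    flagged = []
    for table in route_tables:
        to_igw = any(
            route["destination"] in DEFAULT_ROUTES
            and route["target"].startswith("igw-")
            for route in table["routes"]
        )
        if to_igw:
            flagged.extend(table["subnets"])
    return sorted(flagged)


# Hypothetical route tables for the three-layer design.
route_tables = [
    {
        "name": "rt-public",
        "subnets": ["public-a"],
        "routes": [{"destination": "0.0.0.0/0", "target": "igw-0abc"}],
    },
    {
        "name": "rt-app",
        "subnets": ["app-a"],
        "routes": [{"destination": "0.0.0.0/0", "target": "nat-0def"}],
    },
    {
        "name": "rt-data",
        "subnets": ["data-a"],
        "routes": [{"destination": "10.0.0.0/16", "target": "local"}],
    },
]

if __name__ == "__main__":
    print("public subnets needing justification:", public_subnets(route_tables))
```

Every subnet the function returns should have a written justification; a data subnet appearing in the list is the accident this case study describes.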

Tradeoff Table

Choice | Gain | Cost
public DB | easy admin | unnecessary exposure
private DB | smaller attack surface | access path required
NAT egress | outbound updates | cost and dependency
VPC endpoints | private service access | endpoint setup

Required Artifact

Create a subnet/route-table review with every public route justified.


Case Study 4: Serverless Billing Surprise

Scenario: A serverless image-processing function looks cheap at launch. A marketing campaign triggers millions of invocations, high memory use, and expensive data egress.

Source anchor: AWS Lambda pricing, like other cloud provider pricing pages, ties cost to requests, duration, and configured memory, with data transfer billed on top. See AWS Lambda pricing.

Module concepts: serverless, unit economics, egress, cost model, scaling.

Wrong Approach

"Serverless is cheaper."

Better Approach

Model cost per operation:

invocations/month:
average duration:
memory:
storage read/write:
egress:
retry rate:
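
The inputs above can be combined into a small monthly model. This is a sketch under stated assumptions: the rates are placeholders, not real prices, and must be replaced with figures from the provider's pricing page; retries are assumed to be billed for requests and compute but not to generate egress; storage read/write cost is left out for brevity. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Placeholder unit prices; replace with current published rates."""
    per_million_requests: float
    per_gb_second: float
    per_gb_egress: float


def monthly_cost(invocations, avg_duration_s, memory_gb,
                 egress_gb_per_invocation, retry_rate, rates):
    """Estimate monthly request, compute, and egress cost."""
    billed = invocations * (1 + retry_rate)  # retries are billed too
    request_cost = billed / 1_000_000 * rates.per_million_requests
    compute_cost = billed * avg_duration_s * memory_gb * rates.per_gb_second
    # Assumption: only the original invocations return data to users.
    egress_cost = invocations * egress_gb_per_invocation * rates.per_gb_egress
    return {
        "requests": request_cost,
        "compute": compute_cost,
        "egress": egress_cost,
        "total": request_cost + compute_cost + egress_cost,
    }


# Placeholder rates and workload numbers for illustration only.
rates = Rates(per_million_requests=0.20, per_gb_second=0.00002, per_gb_egress=0.10)
estimate = monthly_cost(
    invocations=5_000_000,
    avg_duration_s=2.0,
    memory_gb=1.0,
    egress_gb_per_invocation=0.002,
    retry_rate=0.1,
    rates=rates,
)

if __name__ == "__main__":
    for item, cost in estimate.items():
        print(f"{item}: {cost:.2f}")
```

With these placeholder inputs, egress is the largest line item, well ahead of compute and requests, which is the kind of surprise the scenario describes. An alert threshold can then be set as a multiple of the modeled total.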

Tradeoff Table

Choice | Gain | Cost
serverless | scales to zero and fast start | cost spikes with volume/duration
containers | predictable baseline | pay for idle capacity
batch workers | throughput control | latency
CDN/cache | lower compute/egress | invalidation complexity

Required Artifact

Write a monthly cost model and alert threshold for one cloud workload.


Case Study 5: IAM User In Production Automation

Scenario: A CI job deploys using a long-lived IAM user access key stored as a secret. The key leaks through logs.

Source anchor: AWS IAM docs recommend roles and temporary credentials for workloads. See AWS IAM roles.

Module concepts: IAM role, temporary credentials, workload identity, least privilege.

Wrong Approach

Use long-lived users because they are easy to paste into CI.

Better Approach

Use workload identity:

CI identity:
OIDC trust to cloud role

Role policy:
only deploy target resources

Controls:
short-lived credentials
environment scoping
audit logs
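
The "OIDC trust to cloud role" item above can be sketched as an AWS-style trust policy document built in Python. This assumes an IAM OIDC identity provider is already registered for the CI system; the GitHub Actions issuer host and the `repo:...:environment:...` subject format are used purely as an example, and the account ID, organization, and repository names are hypothetical.

```python
import json


def oidc_trust_policy(account_id, provider_host, subject,
                      audience="sts.amazonaws.com"):
    """Build a trust policy allowing one OIDC subject to assume the role."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{provider_host}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Exact match on audience and subject: this is the
                    # environment scoping control.
                    "StringEquals": {
                        f"{provider_host}:aud": audience,
                        f"{provider_host}:sub": subject,
                    }
                },
            }
        ],
    }


# Hypothetical values for illustration only.
trust = oidc_trust_policy(
    account_id="111122223333",
    provider_host="token.actions.githubusercontent.com",
    subject="repo:example-org/example-repo:environment:production",
)

if __name__ == "__main__":
    print(json.dumps(trust, indent=2))
```

Pinning the subject to one repository and environment means a job from another branch or repository cannot assume the production role, and no long-lived key exists to leak.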

Tradeoff Table

Choice | Gain | Cost
IAM user key | simple | secret leakage and rotation burden
role + temporary creds | safer | trust-policy setup
broad deploy role | fewer failures | high blast radius
scoped role per env | lower blast radius | more policy work

Required Artifact

Write an IAM deployment-role policy review: principal, trust policy, allowed actions, denied actions, and audit trail.


Source Map

Source | Use it for
AWS Shared Responsibility Model | provider/customer responsibility boundaries
AWS Well-Architected Availability | availability and resiliency objectives
AWS VPC route tables | public/private routing and subnet design
AWS Lambda pricing | serverless unit economics
AWS IAM roles | temporary credentials and workload roles

Completion Standard

  • At least three artifacts are completed.
  • At least one artifact maps shared responsibility.
  • At least one artifact maps failure domains.
  • At least one artifact includes cost math.