IaaS, PaaS, and Serverless: the Abstraction Ladder

What This Concept Is

"The cloud" is not one product. It is a ladder of service models that vary in how much of the stack you run and how much the provider runs. From bottom to top:

IaaS (Infrastructure as a Service) - you rent compute, storage, and networking primitives and build everything else. EC2, EBS, VPC; GCE, Persistent Disk, VPC; Azure VMs, Managed Disks, VNet.
CaaS (Containers as a Service) - you ship container images; the provider runs the control plane. ECS, Fargate, GKE, AKS, Cloud Run jobs, Azure Container Apps.
PaaS (Platform as a Service) - you push code; the provider builds, runs, scales, and patches the runtime. Elastic Beanstalk, App Engine (Standard), Azure App Service.
Serverless - you ship a function or a container that runs only when invoked; the provider autoscales from zero and bills per request. Lambda, Cloud Functions, Cloud Run services, Azure Functions.
SaaS - you use someone else's application through an API (Stripe, SendGrid, Auth0). Not built by you at all.

Each step up the ladder trades control for a lighter operational burden. You lose tuning knobs and gain the ability to ignore things (OS patches, autoscaling, capacity planning) that used to be your full-time job.

This is the same lens as the shared-responsibility model, seen from the workload side. Shared-responsibility tells you who owns what; the abstraction ladder tells you how much of the stack you run yourself for a given workload. They move together: climbing the ladder pushes your ownership boundary upward.
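One way to picture the moving boundary is as a lookup from rung to the layers you still operate. The sketch below is a deliberate simplification: the layer names and the split per rung are illustrative assumptions, not any provider's official model, and real services draw the line differently case by case.

```python
# Stack layers from bottom to top. Names are illustrative, not a provider taxonomy.
LAYERS = ["hardware", "virtualization", "os", "runtime", "scaling", "app_code", "data"]

# Assumed, simplified split: the layers the customer still operates on each rung.
CUSTOMER_OWNS = {
    "iaas": {"os", "runtime", "scaling", "app_code", "data"},
    "caas": {"runtime", "scaling", "app_code", "data"},
    "paas": {"app_code", "data"},
    "serverless": {"app_code", "data"},
    "saas": {"data"},
}


def provider_owns(rung: str) -> list[str]:
    """Return the layers the provider runs on a rung, bottom to top."""
    mine = CUSTOMER_OWNS[rung]
    return [layer for layer in LAYERS if layer not in mine]


for rung in ["iaas", "caas", "paas", "serverless", "saas"]:
    print(f"{rung:<11} you own {len(CUSTOMER_OWNS[rung])} layers; provider runs {provider_owns(rung)}")
```

In this model PaaS and serverless own the same layers; what separates them is scale-to-zero and the billing unit, which a layer list does not capture.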

Why It Matters Here

Picking the wrong rung wastes money or slows you down:

running a cron job on a permanently provisioned EC2 fleet is overkill; a Lambda on a schedule is nearly free
running a high-throughput, latency-sensitive stateful service on a PaaS can be impossible because you cannot tune the runtime
running a batch ML workload on serverless can blow through timeout limits and become more expensive than an EC2 Spot fleet
running a website entirely on IaaS means you inherit OS patching, web-server config, and scaling every time

The ladder also governs who can touch what. A startup with three engineers should pick the highest rung that works; an enterprise with a platform team and compliance requirements may pick a lower rung deliberately (for audit reasons, for bring-your-own-runtime, for bring-your-own-key, or for cost predictability at scale).

Cluster 2 is the same ladder in concrete detail: concept 4 is IaaS, concept 5 is CaaS, concept 6 is serverless. If you cannot pick a rung here, every decision in Cluster 2 becomes speculation.

Concrete Example

Consider shipping a simple JSON API.

IaaS path (EC2):

you choose an AMI, launch 2 instances, attach an ALB, set up an autoscaling group, configure security groups, install Nginx and your runtime, set up log shipping, set up patching, set up backups
you decide the OS, kernel, TCP tuning, file-system, process supervisor
your bill is steady, your fixed cost is non-zero even at 0 traffic, your cold start is "none"

PaaS path (App Engine / Beanstalk):

you push the code; the platform builds, deploys, scales, and watches it
you set environment variables and a few runtime flags
your bill tracks the number of running instances; cold start takes seconds when scaling from zero

Serverless path (Lambda behind API Gateway / Cloud Run):

you ship the code or container; on Lambda each HTTP request triggers one execution, while a Cloud Run instance can serve several requests concurrently
you do not see OS, instance, or scheduler
your bill is per-request plus per-millisecond; cold start is tens of milliseconds to a few seconds depending on runtime and package size
limits: 15 min max execution (Lambda), 6 MB synchronous request payload (Lambda) or 32 MiB request body (Cloud Run over HTTP/1), up to 10 GB ephemeral storage (Lambda), concurrency caps

Same API, three very different operational models. The "right" choice depends on traffic shape (steady vs spiky), latency tolerance, team skill, and cost at scale.
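To make the serverless path concrete, here is a minimal sketch of what "you ship the code" can look like on Lambda behind an API Gateway proxy integration. The greeting payload and the `name` query parameter are invented for illustration; only the event and response shapes follow the proxy-integration contract.

```python
import json


def handler(event, context):
    """Lambda entry point for an API Gateway proxy event."""
    # Query parameters arrive as a dict, or are None/missing when the URL has none.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),  # body must be a string
    }
```

You can exercise it locally with `handler({"queryStringParameters": {"name": "ada"}}, None)`. Everything the IaaS path lists (AMI, Nginx, autoscaling group, patching) has no counterpart in this file, which is the point of the rung.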

Cross-provider parity table (for the same JSON API):

Rung	AWS	GCP	Azure
IaaS	EC2 + ALB + ASG	Compute Engine + MIG	Azure VMs + VMSS
CaaS	ECS / Fargate	Cloud Run / GKE Autopilot	Container Apps / AKS
PaaS	Elastic Beanstalk	App Engine Standard	App Service
Serverless	Lambda + API GW	Cloud Run services / Functions	Azure Functions

Common Confusion / Misconception

"Higher on the ladder is always cheaper." Not at scale. Serverless wins for spiky, low-utilization, request-driven traffic. For steady 24x7 CPU load, a reserved EC2 instance or a container on Fargate is often cheaper per unit of compute, because serverless pricing includes a premium for elasticity and scale-to-zero that a constantly busy service never uses. As a rough rule of thumb: if the service runs continuously at >30-40% CPU, climb down one rung.
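The crossover can also be expressed as a request rate instead of a CPU percentage. The sketch below assumes a simple pricing model (a per-request fee plus a per-GB-second compute fee against a flat monthly instance cost). Every price in the example call is a placeholder, not a quoted rate; substitute current numbers from the provider's pricing page.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, napkin-math precision


def serverless_cost_per_request(duration_s, memory_gb,
                                price_per_million_requests, price_per_gb_second):
    """Cost of one invocation under the assumed request-fee + GB-second model."""
    request_fee = price_per_million_requests / 1_000_000
    compute_fee = duration_s * memory_gb * price_per_gb_second
    return request_fee + compute_fee


def break_even_rps(fixed_monthly_cost, cost_per_request):
    """Sustained requests/second above which the always-on option is cheaper."""
    return fixed_monthly_cost / (cost_per_request * SECONDS_PER_MONTH)


# Placeholder inputs: 100 ms per request at 0.5 GB, hypothetical prices.
per_request = serverless_cost_per_request(0.1, 0.5, 0.20, 0.0000166667)
print(f"break-even at about {break_even_rps(60.0, per_request):.1f} requests/second")
```

The model ignores free tiers, data transfer, and whether one instance can actually absorb the break-even load, so treat the result as an order-of-magnitude check.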

"Serverless means no operations." You still have a deploy pipeline, IAM roles, timeouts, retries, dead-letter queues, concurrency limits, observability, and cold-start mitigation. The ops work shifts, it does not vanish.

"PaaS and CaaS are the same thing." PaaS builds and runs your code from source; you hand it a repo. CaaS runs containers you built; you hand it an image. They differ in where the build boundary sits and how much the platform inspects your stack. PaaS implicitly ties you to the platform's language support matrix; CaaS is language-agnostic but forces you to own a Dockerfile.

"Higher rung = less vendor lock-in." Usually the reverse. IaaS is the most portable ("it's a Linux VM, move it"). Serverless is the most locked-in (Lambda event sources, triggers, and runtimes are AWS-specific; porting to Cloud Functions is a rewrite). Counter-examples exist (Cloud Run runs OCI containers and is relatively portable), but the heuristic holds.

Gotcha: The ladder is per-service, not per-system. A real system typically lives on several rungs at once: S3 (SaaS-ish), RDS (PaaS), a Lambda function (serverless), and a handful of EC2 instances (IaaS). Reviewing architectures rung-by-rung is more useful than forcing a single label on the whole thing.

How To Use It

For any new workload:

Describe the traffic shape: steady or spiky, QPS range, request latency, stateful or stateless.
Describe the team: how many engineers, how much ops experience, how much tolerance for undifferentiated heavy lifting.
Start at the top of the ladder (serverless) and climb down only when you hit a real constraint (timeout, runtime, cold start, pricing at steady scale, compliance).
Name the "forcing function" that would push you down one rung and write it in the ADR (architecture decision record). If you cannot name one, you have not thought hard enough.
Write the choice and its rejected alternatives into a one-page decision record.
Re-read the decision in 6 months; the ladder often shifts as the workload grows. Migration up (to higher abstraction) is usually hard; plan for that.
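The "start at the top, climb down on a real constraint" procedure can be sketched as a small function. This is a toy model, not a decision engine: the `Workload` fields, the blocker rules, and the 35% cutoff (the middle of the 30-40% heuristic above) are assumptions chosen for illustration, and it treats serverless as a source-based function that cannot carry a custom runtime.

```python
from dataclasses import dataclass

RUNGS = ["serverless", "paas", "caas", "iaas"]  # top of the ladder first
SERVERLESS_TIMEOUT_S = 15 * 60      # Lambda's max execution time
STEADY_UTILIZATION_CUTOFF = 0.35    # assumed midpoint of the 30-40% rule of thumb


@dataclass
class Workload:
    max_runtime_s: float        # longest single execution
    steady_utilization: float   # fraction of time busy, 0.0 to 1.0
    needs_custom_runtime: bool  # e.g. native binaries the platform cannot build
    needs_os_tuning: bool       # kernel, TCP, or file-system control


def blockers(rung: str, w: Workload) -> list[str]:
    """Constraints that rule out a rung for this workload (simplified rules)."""
    found = []
    if rung == "serverless":
        if w.max_runtime_s > SERVERLESS_TIMEOUT_S:
            found.append("execution exceeds the serverless timeout")
        if w.steady_utilization > STEADY_UTILIZATION_CUTOFF:
            found.append("steady load makes per-request pricing expensive")
    if rung in ("serverless", "paas") and w.needs_custom_runtime:
        found.append("runtime not supported by the platform")
    if rung != "iaas" and w.needs_os_tuning:
        found.append("needs OS-level tuning")
    return found


def pick_rung(w: Workload) -> tuple[str, list[str]]:
    """Return the highest workable rung and the forcing functions that ruled out higher ones."""
    forcing = []
    for rung in RUNGS:
        found = blockers(rung, w)
        if not found:
            return rung, forcing
        forcing.extend(reason for reason in found if reason not in forcing)
    raise AssertionError("iaas has no blockers, so the loop always returns")


print(pick_rung(Workload(30, 0.05, True, False)))
```

The second element of the result is the list you would copy into the decision record as the forcing functions; an empty list means you stayed at the top of the ladder.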

Check Yourself

Why is "cheaper" a bad question at this level, and what is the better question?
Name one workload where serverless is obviously wrong and say why.
Why does moving up the ladder usually shrink the shared-responsibility customer surface?
A team insists on EC2 "for flexibility." What is the full list of costs they have just taken on that a PaaS would absorb?
A workload is on App Engine Standard and wants to run a native-binary image processor. What constraint forces the rung change, and to which rung?

Mini Drill or Application

Take four workloads: (a) a monthly billing job, (b) a websocket chat service, (c) an ML model serving at 2000 QPS, (d) a small internal CRUD app used by 30 people. For each, in fifteen minutes, pick a rung on the ladder and write a one-paragraph justification. Include the one constraint that would push you down to the next rung.

Extension: for workload (c), do the napkin math at 2000 QPS sustained over 30 days on Lambda vs Fargate vs a reserved EC2 fleet. Which is cheapest? By roughly what factor? What does that tell you about where serverless pricing stops making sense?

Read This Only If Stuck

AWS Overview: Types of cloud computing - the canonical IaaS/PaaS/SaaS definitions with AWS service examples
AWS Fargate: Serverless compute for containers - mid-ladder reference for CaaS
Google Cloud: Choose a compute option - decision tree across the same ladder on GCP
Azure compute service decision tree - Microsoft's rung-by-rung choice framework
The Twelve-Factor App - application patterns that make higher rungs of the ladder actually work
Linux Command Line: Shell scripts overview - the automation substrate you still need even on the highest ladder rungs
