Serverless Functions: Lambda, Cloud Functions, Cold Starts, Limits

What This Concept Is

A serverless function is a unit of code the platform runs in response to an event, autoscaling from zero to many concurrent executions and billing per-invocation plus per-millisecond of runtime.

The three big ones:

  • AWS Lambda - event sources include API Gateway, S3, SQS, SNS, DynamoDB streams, EventBridge, ALB, and custom invocations.
  • Google Cloud Functions - HTTP or event-driven; the 2nd generation is built on Cloud Run under the hood, which explains its improved limits.
  • Azure Functions - similar envelope; supports HTTP, queue, timer, and Event Grid triggers, with "Consumption," "Premium," and "Dedicated" plans that trade cold-start latency against cost.

The mental model: "I hand the platform a function and a trigger; the platform runs the function, once per event, in a sandbox it manages."

Key limits (AWS Lambda, typical of the category):

  • max duration: 15 minutes
  • max memory: 10 GB (CPU scales with memory - ~1 vCPU per ~1769 MB)
  • max request/response payload: 6 MB sync / 256 KB async
  • ephemeral /tmp storage: 10 GB max
  • deployment package: 50 MB zipped / 250 MB unzipped for direct upload; 10 GB as container image
  • account-level concurrency limit (default 1000) and per-function reserved concurrency
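
To see how the concurrency ceiling and the memory-to-CPU ratio play out for a given workload, here is a minimal sketch. It assumes the figures listed above (default account limit of 1000, ~1 vCPU per ~1769 MB) and applies Little's law (in-flight executions = arrival rate × average duration). The function names are hypothetical helpers, not part of any AWS SDK.

```python
import math

DEFAULT_ACCOUNT_CONCURRENCY = 1000  # default account-level limit quoted above
MB_PER_VCPU = 1769                  # rule-of-thumb ratio quoted above


def required_concurrency(events_per_second: float, avg_duration_s: float) -> int:
    """Little's law: concurrent executions = arrival rate x average duration."""
    return math.ceil(events_per_second * avg_duration_s)


def approx_vcpus(memory_mb: int) -> float:
    """Approximate CPU share for a given memory setting."""
    return memory_mb / MB_PER_VCPU


if __name__ == "__main__":
    need = required_concurrency(events_per_second=400, avg_duration_s=0.5)
    print(need, need <= DEFAULT_ACCOUNT_CONCURRENCY)
    print(round(approx_vcpus(512), 2))
```

A workload whose required concurrency approaches the account limit is a candidate for a quota increase or a different compute platform.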

Cold starts:

  • a cold start is the first invocation of a new execution environment: container init, runtime boot, function init
  • typical cold start: 100-500 ms for Node.js/Python, 500 ms-2 s for Java/.NET, up to several seconds for large deployment packages or VPC-attached functions
  • subsequent invocations reuse the warm sandbox for minutes
  • provisioned concurrency and (on Lambda) SnapStart reduce cold starts for predictable traffic
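
One way to observe the init/invoke split described above is to keep a flag at module scope, which runs once per execution environment. This is a minimal sketch, not an official pattern; the returned field names `cold_start` and `env_age_s` are hypothetical.

```python
import time

# Module scope runs once per execution environment (the "function init" phase).
_INIT_STARTED = time.monotonic()
_is_cold = True  # flipped to False after the first invocation in this environment


def handler(event, context):
    """Report whether this invocation landed on a fresh environment."""
    global _is_cold
    cold, _is_cold = _is_cold, False
    return {
        "cold_start": cold,
        "env_age_s": round(time.monotonic() - _INIT_STARTED, 3),
    }
```

The first call in a new environment reports `cold_start` as true; later calls that reuse the same sandbox report it as false.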

Why It Matters Here

Serverless functions are the cheapest way to ship:

  • scheduled jobs and cron-like automation
  • event reactors (S3 object created -> process it; queue message -> handle it)
  • low-to-medium-throughput APIs with spiky traffic
  • glue code between cloud services
  • webhooks and API integrations
  • one-off ops scripts that used to live on an ops engineer's laptop

They are the wrong fit when you have steady high throughput, long-running work, stateful sessions, or latency SLOs tighter than your cold-start budget.

Every team accidentally tries to build a monolith on Lambda at least once. Knowing the limits and cold-start realities early saves that rewrite. Functions are also where the shared-responsibility line climbs highest: you own almost nothing except the handler code, the IAM role, and the event-source configuration - so when something goes wrong, it is almost always in one of those three places.

Concrete Example

You ship a small function on Lambda that resizes uploaded images.

Config:

  • trigger: S3 object created in uploads/ prefix
  • memory: 512 MB (CPU scales with memory; images benefit from more)
  • timeout: 30 s
  • runtime: Python 3.12
  • IAM role: read from the source bucket, write to the destination bucket, write logs
  • concurrency reserve: 50 (so a burst cannot consume all account concurrency)
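
# handler.py
import io
import os
from urllib.parse import unquote_plus

import boto3
from PIL import Image

s3 = boto3.client("s3")          # created once per execution environment
DST = os.environ["DST_BUCKET"]   # destination bucket name

def handler(event, _ctx):
    for r in event["Records"]:
        src = r["s3"]["bucket"]["name"]
        key = unquote_plus(r["s3"]["object"]["key"])  # S3 event keys are URL-encoded
        buf = io.BytesIO()
        s3.download_fileobj(src, key, buf)
        buf.seek(0)
        img = Image.open(buf).convert("RGB")  # JPEG cannot store alpha or palette modes
        img.thumbnail((800, 800))             # resizes in place, keeps aspect ratio
        out = io.BytesIO()
        img.save(out, "JPEG", quality=85)
        out.seek(0)
        s3.upload_fileobj(out, DST, f"resized/{key}")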
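
S3 event notifications deliver object keys URL-encoded (spaces arrive as `+`, parentheses as `%28`/`%29`), so a handler needs to decode them before calling S3. A minimal sketch with a hypothetical, heavily trimmed event; `SAMPLE_EVENT` and `object_locations` are illustration names only.

```python
from urllib.parse import unquote_plus

# Hypothetical, trimmed S3 notification; real events carry many more fields.
SAMPLE_EVENT = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "acme-uploads-prod"},
                "object": {"key": "uploads/summer+trip/photo%281%29.jpg"},
            }
        }
    ]
}


def object_locations(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, decoded key) pairs for every record in the event."""
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event["Records"]
    ]


if __name__ == "__main__":
    print(object_locations(SAMPLE_EVENT))
```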

Deploy (SAM-style):

Resources:
  ResizeFn:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./
      Handler: handler.handler
      Runtime: python3.12
      MemorySize: 512
      Timeout: 30
      ReservedConcurrentExecutions: 50
      Environment:
        Variables:
          DST_BUCKET: acme-processed-prod   # read by handler.py
      Policies:
        - S3ReadPolicy: { BucketName: acme-uploads-prod }
        - S3WritePolicy: { BucketName: acme-processed-prod }
      Events:
        Upload:
          Type: S3
          Properties:
            Bucket: !Ref UploadBucket   # must be an AWS::S3::Bucket declared in this template
            Events: s3:ObjectCreated:*
            Filter: { S3Key: { Rules: [ { Name: prefix, Value: uploads/ } ] } }

What happens on a 1000-image upload burst:

  1. S3 creates 1000 ObjectCreated events
  2. Lambda spins up new execution environments up to the concurrency cap; cold starts appear for the first batch (~300 ms each)
  3. Each environment processes events until idle for several minutes, then is torn down
  4. Your bill is roughly invocations × (avg duration × memory) × per-GB-second rate plus a tiny per-invocation fee
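
The burst arithmetic above can be sketched in a few lines. The per-GB-second and per-request rates below are assumed placeholders in the range AWS has published for x86 functions, and the 2 s average duration is also an assumption; check the current price list and your own measurements before relying on the numbers.

```python
import math

# Assumed illustrative rates; check the provider's current price list.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def burst_wall_clock_s(events: int, avg_duration_s: float, concurrency_cap: int) -> float:
    """Lower bound on drain time: events are worked off in waves of `concurrency_cap`."""
    return math.ceil(events / concurrency_cap) * avg_duration_s


def burst_cost_usd(events: int, avg_duration_s: float, memory_mb: int) -> float:
    """invocations x (avg duration x memory in GB) x GB-second rate + request fee."""
    gb_seconds = events * avg_duration_s * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND + events * PRICE_PER_REQUEST


if __name__ == "__main__":
    # 1000 images, an assumed 2 s each, 512 MB, reserved concurrency of 50
    print(burst_wall_clock_s(1000, 2.0, 50))
    print(f"${burst_cost_usd(1000, 2.0, 512):.4f}")
```

The wall-clock figure ignores cold-start time and retries, so treat it as a floor rather than a prediction.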

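Now a gotcha: the function is attached to a private VPC to reach an RDS database. Historically, a cold start inside a VPC took 1.5-2 seconds or more because each new environment attached its own ENI. If all 1000 images hit cold environments, that would be roughly 30 minutes of cumulative added latency across the burst (1000 × ~1.8 s); with reserved concurrency at 50, at most about 50 environments cold-start, so the realistic total is closer to 90 seconds. Since 2019, Lambda shares Hyperplane ENIs that are created when the function is configured, which removes most of this penalty, but a VPC-attached function still needs a NAT gateway or VPC endpoints to reach S3 and other AWS APIs. The fix is either "don't put Lambda in a VPC unless you must" or "use provisioned concurrency."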

Common Confusion / Misconception

"Serverless scales infinitely." It scales to the account concurrency limit (the provider's default unless you have had it raised). A flash event can throttle against that limit. Set per-function reserved concurrency so one function cannot starve others. On Lambda, the account-level limit is a soft limit that can be raised through a quota increase request.

"Cold starts are always a problem." For user-facing synchronous APIs with <200 ms SLOs, yes. For async event processing where the SLO is seconds or minutes, cold starts are usually invisible. Classify each workload by its tolerance before optimizing.

"If it is slow, add more memory." Sometimes correct (because CPU scales with memory). Sometimes wrong: if the bottleneck is a network call or a cold start, more memory just means paying more for the same latency. Profile before you tune. AWS's Lambda Power Tuning tool automates the search.

"Retries are free." They are billed. Asynchronous invocations are retried twice by default, and stream sources such as DynamoDB Streams retry a failing batch until it succeeds or the records expire unless you cap the retry count, so a flaky downstream can multiply your invocation cost. Wire a dead-letter queue (DLQ) or an on-failure destination and alarm on its depth.

Gotchas:

  • Lambdas invoked from API Gateway receive binary request bodies as a Base64-encoded string inside the event payload. Base64 inflates data by a third, so a "small" 5 MB file becomes ~6.7 MB, which crosses the 6 MB sync limit. Uploads larger than a few MB should go to S3 directly via presigned URLs, not through the function.
  • Environment variables are encrypted at rest but visible in the console. Do not put real secrets there; use Secrets Manager or Parameter Store with a KMS key.
  • SnapStart (for Java) snapshots memory. If your init code opens a DB connection, the snapshot captures the connection handle - which is stale on restore. Register CRaC runtime hooks (beforeCheckpoint / afterRestore on an org.crac.Resource) to close and reopen the connection, or reconnect lazily.
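
The Base64 inflation in the first gotcha is easy to check numerically. A minimal sketch, assuming the 6 MB synchronous limit quoted earlier and ignoring the rest of the event envelope (headers, path, and so on), which only makes the real payload larger; the helper names are hypothetical.

```python
import base64

SYNC_LIMIT_BYTES = 6 * 1024 * 1024  # 6 MB synchronous payload limit


def base64_size(raw_bytes: int) -> int:
    """Length of the padded Base64 text for `raw_bytes` bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_sync_limit(raw_bytes: int) -> bool:
    """True if the Base64-encoded body alone stays within the sync limit."""
    return base64_size(raw_bytes) <= SYNC_LIMIT_BYTES


if __name__ == "__main__":
    five_mb = 5 * 1024 * 1024
    print(base64_size(five_mb) / (1024 * 1024))  # encoded size in MB
    print(fits_sync_limit(five_mb))
    # Cross-check the formula against the standard library on a small input.
    print(len(base64.b64encode(b"x" * 1000)) == base64_size(1000))
```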

How To Use It

For each candidate function:

  1. Describe the trigger and the expected rate (events/sec average and peak).
  2. Measure the expected duration and memory; set them 20-30% above observed.
  3. Check against the hard limits (15 min, 10 GB, 6 MB sync). If any limit is violated, switch to Cloud Run / Fargate.
  4. Attach a function-specific IAM role with least privilege. Never share one big role.
  5. Set reserved concurrency to protect the rest of the account.
  6. Decide whether cold starts matter; if yes, use provisioned concurrency or SnapStart, or move to a container.
  7. Wire a DLQ for async triggers and alarm on its depth.
  8. Ship structured logs (JSON) and correlate with a request ID propagated from the event source.
  9. Pin the deployment package to a specific content hash; Git-SHA tagging the zip or image makes rollbacks trivial.
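
Steps 2 and 3 can be sketched as a small sizing helper. This assumes you size from an observed p99 duration and peak memory (the checklist only says "observed", so the choice of percentile is an assumption) and uses a 25% headroom inside the 20-30% guideline; `size_function` and `exceeded_limits` are hypothetical names.

```python
import math

# Hard limits from the AWS Lambda list earlier in this section.
HARD_LIMITS = {"timeout_s": 15 * 60, "memory_mb": 10 * 1024}


def size_function(observed_p99_s: float, observed_peak_mb: float,
                  headroom: float = 0.25) -> dict:
    """Apply headroom to observed duration and memory."""
    return {
        "timeout_s": math.ceil(observed_p99_s * (1 + headroom)),
        "memory_mb": math.ceil(observed_peak_mb * (1 + headroom)),
    }


def exceeded_limits(settings: dict) -> list[str]:
    """Names of settings that exceed a hard limit; non-empty means move off functions."""
    return [name for name, ceiling in HARD_LIMITS.items() if settings[name] > ceiling]


if __name__ == "__main__":
    resize = size_function(observed_p99_s=8.0, observed_peak_mb=400.0)
    print(resize, exceeded_limits(resize))
    transcode = size_function(observed_p99_s=800.0, observed_peak_mb=400.0)
    print(transcode, exceeded_limits(transcode))
```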

Check Yourself

  1. Why is 15 minutes the wrong ceiling for a video-transcoding job, and where would you run it instead?
  2. What exactly happens during a cold start, and why is a VPC-attached function's cold start worse?
  3. Why do reserved concurrency and per-function IAM roles reduce blast radius?
  4. You see p50 invocation at 50 ms and p99 at 2.5 s. What is the most likely cause, and which metric confirms it?
  5. A function reads a 40 MB S3 object into memory and times out at the default 3 s. Name two fixes (different from each other).

Mini Drill or Application

Pick one real workload you would run on a function and one you would not. In fifteen minutes, write: trigger, rate, duration, memory, IAM role summary, concurrency policy, and cold-start mitigation (if any) for the first; explain what limit pushes the second off serverless.

Extension: for the first workload, draft the minimal IAM policy (copy from a template, then remove every action and resource that is not strictly needed). Confirm the function still runs. This is how you build the reflex for least-privilege on every future Lambda.
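
As a reference point for the extension, here is a sketch of what a trimmed policy might look like for the image-resize function from the concrete example, built as a Python dict so it can be serialized to JSON. The bucket names and the `uploads/` and `resized/` prefixes come from that example; the broad CloudWatch Logs resource ARN is an assumption you would normally narrow to the function's own log group.

```python
import json


def resize_policy(src_bucket: str, dst_bucket: str) -> dict:
    """Minimal IAM policy sketch: read uploads, write resized output, write logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{src_bucket}/uploads/*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{dst_bucket}/resized/*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                # Assumption: narrow this to the function's log group in practice.
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(resize_policy("acme-uploads-prod", "acme-processed-prod"), indent=2))
```

Note that no statement uses a wildcard action such as `s3:*`; every action maps to a call the handler actually makes.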

Read This Only If Stuck