Place the Caches, CDN, and Load Balancers

What This Concept Is

Caches, CDNs, and load balancers (LBs) are the performance-critical infrastructure you add on top of the functional design. Each of them earns its place by answering two questions:

What does it make faster or safer? (latency, throughput, availability)
What new failure mode and what new consistency problem does it introduce?

If a candidate cannot answer both, the component should not be on the diagram.

Typical placements:

CDN at the edge for static assets, cached responses, and TLS termination close to the user.
Load balancer in front of every stateless service tier; at the edge (L4/L7 global LB) and internally (L7 service-mesh LB).
Cache on the read path for any store that has hot keys and a forgiving consistency contract. Categories: client cache, CDN cache, edge cache, application cache (Redis/Memcached), DB buffer pool.

A useful frame: every cache is a second copy of the truth. The design decisions are where that second copy lives, how long it is allowed to be wrong, and what happens when it disappears. Cache placement becomes disciplined when you treat these three as explicit contracts rather than implicit defaults.

Why It Matters Here

Wrong placement is either useless or dangerous.

A cache in front of a store with strict read-your-write semantics becomes a correctness incident waiting to happen.
A CDN in front of dynamic per-user content can serve one user's response to another unless those responses are marked uncacheable or the cache key is scoped per user.
A load balancer on a stateful service without sticky sessions breaks user flows.

Correct placement is most of the difference between a design that scales and one that does not, at the same cost.

Concrete Example: Social Feed Read/Write Paths

Composite read path for a social feed, showing each layer:

  Client ─▶ Client-side cache (recent items, 30 s)
        │
        ▼
  CDN (PoP) ─▶ cached /static/*, profile avatars; TTL minutes to hours
        │ miss / non-static
        ▼
  Global LB (L4, latency-based DNS or anycast)
        │
        ▼
  Regional LB (L7, HTTPS, request routing)
        │
        ▼
  Stateless App Tier (Feed Service)
        │ cache lookup
        ▼
  Distributed cache (Redis cluster) ──▶ keyed by user_id:timeline:page=0
        │ miss
        ▼
  Timeline Store (wide-column) ──▶ partition by user_id
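
The cache-lookup and miss steps at the bottom of the read path can be sketched as a cache-aside read. This is a minimal sketch, not the course's implementation: `TTLCache` is a hypothetical in-process stand-in for the Redis cluster, `load_from_store` is a hypothetical callable standing in for the timeline store, and the 60-second TTL is an assumed value. Only the key shape comes from the diagram.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hypothetical in-process stand-in for a distributed cache such as Redis."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[list]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # an expired entry counts as a miss
            return None
        return value

    def set(self, key: str, value: list, ttl_s: float) -> None:
        self._data[key] = (self._clock() + ttl_s, value)


def timeline_key(user_id: str, page: int) -> str:
    # Key shape taken from the diagram: user_id:timeline:page=0
    return f"{user_id}:timeline:page={page}"


def read_timeline(cache: TTLCache, load_from_store, user_id: str,
                  page: int = 0, ttl_s: float = 60.0):
    """Cache-aside read: try the cache, fall back to the store, then populate."""
    key = timeline_key(user_id, page)
    cached = cache.get(key)
    if cached is not None:
        return cached, "cache"
    items = load_from_store(user_id, page)  # miss: go to the partitioned store
    cache.set(key, items, ttl_s)
    return items, "store"
```

The second element of the returned tuple only exists to make the hit/miss path visible; a real service would emit that as a metric instead.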

And the write path:

  Client ─▶ LB ─▶ App ─▶ Primary store (synchronous write)
                            │
                            ▼
                      Invalidate / populate cache
                            │
                            ▼
                       Kafka (async)
                            ├─▶ fan-out workers
                            └─▶ search indexer, analytics
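
The ordering in the write path (durable write, then cache invalidation, then an asynchronous event) can be sketched as below. This is an illustration under stated assumptions: plain dicts stand in for the primary store and the cache, `queue.Queue` stands in for Kafka, and `handle_post` and the event fields are hypothetical names. It invalidates only the author's first timeline page; follower timelines are left to the fan-out workers.

```python
import queue


def handle_post(store: dict, cache: dict, events: queue.Queue,
                user_id: str, post: str) -> None:
    """Apply one write in the order shown in the write-path diagram."""
    # 1. Durable write to the primary store comes first, so the source of
    #    truth is updated before any derived copy is touched.
    store.setdefault(user_id, []).insert(0, post)

    # 2. Invalidate the cached first page so the next read repopulates it.
    cache.pop(f"{user_id}:timeline:page=0", None)

    # 3. Publish an event for slow consumers (fan-out, search, analytics).
    #    The caller does not wait for them.
    events.put({"type": "post_created", "user_id": user_id, "post": post})
```

Invalidating rather than repopulating keeps the write path short; the cost is one extra store read on the next request for that key.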

Annotations that earn each piece of infrastructure:

Client cache removes repeat requests for views revisited within 30 s; trade-off is staleness of up to 30 s.
CDN collapses global latency for static content; trade-off is cache invalidation complexity.
Global LB gives latency-based routing across regions; trade-off is that DNS-based failover is only as fast as record TTLs and resolver caches allow (seconds to minutes).
Regional LB (L7) terminates TLS, does request routing, rate limiting, and health-based ejection; trade-off is one more hop.
Distributed cache absorbs the 80/20 hot set; trade-off is a cache-invalidation protocol and a warm-up story.
Async Kafka isolates slow downstream consumers from the user-visible write path; trade-off is eventual consistency of the derived views.
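
The health-based ejection mentioned in the regional LB annotation can be sketched as a round-robin picker that skips hosts with too many consecutive failures. This is a toy model, not how any particular load balancer is implemented: `RoundRobinBalancer`, `eject_after`, and `report` are hypothetical names, and `report` is assumed to be fed by health-check results.

```python
class RoundRobinBalancer:
    """Round-robin over backends, ejecting hosts after consecutive failures."""

    def __init__(self, backends, eject_after: int = 3) -> None:
        self._backends = list(backends)
        self._failures = {b: 0 for b in self._backends}
        self._eject_after = eject_after
        self._next = 0

    def healthy(self) -> list:
        # A backend stays in rotation until it reaches the failure threshold.
        return [b for b in self._backends
                if self._failures[b] < self._eject_after]

    def pick(self) -> str:
        pool = self.healthy()
        if not pool:
            raise RuntimeError("no healthy backends")
        backend = pool[self._next % len(pool)]
        self._next += 1
        return backend

    def report(self, backend: str, ok: bool) -> None:
        # One successful check resets the counter and readmits the host.
        self._failures[backend] = 0 if ok else self._failures[backend] + 1
```

The `eject_after` threshold is the knob behind the failure mode "the LB ejects a healthy host": set it too low and one slow check removes capacity you needed.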

Cache update pattern choices (covered in System Design Primer: Cache update patterns):

Cache-aside (lazy): app reads cache, on miss reads DB and populates cache. Simple; risk of dogpiles.
Write-through: app writes cache, cache writes DB synchronously. Reads after a write see fresh data; every write pays for both the cache write and the DB write.
Write-back: app writes cache, DB is updated asynchronously. Fastest writes, risk of loss on cache failure.
Refresh-ahead: cache proactively refreshes before TTL expires. Works only if access pattern is predictable.

For a feed, cache-aside on reads and explicit invalidation on writes is the default; write-through is reserved for data where cache and DB must agree synchronously.
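
The dogpile risk noted for cache-aside is that many requests miss the same key at once and all hit the DB. One common mitigation is to let a single caller load the value while the others wait. The sketch below does this with a per-key lock inside one process; `SingleFlightCache` and `loader` are hypothetical names, TTLs are omitted, and coordinating across several app servers would need something more (a distributed lock or request coalescing), which is not shown.

```python
import threading

_MISSING = object()  # sentinel so that a cached None is still a hit


class SingleFlightCache:
    """Cache-aside with a per-key lock: one loader call per concurrent miss."""

    def __init__(self, loader) -> None:
        self._loader = loader            # hypothetical: fetches from the DB
        self._values: dict = {}
        self._key_locks: dict = {}
        self._guard = threading.Lock()   # protects the lock table itself

    def _lock_for(self, key) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key):
        value = self._values.get(key, _MISSING)
        if value is not _MISSING:
            return value                 # fast path: plain cache hit
        with self._lock_for(key):
            # Re-check: another thread may have loaded it while we waited.
            value = self._values.get(key, _MISSING)
            if value is _MISSING:
                value = self._loader(key)
                self._values[key] = value
            return value

    def invalidate(self, key) -> None:
        # Explicit invalidation on write; the next get() reloads.
        self._values.pop(key, None)
```

The second lookup inside the lock is what prevents the stampede: waiters find the value already populated and never call the loader.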

Concrete Example 2: News Site vs Logged-In Dashboard

Same infrastructure toolbox, different placements:

News site (largely static articles, 10 M DAU): CDN carries 95% of read traffic. Origin sees a tiny residual. Regional LB is minimal. Application-level cache is unnecessary -- the CDN is the cache. Invalidation is explicit on article edit (purge-by-URL). The failure mode: if the CDN purge fails silently, readers see stale articles; this is an annotated, accepted cost that the business tolerates.
Logged-in dashboard (per-user data, 1 M DAU): CDN caching is disabled on the authenticated path (Cache-Control: private, no-store), because serving another user's dashboard from an intermediate cache is a data-leak incident. Instead the architecture leans on client-side caching (short TTL, user-scoped) and an application-tier Redis cluster keyed by user_id:view:timestamp. Load balancing is L7 (HTTPS, per-user routing, rate limits). The CDN still terminates TLS and serves static assets, but it does not cache per-user responses.

The two systems share 70% of infrastructure types and 0% of placement decisions. Candidates who copy a stock diagram fail this distinction; candidates who annotate why each piece is where it is pass it.
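
The placement difference between the two systems largely comes down to which Cache-Control header each response carries. A minimal sketch of that decision follows. `cache_headers` is a hypothetical helper, and the `/static/` prefix check and every max-age value are assumptions chosen for illustration. Only `private, no-store` for the authenticated path comes from the text above.

```python
def cache_headers(path: str, authenticated: bool) -> dict:
    """Pick a Cache-Control header for a response (illustrative policy only)."""
    if path.startswith("/static/"):
        # Static assets are the same for everyone, so shared caches may keep
        # them even when the surrounding page is per-user.
        return {"Cache-Control": "public, max-age=86400, immutable"}
    if authenticated:
        # Per-user responses must never be stored by the CDN or any other
        # shared cache.
        return {"Cache-Control": "private, no-store"}
    # Anonymous, mostly static pages: short browser TTL, longer CDN TTL
    # (s-maxage applies to shared caches only).
    return {"Cache-Control": "public, max-age=60, s-maxage=300"}
```

Checking the static prefix first reflects the dashboard case, where the CDN keeps serving assets while the per-user responses bypass it.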

Common Confusion / Misconceptions

"Add a cache everywhere performance matters." Caches add complexity: invalidation, warming, memory sizing, failure semantics. Add a cache when you can write down both a measured benefit and a named failure mode.

"The LB is a commodity." At edge scale, the LB is one of your most important components. It decides whose traffic goes where, when to eject an unhealthy host, how to handle slow-connection attacks such as Slowloris, and what the blast radius of a bad deploy is.

"CDN equals cache." A CDN is a globally distributed cache plus a proxy plus a TLS terminator. It can cache dynamic responses if its cache key varies on the right request headers, and it can cost you a data leak if it does not.

"Sticky sessions are fine." Sticky sessions couple your scale to your routing. Prefer stateless services with session state moved out of the process: a shared store such as Redis, or a signed token such as a JWT carried by the client. Use stickiness only where a specific protocol (WebSockets, streaming) requires it.

"Caches improve availability." Caches reduce load. Availability comes from being able to survive the cache being cold. Size the origin to handle post-cache-outage traffic, not post-cache-hit traffic.

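The sizing rule in the last misconception is simple arithmetic: at hit ratio h, the origin sees total × (1 − h) requests, and a cold cache pushes that back to the full total. A small sketch with hypothetical numbers (50,000 QPS and a 95% hit ratio are assumptions, not figures from this concept):

```python
def origin_qps(total_qps: float, hit_ratio: float) -> float:
    """Requests per second that reach the origin at a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_qps * (1.0 - hit_ratio)


def cold_cache_amplification(hit_ratio: float) -> float:
    """Factor by which origin load grows when the hit ratio drops to zero."""
    if not 0.0 <= hit_ratio < 1.0:
        raise ValueError("hit_ratio must be in [0, 1)")
    return 1.0 / (1.0 - hit_ratio)


if __name__ == "__main__":
    total = 50_000      # hypothetical peak read QPS
    hit_ratio = 0.95    # hypothetical steady-state hit ratio
    print(f"warm cache origin load: {origin_qps(total, hit_ratio):.0f} QPS")
    print(f"cold cache origin load: {origin_qps(total, 0.0):.0f} QPS")
    print(f"amplification: {cold_cache_amplification(hit_ratio):.0f}x")
```

With these assumed inputs the origin carries about 2,500 QPS while the cache is warm and the full 50,000 QPS when it is cold, roughly a 20-fold jump. An origin sized for the warm number turns a cache outage into a site outage.
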
How To Use It

Per-component justification template. For every LB, CDN, or cache you draw:

What it serves: traffic type, QPS absorbed, payload sizes.
Why it is there: specific latency, throughput, or availability target it makes possible.
Consistency story: if it serves stale data, for how long and why that is acceptable.
Invalidation / eviction: TTL, LRU, write-triggered invalidation, explicit purge.
Failure mode: what happens if the cache is cold, the CDN is down, the LB ejects a healthy host.

Write these five lines on the side of the diagram. A reviewer will point at one and ask; be ready.
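
If you keep design notes in code, the five-line template can be held as a small record that flags any line left blank. This is one possible encoding, not a prescribed format: `ComponentJustification` and its field names are hypothetical, chosen to mirror the five lines above, and the example values are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ComponentJustification:
    component: str      # e.g. "CDN", "Regional LB", "Redis cluster"
    serves: str         # traffic type, QPS absorbed, payload sizes
    why: str            # latency, throughput, or availability target
    consistency: str    # how stale, for how long, why that is acceptable
    invalidation: str   # TTL, LRU, write-triggered invalidation, purge
    failure_mode: str   # cold cache, CDN down, LB ejects a healthy host

    def missing(self) -> list:
        """Names of the lines that were left empty."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


if __name__ == "__main__":
    redis = ComponentJustification(
        component="Redis cluster",
        serves="timeline reads for the hot set",
        why="keeps timeline reads off the wide-column store",
        consistency="stale until invalidated on write",
        invalidation="write-triggered invalidation plus TTL",
        failure_mode="",  # not thought through yet
    )
    print(redis.missing())
```

A non-empty result from `missing()` is the signal that the component has not yet earned its place on the diagram.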

Transfer / Where This Shows Up Later

Cluster 4 concept 11 (failure walk) audits every cache/CDN/LB for blast radius and TTR.
Cluster 4 concept 12 (SPOFs) most often surfaces the cache cluster as a soft SPOF -- this concept sets you up to see it coming.
S8M3 (data patterns) revisits cache consistency (write-through, write-behind, also called write-back, and refresh-ahead) in depth.
S8M4 (scale/reliability/performance) operates the load-balancing and CDN decisions under real load; autoscaling policies and connection-draining live here.
S9 (cloud) maps these to AWS/GCP/Azure services (CloudFront, ALB/NLB, ElastiCache; Cloud CDN, Cloud Load Balancing, Memorystore).
S10 capstone/interviews: "where do you place the cache?" is the single most common Cluster 2 follow-up.

Check Yourself

Why would you NOT put a cache in front of a billing ledger? What if the read ratio is 1000:1?
What is the difference between an L4 and an L7 load balancer, and when does that difference matter?
In a cache-aside pattern, what is a "dogpile" and how do you prevent it?
If the CDN goes down globally for ten minutes, can your origin survive it? If not, fix the design.

Mini Drill or Application

For each workload, sketch the cache/CDN/LB layers and annotate each with the five-line justification:

A global, mostly-static documentation site with 10 M DAU.
A logged-in dashboard with per-user data, 1 M DAU.
A real-time chat system with per-conversation fan-out.

For each, name one component you chose NOT to add, and why.

Read This Only If Stuck

System Design Primer: Content delivery network -- push vs pull CDN, invalidation.
System Design Primer: Load balancer -- L4/L7, health checks, anycast.
System Design Primer: Reverse proxy -- distinguishes proxy from LB.
System Design Primer: Cache overview and levels -- taxonomy of cache layers.
System Design Primer: Cache update patterns -- cache-aside, write-through, write-back, refresh-ahead.
System Design Primer: Asynchronism -- why Kafka sits between the write path and derived views.
Fundamentals: Replicated vs distributed caching -- cache-topology choice in space-based architectures.
Google Cloud -- Load balancing overview -- the canonical GCP taxonomy of global/regional, L4/L7 load balancers.
AWS Well-Architected Framework -- Reliability Pillar -- includes the standard availability-through-redundancy framing for LBs and CDNs.
