Statelessness, Sticky Sessions, and Session Stores

What This Concept Is

A server is stateless if it does not retain per-user or per-session memory between requests. Any of its instances can handle any request, because everything needed to process a request arrives with the request or is fetched from shared storage.

A server is stateful if consecutive requests from the same user must go to the same instance because that instance holds private state (in-memory session, local cache of uncommitted work, open file handles, an attached WebSocket).

Between these two live three production patterns:

Fully stateless tier: every instance identical; load balancer routes freely.
Sticky sessions (session affinity): the load balancer pins a client to one instance for the life of the session, usually by cookie or source IP.
External session store: the tier is stateless to the load balancer, but session data is kept in a shared store (Redis, Memcached, a database) that every instance reads and writes.

The Twelve-Factor App's Factor VI ("Processes") is the canonical prescription: "twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database." Twelve-factor is explicit about sticky sessions: "sticky sessions are a violation of twelve-factor and should never be used or relied upon. Session state data is a good candidate for a datastore that offers time-expiration, such as Memcached or Redis."

Many "scale horizontally" plans fail because the tier being scaled was stateful in a way the author had not realized.

Why It Matters Here

Horizontal scaling works when any worker can process any request. The instant you need affinity, four things degrade:

Load balancing gets worse: one user's long session may pin them to an already-hot instance. The LB can no longer freely redistribute; a hot user lives on a hot box forever.
Deploys get harder: draining stateful instances requires migrating or waiting out sessions. Rolling deploys must now be choreographed with session lifetime; zero-downtime deploys require either session migration or acceptance that some users get logged out.
Failure recovery gets harder: when an instance dies, its resident sessions die with it unless state has been replicated somewhere.
Auto-scaling gets awkward: scaling in means forcibly evicting sessions or waiting. Scaling out means new instances receive no existing sessions; load is uneven until new users arrive.

The cheapest and most resilient designs keep application tiers stateless and push session state to a shared store. Sticky sessions are a middle road: cheap to implement (one LB setting), but they reintroduce single points of failure and uneven load. Twelve-factor's rule is blunt for a reason.

Concrete Example

A login-aware web app stores current_user_id and a shopping cart in process memory after sign-in.

Design A - in-process session (stateful). After logging in, the user's subsequent requests must hit the same server, or they will appear signed-out and the cart will be empty. With two instances behind a round-robin LB, roughly half of a user's requests land on the instance that has never seen them, so sign-in state and the cart flicker in and out. You cannot roll a deploy without logging everyone out.
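A minimal sketch of Design A's failure, assuming a toy round-robin balancer and a hypothetical `Instance` class that keeps sessions in a plain dict. It is not modeled on any particular framework; it only counts how many follow-up requests land on an instance that has never seen the session.

```python
from itertools import cycle


class Instance:
    """One app server that keeps sessions in its own process memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> session data, private to this instance

    def login(self, session_id, user_id):
        self.sessions[session_id] = {"current_user_id": user_id, "cart": []}

    def is_signed_in(self, session_id):
        return session_id in self.sessions


def simulate(num_instances, num_requests):
    """Log in once, then send follow-up requests through a round-robin LB.

    Returns how many follow-up requests saw the user as signed out.
    """
    instances = [Instance(f"app-{i}") for i in range(num_instances)]
    round_robin = cycle(instances)

    # The login request lands on the first instance in the rotation.
    next(round_robin).login("sess-1", user_id=42)

    signed_out = 0
    for _ in range(num_requests):
        if not next(round_robin).is_signed_in("sess-1"):
            signed_out += 1
    return signed_out
```

Calling `simulate(num_instances=2, num_requests=10)` reports how many of the ten follow-up requests appeared signed out; with a single instance the count is zero, which is why the bug tends to surface only after the second instance is added.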

Design B - sticky sessions. The LB hashes the session cookie and always routes that user to the same backend. This works until that backend restarts, stalls long enough to fail health checks (for example, a long GC pause), or is removed for a deploy; the users pinned to it are then signed out. Blast radius: roughly 1/N of users per event. Load is also uneven: a "whale" client (an API partner sending 10,000 req/s) pins all of its load to one box, while the other N-1 boxes share only the remaining traffic.
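The uneven-load effect can be sketched with a hash-based affinity function. This is an illustration under assumptions: `pick_backend` and `load_per_backend` are hypothetical helpers, and plain modulo hashing is a simplification (real load balancers more often use an inserted cookie or consistent hashing).

```python
import hashlib


def pick_backend(session_id, backends):
    """Pin a session to one backend by hashing its cookie value."""
    digest = hashlib.sha256(session_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


def load_per_backend(traffic, backends):
    """Sum requests per second per backend.

    traffic maps session_id -> requests per second for that session.
    """
    load = {backend: 0 for backend in backends}
    for session_id, rps in traffic.items():
        load[pick_backend(session_id, backends)] += rps
    return load


backends = ["app-0", "app-1", "app-2", "app-3"]
traffic = {"whale": 10_000}
traffic.update({f"user-{i}": 1 for i in range(100)})

load = load_per_backend(traffic, backends)
hot = pick_backend("whale", backends)
print(f"{hot} carries {load[hot]} of {sum(load.values())} req/s")
```

Whichever backend the whale hashes to carries at least 10,000 of the 10,100 req/s; adding more backends does not help, because affinity forbids spreading one session's load.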

Design C - external session store (Redis). The session cookie stores only a session ID. Every request: the backend reads session:<id> from Redis, processes, writes back. Any instance serves any request. Deploys are free. An instance dying removes zero sessions. Typical latency overhead: <1ms if Redis is in the same AZ.

Design D - signed client-side session. Session data is encoded directly into the cookie (e.g., JWT) and signed. No server-side store at all; any instance validates and processes. Works for small session payloads; the client pays the bandwidth and the server pays the crypto cost each request. Cannot support revocation without also carrying a revocation list.
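The idea behind Design D can be sketched with an HMAC-signed cookie built from the Python standard library. This is not a JWT implementation, only the same sign-then-verify pattern; `SECRET_KEY`, `encode_session`, and `decode_session` are hypothetical names, and the key shown is a placeholder.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key-do-not-use-in-production"  # placeholder, assumption


def encode_session(data, key=SECRET_KEY):
    """Serialize session data and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    return (payload + b"." + signature).decode()


def decode_session(cookie, key=SECRET_KEY):
    """Return the session data, or None if the signature does not verify."""
    try:
        payload, signature = cookie.encode().rsplit(b".", 1)
    except ValueError:
        return None  # no separator: not a cookie we issued
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(signature, expected):
        return None  # tampered payload or wrong key
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = encode_session({"current_user_id": 42, "cart": ["book"]})
session = decode_session(cookie)  # any instance holding the key can do this
```

Nothing in `decode_session` consults a server-side record, which is exactly why revocation needs an extra mechanism: a cookie that verifies is accepted until it expires or the key is rotated.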

Design C is the industry default for browser-heavy workloads. Design D is common for mobile and microservice-to-microservice. Both import a new failure mode: Design C makes Redis critical infrastructure (it must be HA, backed up, and sized for peak read/write load); Design D makes the signing key critical infrastructure (if it leaks, every session is forgeable).

Production scenario (numbers). A social app serves 40,000 req/s across 60 pods. Each request reads session:<id> from Redis (~0.8ms) and writes back on 30% of requests (~0.9ms). Session-store load: 40,000 reads/s + 12,000 writes/s = 52k ops/s total. A single Redis node peaks near 100-150k ops/s, so one HA pair is enough today if network latency stays sub-ms. At 2x growth (about 104k ops/s) the load reaches the single-node ceiling and they need to shard. Staleness budget: with a session cookie on a 24-hour TTL and a 30s sliding refresh, a user whose request lands on a new pod sees their session immediately, because the pod reads it from Redis rather than from local memory. Sized for that growth, the cluster is ElastiCache Redis in cluster mode with 2 shards and 3 replicas per shard; at roughly $600-800/month, the cost is negligible against the deploy and recovery savings compared to sticky sessions.
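The arithmetic in this scenario can be checked with a few lines. The helper names are hypothetical, and the per-node capacity is the 100-150k ops/s range quoted in the text, not a measured figure.

```python
def session_store_ops(req_per_s, write_percent):
    """Return (reads, writes, total) in ops/s, assuming one read per request."""
    reads = req_per_s
    writes = req_per_s * write_percent // 100
    return reads, writes, reads + writes


def utilization(total_ops, node_capacity_ops):
    """Fraction of one node's capacity that the load consumes."""
    return total_ops / node_capacity_ops


reads, writes, total = session_store_ops(40_000, 30)
print(reads, writes, total)             # 40000 12000 52000

# Against the conservative end of the quoted capacity range:
print(utilization(total, 100_000))      # 0.52
print(utilization(2 * total, 100_000))  # 1.04
```

At today's load a single node sits at about half of the conservative capacity figure; at 2x the same figure is exceeded, which is the point where sharding stops being optional.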

Common Confusion / Misconception

"We are stateless - we do not use sessions." Check again. All of these count as hidden state:

an in-memory LRU cache per instance (cache warmup time is state)
a local queue of outgoing events not yet flushed to the broker
a long-lived database connection with transaction-local state
a WebSocket or SSE connection (the client expects to reconnect to "this" instance)
a file on local disk that an upload was streamed into
an idempotency key cache kept in-process to dedupe retries
a rate-limit token bucket stored per instance (the effective limit is N * configured_limit)
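The rate-limit multiplier in the last item is easy to demonstrate. The sketch below simplifies a token bucket to a fixed per-window counter (an assumption made to keep it short), and `LocalLimiter` and `admitted_in_one_window` are hypothetical names.

```python
class LocalLimiter:
    """Per-instance limiter: admits at most `limit` requests per window."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def allow(self):
        if self.used < self.limit:
            self.used += 1
            return True
        return False


def admitted_in_one_window(num_instances, limit, num_requests):
    """Spread requests round-robin over instances that each limit locally."""
    limiters = [LocalLimiter(limit) for _ in range(num_instances)]
    admitted = 0
    for i in range(num_requests):
        if limiters[i % num_instances].allow():
            admitted += 1
    return admitted


print(admitted_in_one_window(num_instances=1, limit=100, num_requests=10_000))
print(admitted_in_one_window(num_instances=4, limit=100, num_requests=10_000))
```

Each instance enforces its own limit correctly, yet the fleet as a whole admits the configured limit multiplied by the instance count, and that total changes every time the tier scales in or out.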

"We use sticky sessions so it is fine." Sticky sessions trade one failure mode for another: the load balancer is now a stateful component (it remembers the affinity map), deploys require drain time, and an unlucky user who hits a dying instance gets a bad experience while an even spray would have been imperceptible.

"Session in cookies is stateless." The server side is; the client side is not. A cookie-carried session is still state, just state you do not own. Cookie-carried session data has practical limits (browsers enforce 4KB; HTTP/2 HPACK-compresses repeats but still pays the wire cost on cache miss). Anything larger than a few KB must return to server-side storage.

"We can just replicate in-memory sessions across instances." This trades one hard problem (statelessness) for another (distributed consensus). You now own a session replication bus that must survive partitions, instance churn, and deploys. Twelve-factor's advice exists because this path reliably ends in outage retrospectives.

How To Use It

For every tier in your design:

Enumerate every piece of per-request or per-user data the tier uses.
For each, ask: "If the next request for this user lands on a different instance, does anything break?" If yes, that is state - move it to a store or replicate it.
Default to external session store unless latency or cost forbids it. Redis and Memcached are the standard choices; a small row in Postgres works too. Expect <1ms additional latency for co-located Redis, 1-5ms cross-AZ.
Use sticky sessions only when the state is fundamentally local and expensive to externalize (e.g., long-lived WebSockets, streaming audio/video sessions, large per-connection compiled state). Even then, plan for the session to be dropped gracefully.
Make the session store itself highly available. A single-node Redis is not a session store; it is a pending outage. Use Redis Sentinel, Redis Cluster, or an HA managed service (ElastiCache, MemoryStore, Upstash).
Size the session store for peak read/write load, not average. A session store hit on every request at 10,000 req/s is 10,000 read/s + up-to 10,000 write/s. Memcached peaks around 200k ops/s per node; Redis around 100-150k ops/s per node single-threaded.
Set TTLs on session keys. Sessions are not forever; stale keys waste memory and inflate cost. A 24h sliding TTL is standard.
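A sliding TTL, as described in the last step, can be sketched as follows. `TTLSessionStore` is a hypothetical in-memory class with an injectable clock so the behavior can be exercised without waiting; in Redis the equivalent effect is usually obtained by setting the key with an expiry and re-applying `EXPIRE` on access.

```python
class TTLSessionStore:
    """In-memory session store with a sliding time-to-live (illustration only)."""

    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock  # callable returning the current time in seconds
        self._data = {}     # key -> (expires_at, value)

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: reclaim the memory
            return None
        # Sliding refresh: every successful read pushes the expiry forward.
        self._data[key] = (self.clock() + self.ttl, value)
        return value


now = [0]  # fake clock, advanced by hand
store = TTLSessionStore(ttl_seconds=24 * 3600, clock=lambda: now[0])
store.set("session:abc", {"current_user_id": 42})

now[0] = 80_000
active = store.get("session:abc")    # still valid, expiry slides forward

now[0] = 80_000 + 24 * 3600
expired = store.get("session:abc")   # a full TTL with no access: gone
```

An active user never expires under this scheme, while an abandoned session disappears one TTL after its last access, which is what keeps memory bounded.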

Check Yourself

What specific behavior convinces you a tier you inherited is not actually stateless?
Why are sticky sessions a worse answer than an external session store for most modern web apps?
What is the failure mode of an external session store that horizontal scaling does not fix?
Twelve-factor forbids sticky sessions. Give one legitimate production scenario where you would still use them and justify why.
A signed cookie (JWT) eliminates the server-side session store. Name two problems this creates.
A rate limiter stored as a per-instance token bucket has a hidden multiplier on effective limit. Quantify it for N = 20 instances.
Why does a WebSocket tier usually need affinity, and what is the minimum-viable design that still allows rolling deploys?

Mini Drill or Application

Take a service you have seen. For each place you suspect state lives, write one line: "Instance-local: Y/N", "Impact if user lands on different instance: _", "Proposed move: _". If every line reads "Instance-local: N", the tier is stateless; otherwise you have found your next refactor. Then estimate the session-store load: current peak req/s * session-store operations per request (one read, plus a write on whatever fraction of requests modify the session). If it exceeds the capacity of a single node, you have a sharding problem to solve, not a cache problem.

Transfer / Where This Shows Up Later

Statelessness is the feature that lets the rest of this module actually work. Until a tier is stateless, you cannot cleanly apply anything that follows.

This module, concept 06 (caching): the external session store is a cache in every respect - hit rate, TTL, thundering-herd risk, HA requirements. The caching chapter's rules apply to it.
This module, concept 09 (chaos): the canonical chaos experiment for a stateless tier is "kill one pod"; if a user notices, the tier was not stateless.
This module, concepts 10-11 (queueing, load shedding): rate limiters and admission controllers that store counters in-process are hidden state; either make them consistent across instances or accept that your effective limit is N * configured.
S8 M2 (microservices): "team-owned, independently deployable" is only cheap when instances are fungible. Sticky sessions punish the independently-deployable criterion.
S8 M3 (event-driven): stateless consumers with state in a durable log is the pattern that makes event-driven systems scale; it is this concept generalized to "state lives in the log, not the consumer."
S9 M3 (Kubernetes): Deployments vs StatefulSets is a Kubernetes-level restatement of this concept. StatefulSets exist because some tiers are genuinely stateful; most should not be.
S9 M5 (observability): a sticky-session tier has per-instance metrics that are not comparable across the fleet; aggregating them honestly is a real instrumentation problem.
S10 M3-M4 (capstone): "prove your app tier is stateless" is a readiness-review checklist item. The proof is a chaos drill, not a design document.

The leadership transfer: when a team claims "we are stateless," the useful question is not "really?" but "what is the last 1% of state, and what would it take to move it?" That question produces a roadmap; the binary question produces defensiveness.

One more practical rule: the session store is your users' logged-in experience in a box. Treat it with the ceremony of a database. Monitor it with SLOs. Fail over between AZs. Have a restore-from-backup drill. When it fails, your users are signed out - which is an outage by any reasonable definition.

Read This Only If Stuck

Local chunks (book anchors)

System Design Primer: Load Balancer -- session affinity is a load-balancer feature that the concept says to avoid; know it so you can refuse it.
System Design Primer: Reverse Proxy -- the layer that terminates TLS and issues the cookie that lives at the center of session design.
System Design Primer: Availability Patterns -- the N+1 and active-active patterns assume stateless instances; statefulness breaks the math.
System Design Primer: Application Layer and Microservices -- the architectural layer this concept is mostly about.
System Design Primer: Database federation and sharding -- when the session store itself must scale horizontally, this is the playbook.
FoSA: Asynchronous Capabilities -- moving state out of the request path into a durable async stream is the pattern that makes the stateless tier truly cheap.
FoSA: Cross-Cutting Architecture Characteristics -- reliability, scalability, and deployability all cross-cut into this concept.

External canonical references

The Twelve-Factor App, Processes (stateless) and Backing Services -- the canonical prescriptions. Factor VI is the single most-cited sentence on this topic.
Google SRE Workbook, Managing Load -- the chapter on load balancing, draining, and safe restarts for stateless fleets.
AWS Builders' Library, Amazon's approach to high-availability deployment -- rolling deploys in detail; they only work because tiers are stateless.
Marc Brooker, Statelessness for the sake of statelessness -- a principal engineer's case against over-applying the rule (there are real stateful services; don't pretend otherwise).
Redis Labs, Redis Cluster Specification -- the sharding story for the session store; read before assuming "just use Redis."
Nishtala et al. (Facebook), Scaling Memcache at Facebook (NSDI 2013 paper) -- the industry reference for running a shared session/cache tier at very large scale.
