
Statelessness, Sticky Sessions, and Session Stores

What This Concept Is

A server is stateless if it does not retain per-user or per-session memory between requests. Any of its instances can handle any request, because everything needed to process a request arrives with the request or is fetched from shared storage.

A server is stateful if consecutive requests from the same user must go to the same instance because that instance holds private state (in-memory session, local cache of uncommitted work, open file handles, an attached WebSocket).

Between these two live three production patterns:

  • Fully stateless tier: every instance identical; load balancer routes freely.
  • Sticky sessions (session affinity): the load balancer pins a client to one instance for the life of the session, usually by cookie or source IP.
  • External session store: the tier is stateless to the load balancer, but session data is kept in a shared store (Redis, Memcached, a database) that every instance reads and writes.

The Twelve-Factor App's Factor VI ("Processes") is the canonical prescription: "twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database." Twelve-factor is explicit about sticky sessions: "sticky sessions are a violation of twelve-factor and should never be used or relied upon. Session state data is a good candidate for a datastore that offers time-expiration, such as Memcached or Redis."

Most "scale horizontally" plans fail because the tier being scaled was stateful in a way the author did not realize.

Why It Matters Here

Horizontal scaling works when any worker can process any request. The instant you need affinity, four things degrade:

  • Load balancing gets worse: one user's long session may pin them to an already-hot instance. The LB can no longer freely redistribute; a hot user lives on a hot box forever.
  • Deploys get harder: draining stateful instances requires migrating or waiting out sessions. Rolling deploys must now be choreographed with session lifetime; zero-downtime deploys require either session migration or acceptance that some users get logged out.
  • Failure recovery gets harder: when an instance dies, its resident sessions die with it unless state has been replicated somewhere.
  • Auto-scaling gets awkward: scaling in means forcibly evicting sessions or waiting. Scaling out means new instances receive no existing sessions; load is uneven until new users arrive.

The cheapest and most resilient designs keep application tiers stateless and push session state to a shared store. Sticky sessions are a middle road: cheap to implement (one LB setting), but they reintroduce single points of failure and uneven load. Twelve-factor's rule is blunt for a reason.

Concrete Example

A login-aware web app stores current_user_id and a shopping cart in process memory after sign-in.

Design A - in-process session (stateful). After logging in, the user's subsequent requests must hit the same server, or they will appear signed-out and the cart will be empty. Two instances behind a round-robin LB produce constant bugs. You cannot roll a deploy without logging everyone out.

Design B - sticky sessions. The LB hashes the session cookie and always routes that user to the same backend. This works until that backend restarts, is removed for a deploy, or stalls in a GC pause long enough to fail health checks. Then the users pinned to it are signed out. Blast radius: roughly 1/N of users per event. Uneven load: a "whale" user (an API partner sending 10,000 req/s) pins all of that load to one box while the other N-1 boxes carry none of it.
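A minimal sketch of Design B's blast radius, assuming the LB uses a plain hash-modulo over the session ID. The backend names, session IDs, and the `pick_backend` helper are hypothetical; real load balancers use their own affinity schemes.

```python
import hashlib
from typing import List

def pick_backend(session_id: str, backends: List[str]) -> str:
    """Map a session ID to one backend by hashing (simple modulo affinity)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

# Hypothetical fleet and session population.
backends = ["app-0", "app-1", "app-2", "app-3"]
sessions = [f"sess-{i}" for i in range(1000)]

# Affinity map: the same session always lands on the same backend.
before = {s: pick_backend(s, backends) for s in sessions}

# Simulate one backend being removed for a deploy. Every session pinned
# to it loses its in-process state, i.e. those users are signed out.
lost = "app-2"
stranded = [s for s, b in before.items() if b == lost]

print(f"{len(stranded)} of {len(sessions)} sessions were pinned to {lost}")
```

With N = 4 backends the stranded count comes out near a quarter of the sessions, which is the 1/N blast radius described above.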

Design C - external session store (Redis). The session cookie stores only a session ID. Every request: the backend reads session:<id> from Redis, processes, writes back. Any instance serves any request. Deploys are free. An instance dying removes zero sessions. Typical latency overhead: <1ms if Redis is in the same AZ.
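The read-process-write loop of Design C can be sketched as follows. To stay runnable without a Redis server, the sketch uses an in-memory stand-in (`FakeSessionStore`, a hypothetical class); `AppInstance` and `add_to_cart` are likewise illustrative names, not part of any library.

```python
import json
from typing import Dict, List, Optional

class FakeSessionStore:
    """In-memory stand-in for a shared store such as Redis (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

class AppInstance:
    """One app instance. It holds a handle to the store, never session data."""

    def __init__(self, name: str, store: FakeSessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: str, item: str) -> List[str]:
        key = f"session:{session_id}"
        raw = self.store.get(key)                       # read
        session = json.loads(raw) if raw else {"cart": []}
        session["cart"].append(item)                    # process
        self.store.set(key, json.dumps(session))        # write back
        return session["cart"]

store = FakeSessionStore()
a = AppInstance("app-0", store)
b = AppInstance("app-1", store)

a.add_to_cart("abc123", "book")          # first request lands on app-0
cart = b.add_to_cart("abc123", "pen")    # next request lands on app-1
print(cart)  # ['book', 'pen']
```

Note that a plain read-modify-write like this can lose updates when two requests for the same session run concurrently; a real store would need an atomic update or a per-session lock, which this sketch leaves out.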

Design D - signed client-side session. Session data is encoded directly into the cookie (e.g., JWT) and signed. No server-side store at all; any instance validates and processes. Works for small session payloads; the client pays the bandwidth and the server pays the crypto cost on each request. Revocation is not possible without a server-side revocation list, which reintroduces a shared store.
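A minimal sketch of the signing idea behind Design D, using an HMAC over a base64-encoded JSON payload. This is a simplified format for illustration, not a JWT implementation; `SECRET_KEY`, `encode_session`, and `decode_session` are hypothetical names, and the key here is a demo value only.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"demo-only-secret"  # hypothetical; never hard-code a real key

def encode_session(data: dict) -> str:
    """Serialize the session and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(data, sort_keys=True).encode("utf-8")
    )
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + sig

def decode_session(cookie: str) -> Optional[dict]:
    """Return the session if the signature verifies, else None."""
    payload_b64, sep, sig = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(
        SECRET_KEY, payload_b64.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison to avoid leaking signature bytes via timing.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload_b64))

cookie = encode_session({"user_id": 42})
print(decode_session(cookie))  # {'user_id': 42}

# A payload for another user paired with the old signature is rejected.
forged = encode_session({"user_id": 1}).split(".")[0] + "." + cookie.split(".")[1]
print(decode_session(forged))  # None
```

Any instance holding the key can verify the cookie without a lookup, which is the point of the design. It also shows why the key is critical: whoever holds it can mint a valid cookie for any user.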

Design C is the industry default for browser-heavy workloads. Design D is common for mobile and microservice-to-microservice. Both import a new failure mode: Design C makes Redis critical infrastructure (it must be HA, backed up, and sized for peak read/write load); Design D makes the signing key critical infrastructure (if it leaks, every session is forgeable).

Production scenario (numbers). A social app serves 40,000 req/s across 60 pods. Each request reads session:<id> from Redis (~0.8ms) and writes back on 30% of requests (~0.9ms). Session-store load: 40,000 reads/s + 12,000 writes/s = 52k ops/s total. A single Redis node peaks near 100-150k ops/s, so one HA pair is enough today if network latency stays sub-ms. At 2x growth the load reaches about 104k ops/s, which is at the low end of a single node's ceiling, so they need to shard. Staleness budget: every pod reads the same store, so a user routed to a new pod sees their session instantly; the session carries a 24-hour TTL with a 30s sliding refresh. Sized for that growth, a reasonable cluster is ElastiCache Redis in cluster mode with 2 shards and 3 replicas per shard; at roughly $600-800/month, its cost is negligible against the deploy and recovery savings compared to sticky sessions.
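The arithmetic in this scenario can be checked with a few lines. The function name and the integer-percent parameter are illustrative choices; the inputs are the scenario's own numbers.

```python
def session_store_ops(req_per_s: int, write_percent: int) -> dict:
    """Ops/s on the session store: one read per request plus a share of writes."""
    reads = req_per_s
    writes = req_per_s * write_percent // 100
    return {"reads": reads, "writes": writes, "total": reads + writes}

# Single-node Redis ceiling quoted in the scenario (ops/s).
node_low, node_high = 100_000, 150_000

load = session_store_ops(40_000, 30)
print(load)  # {'reads': 40000, 'writes': 12000, 'total': 52000}

doubled = session_store_ops(80_000, 30)
print(doubled["total"])             # 104000
print(doubled["total"] > node_low)  # True: past the low end of one node
```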

Common Confusion / Misconception

"We are stateless - we do not use sessions." Check again. All of these count as hidden state:

  • an in-memory LRU cache per instance (cache warmup time is state)
  • a local queue of outgoing events not yet flushed to the broker
  • a long-lived database connection with transaction-local state
  • a WebSocket or SSE connection (the client expects to reconnect to "this" instance)
  • a file on local disk that an upload was streamed into
  • an idempotency key cache kept in-process to dedupe retries
  • a rate-limit token bucket stored per instance (the effective limit is N * configured_limit)
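The rate-limit item is the easiest of these to demonstrate. The sketch below substitutes a fixed-window counter for a token bucket to keep the code short; the multiplier effect is the same. `FixedWindowLimiter` and `accepted_in_one_window` are hypothetical names, and the load balancer is assumed to be round-robin.

```python
class FixedWindowLimiter:
    """Per-instance counter: allows up to `limit` requests per window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0

    def allow(self) -> bool:
        if self.count < self.limit:
            self.count += 1
            return True
        return False

def accepted_in_one_window(instances: int, limit: int, requests: int) -> int:
    """Count requests accepted fleet-wide when each instance limits locally."""
    limiters = [FixedWindowLimiter(limit) for _ in range(instances)]
    accepted = 0
    for i in range(requests):
        if limiters[i % instances].allow():  # round-robin load balancer
            accepted += 1
    return accepted

# Configured limit is 100, but four instances each enforce it separately.
print(accepted_in_one_window(instances=4, limit=100, requests=1000))  # 400
print(accepted_in_one_window(instances=1, limit=100, requests=1000))  # 100
```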

"We use sticky sessions so it is fine." Sticky sessions trade one failure mode for another: the load balancer is now a stateful component (it remembers the affinity map), deploys require drain time, and an unlucky user who hits a dying instance gets a bad experience while an even spray would have been imperceptible.

"Session in cookies is stateless." The server side is; the client side is not. A cookie-carried session is still state, just state you do not own. Cookie-carried session data has practical limits: browsers cap each cookie at roughly 4 KB, and although HTTP/2's HPACK compresses repeated headers, the full cookie still crosses the wire whenever it is not already in the compression table. Anything larger than a few KB must return to server-side storage.

"We can just replicate in-memory sessions across instances." This trades one hard problem (statelessness) for another (distributed consensus). You now own a session replication bus that must survive partitions, instance churn, and deploys. Twelve-factor's advice exists because this path reliably ends in outage retrospectives.

How To Use It

For every tier in your design:

  1. Enumerate every piece of per-request or per-user data the tier uses.
  2. For each, ask: "If the next request for this user lands on a different instance, does anything break?" If yes, that is state - move it to a store or replicate it.
  3. Default to external session store unless latency or cost forbids it. Redis and Memcached are the standard choices; a small row in Postgres works too. Expect <1ms additional latency for co-located Redis, 1-5ms cross-AZ.
  4. Use sticky sessions only when the state is fundamentally local and expensive to externalize (e.g., long-lived WebSockets, streaming audio/video sessions, large per-connection compiled state). Even then, plan for the session to be dropped gracefully.
  5. Make the session store itself highly available. A single-node Redis is not a session store; it is a pending outage. Use Redis Sentinel, Redis Cluster, or an HA managed service (ElastiCache, MemoryStore, Upstash).
  6. Size the session store for peak read/write load, not average. A session store hit on every request at 10,000 req/s sees 10,000 reads/s plus up to 10,000 writes/s. Memcached peaks around 200k ops/s per node; Redis around 100-150k ops/s per single-threaded node.
  7. Set TTLs on session keys. Sessions are not forever; stale keys waste memory and inflate cost. A 24h sliding TTL is standard.
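The sliding TTL from step 7 can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not a Redis client: `TTLSessionStore` is a hypothetical class, and the injectable `clock` exists only so the behavior can be shown without waiting. In Redis the equivalent is setting an expiry on the session key and renewing it on access.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class TTLSessionStore:
    """Session store where each successful read pushes the expiry forward."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[float, dict]] = {}

    def put(self, session_id: str, session: dict) -> None:
        self._data[session_id] = (self.clock() + self.ttl, session)

    def get(self, session_id: str) -> Optional[dict]:
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, session = entry
        now = self.clock()
        if now >= expires_at:
            del self._data[session_id]   # expired: free the memory
            return None
        self._data[session_id] = (now + self.ttl, session)  # sliding refresh
        return session

# Demo with a fake clock (seconds) and a 24h TTL.
fake_now = [0.0]
sessions = TTLSessionStore(24 * 3600, clock=lambda: fake_now[0])
sessions.put("abc123", {"user_id": 42})

fake_now[0] = 20 * 3600           # active at hour 20: expiry slides to hour 44
print(sessions.get("abc123"))     # {'user_id': 42}
fake_now[0] = 40 * 3600           # still inside the refreshed window
print(sessions.get("abc123"))     # {'user_id': 42}
fake_now[0] = 70 * 3600           # idle for more than 24h since last access
print(sessions.get("abc123"))     # None
```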

Check Yourself

  1. What specific behavior convinces you a tier you inherited is not actually stateless?
  2. Why are sticky sessions a worse answer than an external session store for most modern web apps?
  3. What is the failure mode of an external session store that horizontal scaling does not fix?
  4. Twelve-factor forbids sticky sessions. Give one legitimate production scenario where you would still use them and justify why.
  5. A signed cookie (JWT) eliminates the server-side session store. Name two problems this creates.
  6. A rate limiter stored as a per-instance token bucket has a hidden multiplier on effective limit. Quantify it for N = 20 instances.
  7. Why does a WebSocket tier usually need affinity, and what is the minimum-viable design that still allows rolling deploys?

Mini Drill or Application

Take a service you have seen. For each place you suspect state lives, write one line: "Instance-local: Y/N", "Impact if user lands on different instance: _", "Proposed move: _". If every line reads "Instance-local: N", the tier is stateless; otherwise you have found your next refactor. Then estimate the session-store load: current peak req/s * session-store operations per request (reads plus writes). If the result exceeds the ops/s capacity of a single node, you have a sharding problem to solve, not a cache problem.

Transfer / Where This Shows Up Later

Statelessness is the feature that lets the rest of this module actually work. Until a tier is stateless, you cannot cleanly apply anything that follows.

  • This module, concept 06 (caching): the external session store is a cache in every respect - hit rate, TTL, thundering-herd risk, HA requirements. The caching chapter's rules apply to it.
  • This module, concept 09 (chaos): the canonical chaos experiment for a stateless tier is "kill one pod"; if a user notices, the tier was not stateless.
  • This module, concepts 10-11 (queueing, load shedding): rate limiters and admission controllers that store counters in-process are hidden state; either make them consistent across instances or accept that your effective limit is N * configured.
  • S8 M2 (microservices): "team-owned, independently deployable" is only cheap when instances are fungible. Sticky sessions punish the independently-deployable criterion.
  • S8 M3 (event-driven): stateless consumers with state in a durable log is the pattern that makes event-driven systems scale; it is this concept generalized to "state lives in the log, not the consumer."
  • S9 M3 (Kubernetes): Deployments vs StatefulSets is a Kubernetes-level restatement of this concept. StatefulSets exist because some tiers are genuinely stateful; most should not be.
  • S9 M5 (observability): a sticky-session tier has per-instance metrics that are not comparable across the fleet; aggregating them honestly is a real instrumentation problem.
  • S10 M3-M4 (capstone): "prove your app tier is stateless" is a readiness-review checklist item. The proof is a chaos drill, not a design document.

The leadership transfer: when a team claims "we are stateless," the useful question is not "really?" but "what is the last 1% of state, and what would it take to move it?" That question produces a roadmap; the binary question produces defensiveness.

One more practical rule: the session store is your users' logged-in experience in a box. Treat it with the ceremony of a database. Monitor it with SLOs. Fail over between AZs. Have a restore-from-backup drill. When it fails, your users are signed out - which is an outage by any reasonable definition.
