
Service Discovery, API Gateways, and the BFF Pattern

What This Concept Is

Three related infrastructure patterns for wiring services together.

  • Service discovery. A service needs to reach another service without hard-coding its IP. A registry (Consul, etcd, Kubernetes' built-in DNS, a service mesh) keeps the mapping between service name and current instances. Clients look up by name; the registry keeps that mapping current as instances scale up, scale down, or fail health checks.
  • API gateway. A single entry point at the edge of the architecture. It handles cross-cutting concerns for external clients: authentication, rate limiting, TLS termination, request routing, response aggregation, WAF, quotas. Behind it, internal services speak whatever protocol they prefer.
  • Backend-for-Frontend (BFF). A specialized gateway per client type (mobile, web, partner API). Each BFF knows exactly what its client needs and composes downstream services accordingly. Unlike a generic gateway, a BFF has business logic tied to one client's needs.

The three compose: the client hits the gateway at the edge, which routes to that client's BFF (or directly to a service), and the BFF calls services discovered via the registry.

Why It Matters Here

Clients should not have to know about internal service topology. Internal services should not have to re-implement auth, rate limiting, and TLS. And mobile clients, with their latency and battery constraints, should not have to do 8 round-trips to render a single screen -- that is what a BFF fixes.

Service Discovery: Client-Side vs Server-Side

  • Client-side discovery. The caller queries the registry, gets a list of healthy instances, picks one (round-robin, random, least-conn). The caller embeds a load-balancing library.
  • Server-side discovery. The caller hits a load balancer (or service mesh sidecar) that knows the registry. The caller is oblivious.

Server-side is more common in modern setups because it removes the need for language-specific load-balancing libraries; a Kubernetes Service or a service mesh (Istio, Linkerd) handles it.

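Either way, do not hard-code IPs, and do not rely on DNS caching alone when the TTL is longer than the time it takes to notice an instance is bad: until the cached record expires, callers keep sending traffic to the dead instance.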
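
A minimal sketch of client-side discovery may make the difference concrete. Everything here is hypothetical: InMemoryRegistry stands in for a real registry such as Consul or Kubernetes DNS, and the addresses are made up. The point is only that the caller looks up a name, filters to healthy instances, and load-balances locally.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    host: str
    port: int
    healthy: bool = True


class InMemoryRegistry:
    """Hypothetical stand-in for a real registry (Consul, etcd, Kubernetes DNS)."""

    def __init__(self):
        self._services = {}

    def register(self, name, instance):
        self._services.setdefault(name, []).append(instance)

    def mark_unhealthy(self, name, host):
        # Rebuild the list, flagging the matching host as unhealthy.
        self._services[name] = [
            Instance(i.host, i.port, healthy=(i.healthy and i.host != host))
            for i in self._services.get(name, [])
        ]

    def healthy_instances(self, name):
        return [i for i in self._services.get(name, []) if i.healthy]


class RoundRobinClient:
    """Client-side discovery: look up by name, then pick an instance locally."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = itertools.count()

    def pick(self, name):
        # Fresh lookup on every call, so a failed instance drops out immediately.
        instances = self._registry.healthy_instances(name)
        if not instances:
            raise LookupError(f"no healthy instances for {name!r}")
        return instances[next(self._counter) % len(instances)]


registry = InMemoryRegistry()
registry.register("orders", Instance("10.0.0.1", 8080))
registry.register("orders", Instance("10.0.0.2", 8080))
client = RoundRobinClient(registry)

first, second = client.pick("orders"), client.pick("orders")
registry.mark_unhealthy("orders", "10.0.0.1")
after_failure = client.pick("orders")  # only 10.0.0.2 is left to choose from
```

With server-side discovery, the RoundRobinClient logic would live in the load balancer or sidecar instead, and the caller would simply send its request to a stable name.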

Example: API Gateway at the Edge

What the gateway handles, so services do not each reimplement it:

  • TLS termination (gateway holds the cert, internal traffic can be plaintext inside a trusted network or mTLS via a mesh)
  • Authentication: validate JWT, translate to internal user ID header
  • Authorization: coarse-grained rules ("is this path public?")
  • Rate limiting: per-IP, per-user, per-endpoint
  • Request logging and injection of tracing headers (see concept 13)
  • Response aggregation (sometimes -- though BFFs do this better)
  • CORS

Common implementations: Kong, Envoy (often as a mesh component), AWS API Gateway, NGINX, Zuul. Do not build one from scratch unless you really have to.
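
To see how a few of these concerns stack up on a single request, here is a toy sketch, not a substitute for any of the products above. All names are assumptions made for illustration: verify_token stands in for real JWT validation, X-User-Id is a made-up internal header, and the routes and limits are invented. The order is the interesting part: coarse authorization, authentication, rate limiting, then routing.

```python
import time
from dataclasses import dataclass, field

PUBLIC_PATHS = {"/catalog"}  # coarse-grained rule: these paths need no login
ROUTES = {"/orders": "orders", "/catalog": "catalog"}  # path prefix -> service name


@dataclass
class TokenBucket:
    capacity: float
    refill_per_sec: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self):
        self.tokens = self.capacity
        self.updated = time.monotonic()

    def allow(self):
        # Refill in proportion to elapsed time, then spend one token if available.
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def verify_token(token):
    """Hypothetical stand-in for JWT validation; returns a user ID or None."""
    return {"token-alice": "user-1"}.get(token)


class Gateway:
    def __init__(self, capacity=2, refill_per_sec=0.0):
        self._capacity = capacity
        self._refill = refill_per_sec
        self._buckets = {}  # one bucket per user

    def handle(self, path, headers):
        """Return (status, upstream service or None, headers sent upstream)."""
        upstream_headers = {}
        if path not in PUBLIC_PATHS:
            token = headers.get("Authorization", "").removeprefix("Bearer ")
            user_id = verify_token(token)
            if user_id is None:
                return 401, None, {}
            if user_id not in self._buckets:
                self._buckets[user_id] = TokenBucket(self._capacity, self._refill)
            if not self._buckets[user_id].allow():
                return 429, None, {}
            # Translate the external credential into an internal identity header.
            upstream_headers["X-User-Id"] = user_id
        for prefix, service in ROUTES.items():
            if path.startswith(prefix):
                return 200, service, upstream_headers
        return 404, None, {}


gateway = Gateway()  # refill disabled so the demo is deterministic
auth = {"Authorization": "Bearer token-alice"}
statuses = [gateway.handle("/orders/42", auth)[0] for _ in range(3)]
anonymous = gateway.handle("/orders/42", {})
public = gateway.handle("/catalog", {})
```

The services behind the gateway never see the bearer token in this sketch; they only receive the internal identity header, which is what lets them skip re-implementing authentication.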

Example: BFF for a Mobile App

Mobile home screen needs: user profile, recent orders (last 3), recommendations, loyalty points, new-arrivals banner.

Without a BFF, the app makes 5 parallel or serial calls. Each round trip costs 100-300ms on cellular, so 5 serial calls add up to 500-1500ms of network time before the screen can render. Each failure partially breaks the screen.

With a mobile BFF:

GET /mobile/home
-> BFF fans out in parallel to Accounts, Orders, Recommendations, Loyalty, Catalog
-> Shapes the response for exactly what the mobile screen needs
-> Returns one JSON payload
-> One round trip, aggregated error handling, mobile-friendly shape
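
The flow above could be sketched roughly as follows, using asyncio from the Python standard library. The five downstream functions are hypothetical stubs (real ones would be HTTP or gRPC calls), and the payload shape, including the "errors" field, is an assumption made for illustration. One downstream is made to fail on purpose to show the aggregated error handling.

```python
import asyncio


# Hypothetical downstream calls; real ones would be network requests.
async def get_profile(user_id):
    return {"name": "Alice"}


async def get_recent_orders(user_id):
    return [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]


async def get_recommendations(user_id):
    raise TimeoutError("recommendations timed out")  # simulated failure


async def get_loyalty_points(user_id):
    return 1200


async def get_new_arrivals():
    return {"banner": "spring"}


async def mobile_home(user_id):
    """Handle GET /mobile/home: fan out in parallel, return one shaped payload."""
    names = ["profile", "recent_orders", "recommendations",
             "loyalty_points", "new_arrivals"]
    results = await asyncio.gather(
        get_profile(user_id),
        get_recent_orders(user_id),
        get_recommendations(user_id),
        get_loyalty_points(user_id),
        get_new_arrivals(),
        return_exceptions=True,  # a failing section must not sink the whole screen
    )
    payload, errors = {}, []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            payload[name] = None  # the client renders this section as empty
            errors.append(name)
        else:
            payload[name] = result
    if payload["recent_orders"] is not None:
        payload["recent_orders"] = payload["recent_orders"][:3]  # screen shows 3
    payload["errors"] = errors
    return payload


home = asyncio.run(mobile_home("user-1"))
```

Here the fan-out happens inside the data center, where round trips are cheap, and the app pays for a single cellular round trip. Whether a failed section should come back as None or cause the whole request to fail is a product decision the owning client team would make.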

The mobile BFF is owned by the mobile team. The web BFF (if needed) is owned by the web team. Each BFF knows its client intimately. The downstream services stay generic and single-purpose.

Common Confusion / Misconception

"The API gateway is the BFF." They overlap in infrastructure but differ in ownership and purpose. Gateways handle cross-cutting concerns and are shared; BFFs contain client-specific shaping and are owned by one client team. A gateway owned by a "platform team" with logic for every client mixed in is the worst of both -- it becomes a central bottleneck that every product team has to wait on.

"Use a BFF for everything." BFFs are good for consumer-facing clients with very different needs. For system-to-system APIs with no client intimacy, a plain gateway or direct calls are fine.

"Service mesh replaces the gateway." A mesh handles internal east-west traffic (service-to-service) with sidecars; the gateway handles north-south traffic (client-to-cluster). Most mature stacks have both.

How To Use It

  1. Put exactly one gateway at the edge. Make a platform team responsible for it.
  2. Use the service registry your platform already gives you (Kubernetes DNS + Service objects is enough for most cases).
  3. Create a BFF per client type only if the client has distinct aggregation or shaping needs. Do not create BFFs per service.
  4. Keep business logic in services, cross-cutting concerns in the gateway, client-shaping in the BFFs. Do not cross these.
  5. For internal auth across services, prefer mTLS via the mesh or a token-forwarding pattern; do not reimplement auth per service.
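
As one possible reading of the token-forwarding pattern in step 5, a BFF or service can copy an allow-list of identity and tracing headers from the incoming request onto each downstream call, rather than re-authenticating. The header names below are assumptions: X-User-Id is a made-up internal header, and traceparent is the W3C Trace Context header.

```python
# Headers that are safe and useful to pass downstream (an assumed allow-list).
FORWARDED_HEADERS = ("authorization", "x-user-id", "traceparent")


def downstream_headers(incoming):
    """Copy only allow-listed headers; header names are matched case-insensitively."""
    lowered = {key.lower(): value for key, value in incoming.items()}
    return {name: lowered[name] for name in FORWARDED_HEADERS if name in lowered}


forwarded = downstream_headers({
    "Authorization": "Bearer abc",
    "Cookie": "session=1",  # client-only detail, deliberately not forwarded
    "X-User-Id": "user-1",
})
```

An allow-list is used here instead of forwarding everything so that client-facing details such as cookies do not leak into internal calls.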

Check Yourself

  1. What is the practical difference between an API gateway and a BFF?
  2. Why is hard-coded IP addressing between services almost always wrong?
  3. When does a BFF actually earn its cost?

Mini Drill or Application

Take the e-commerce decomposition. In 10 minutes:

  • Name the gateway and what it handles.
  • Decide whether a mobile BFF, web BFF, or both are warranted.
  • Describe the home-screen call: one aggregated endpoint on the BFF, and the downstream fan-out.

How This Sits In The Module

Concept 10 picked the communication style per interaction. This concept placed the infrastructure that carries them. Concept 12 makes that infrastructure survive partial failure.

Gateway vs Service Mesh: North-South vs East-West

A common confusion is whether a service mesh (Istio, Linkerd, Consul Connect) replaces a gateway. The short answer: no, they solve different traffic directions.

| Axis | API Gateway | Service Mesh |
|---|---|---|
| Traffic direction | North-south (client -> cluster) | East-west (service -> service) |
| Primary concerns | AuthN of external users, rate limit, WAF, TLS termination, public API | mTLS, retries/timeouts/circuit breakers, policy, telemetry |
| Typical deployment | Dedicated edge tier | Sidecar per service |
| Owner | Platform + security | Platform |
| Example products | Kong, Apigee, AWS API Gateway, Envoy as edge | Istio, Linkerd, Consul, AWS App Mesh |

Mature stacks have both. The gateway serves public APIs; the mesh moves the resilience primitives (concept 12) out of application code into the sidecar. The gateway may itself be implemented as Envoy (which is also the data plane for Istio), which is why the components look similar.

BFF Trade-offs: When Not To Build One

BFFs are not free. Each BFF is itself a service: it has on-call, deploys, contracts with downstreams, and a team (the client team). Before building one, check:

  • Client diversity. If mobile and web have nearly identical needs, one BFF or none is fine. Build a BFF when the clients differ materially in fan-out, latency constraints, or shape requirements.
  • Downstream stability. BFFs amplify downstream churn: every downstream breaking change risks breaking the BFF. If downstreams are mid-migration, the BFF absorbs the complexity -- which may or may not be what you want.
  • Ownership clarity. A BFF owned by a central team that serves every client is a god-gateway in disguise. The BFF must be owned by the client team.

BFFs shine for consumer-facing clients (mobile apps, web) where round trips are expensive and shape matters. They are usually unnecessary for internal service-to-service APIs.

Read This Only If Stuck

Depth Path

  • Envoy and Istio documentation, or your cloud provider's API gateway docs, once you have chosen a platform. The concept here is stable; the mechanics are implementation-specific.
  • Matt Klein (Envoy creator), Envoy mobile, service mesh, and the future of the proxy -- perspective on where the two tiers are heading.