GraphQL: Schema-First, Over-fetching, N+1 Concerns

What This Concept Is

GraphQL is a query language for APIs and a runtime for executing those queries against a typed schema. Instead of multiple REST endpoints, a GraphQL API exposes one endpoint (usually POST /graphql) that accepts a query describing exactly the data the client wants.

The core pieces:

Schema (SDL): types, fields, queries, mutations, subscriptions, written once and shared between server and client.
Queries: read operations; clients ask for precisely the fields they need.
Mutations: write operations; one per logical action.
Subscriptions: server-pushed events over a long-lived connection.
Resolvers: server-side functions that, for each field, produce its value.

The design trade: REST's "many endpoints, each response shaped by the server" is exchanged for "one endpoint, each response shaped by the client's query."
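
To make the resolver idea from the list above concrete, here is a minimal sketch of field-by-field execution. It is not how any particular GraphQL library is implemented: the nested dictionary stands in for a parsed query, and ORDERS, USERS, RESOLVERS, FIELD_TYPES, and execute are hypothetical names invented for this illustration. List fields are left out to keep it short.

```python
# Hypothetical in-memory data; names are illustrative only.
ORDERS = {"ord_1": {"id": "ord_1", "status": "PAID", "customer_id": "u_9"}}
USERS = {"u_9": {"id": "u_9", "name": "A. Patel", "email": "a@example.com"}}

# One resolver per (type, field). Fields without an entry fall back to a
# plain dictionary lookup on the parent object.
RESOLVERS = {
    ("Query", "order"): lambda parent, args: ORDERS.get(args["id"]),
    ("Order", "customer"): lambda parent, args: USERS[parent["customer_id"]],
}

# Which type a field returns, so nested selections know where to look.
FIELD_TYPES = {("Query", "order"): "Order", ("Order", "customer"): "User"}


def execute(type_name, parent, selection):
    """Resolve each selected field, recursing into nested selections."""
    result = {}
    for field, spec in selection.items():
        args = spec.get("args", {})
        resolver = RESOLVERS.get((type_name, field))
        value = resolver(parent, args) if resolver else parent[field]
        sub = spec.get("fields")
        if sub and value is not None:
            value = execute(FIELD_TYPES[(type_name, field)], value, sub)
        result[field] = value
    return result


# Stand-in for the parsed query: order(id: "ord_1") { id customer { name } }
selection = {
    "order": {
        "args": {"id": "ord_1"},
        "fields": {"id": {}, "customer": {"fields": {"name": {}}}},
    }
}
data = execute("Query", None, selection)
print(data)
```

Only the selected fields appear in the result: status and email exist in the data but were not asked for, so they are never returned.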

Why It Matters Here

GraphQL shines when:

the client knows what it needs and the server cannot predict the union of client needs (mobile apps, dashboards, third-party clients)
multiple resources need to be composed in one round trip
over-fetching and under-fetching are real costs (bandwidth, latency, battery)

GraphQL hurts when:

caching is important (HTTP caching is harder on a single POST endpoint)
server operators need predictable query cost (clients can write pathological queries)
the domain is simple and REST would already be fine

This concept exists so you can pick GraphQL deliberately, not as a badge.

Concrete Example

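Schema (excerpt; the definitions of Money, DateTime, OrderFilter, OrderConnection, and CreateOrderInput are omitted):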

type Query {
  order(id: ID!): Order
  orders(filter: OrderFilter, first: Int, after: String): OrderConnection!
}

type Mutation {
  createOrder(input: CreateOrderInput!): Order!
  cancelOrder(id: ID!, reason: String): Order!
}

type Order {
  id: ID!
  customer: User!
  items: [LineItem!]!
  total: Money!
  status: OrderStatus!
  createdAt: DateTime!
  cancelledAt: DateTime
}

type LineItem {
  sku: String!
  quantity: Int!
  unitPrice: Money!
}

type User {
  id: ID!
  email: String!
  name: String!
  orders(first: Int): [Order!]!
}

enum OrderStatus { PENDING PAID CANCELLED }

Query (client asks for exactly what the screen needs):

query OrderDetail($id: ID!) {
  order(id: $id) {
    id
    status
    total { amountMinor currency }
    items { sku quantity unitPrice { amountMinor currency } }
    customer { name email }
  }
}

Response:

{
  "data": {
    "order": {
      "id": "ord_1",
      "status": "PAID",
      "total": { "amountMinor": 4599, "currency": "USD" },
      "items": [
        { "sku": "s_1", "quantity": 2,
          "unitPrice": { "amountMinor": 2299, "currency": "USD" } }
      ],
      "customer": { "name": "A. Patel", "email": "a@example.com" }
    }
  }
}

Compare to REST: this would be GET /orders/ord_1 followed by GET /users/u_9 - two round trips - or a custom endpoint with embedded customer. GraphQL composes both into one request.
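
On the wire, that single request is usually a JSON body with query, variables, and an optional operationName. The sketch below only builds and decodes such a body; build_request is a hypothetical helper, and no HTTP call is made.

```python
import json

QUERY = """
query OrderDetail($id: ID!) {
  order(id: $id) { id status customer { name email } }
}
"""


def build_request(query, variables, operation_name=None):
    """Return the JSON body a client would POST to the GraphQL endpoint."""
    body = {"query": query, "variables": variables}
    if operation_name:
        body["operationName"] = operation_name
    return json.dumps(body)


payload = build_request(QUERY, {"id": "ord_1"}, "OrderDetail")
decoded = json.loads(payload)
print(sorted(decoded))
```

Passing the id as a variable rather than splicing it into the query string keeps the query text constant, which matters later for persisted queries.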

Mutation (domain actions expressed as named mutations):

mutation CancelOrder($id: ID!) {
  cancelOrder(id: $id, reason: "customer_request") {
    id
    status
    cancelledAt
  }
}

Errors (GraphQL has its own error envelope; data can be partial, with failed fields set to null alongside an errors array):

{
  "data": { "order": null },
  "errors": [
    {
      "message": "Order not found",
      "path": ["order"],
      "extensions": { "code": "NOT_FOUND", "traceId": "0af7..." }
    }
  ]
}
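
Because the HTTP status alone does not tell a client whether a field failed, client code has to look inside the body. A minimal sketch, assuming the envelope shown above; inspect_response is a hypothetical helper, and extensions.code is a server convention rather than something every server provides.

```python
def inspect_response(payload):
    """Return (data, problems); problems maps a field path to its error code."""
    problems = {}
    for err in payload.get("errors", []):
        # Errors without a path apply to the request as a whole.
        path = ".".join(str(p) for p in err.get("path", [])) or "<request>"
        code = err.get("extensions", {}).get("code", "UNKNOWN")
        problems[path] = code
    return payload.get("data"), problems


response = {
    "data": {"order": None},
    "errors": [
        {
            "message": "Order not found",
            "path": ["order"],
            "extensions": {"code": "NOT_FOUND"},
        }
    ],
}
data, problems = inspect_response(response)
print(data, problems)
```

A caller can then decide per path whether to render partial data, show an error, or retry, instead of treating the whole response as success or failure.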

Common Confusion / Misconception

"One endpoint is simpler." Simpler to discover; harder to operate. You lose per-endpoint rate limits, per-endpoint metrics, and HTTP-level caching.

"No over-fetching." Only at the network boundary. If a resolver still loads full database rows to return three fields, you have moved the over-fetching one layer down, not removed it.

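One way to avoid pushing over-fetching down a layer is to let the resolver select only the columns the query asked for. This is a sketch under assumptions: the users table, the USER_COLUMNS mapping, and load_user are hypothetical, and a real resolver would obtain requested_fields from its GraphQL library's resolve info.

```python
import sqlite3

# Hypothetical mapping from GraphQL field names to column names.
USER_COLUMNS = {"id": "id", "name": "name", "email": "email"}


def load_user(conn, user_id, requested_fields):
    """Fetch only the requested columns (plus id) for one user."""
    # Only allowlisted column names reach the SQL text; unknown fields are ignored.
    columns = [USER_COLUMNS[f] for f in requested_fields if f in USER_COLUMNS]
    if "id" not in columns:
        columns.insert(0, "id")
    sql = f"SELECT {', '.join(columns)} FROM users WHERE id = ?"
    row = conn.execute(sql, (user_id,)).fetchone()
    return dict(zip(columns, row)) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, email TEXT, bio TEXT)")
conn.execute("INSERT INTO users VALUES ('u_9', 'A. Patel', 'a@example.com', 'long text')")
user = load_user(conn, "u_9", ["name", "email"])
print(user)
```

The bio column never leaves the database because no query field maps to it.
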
The N+1 problem

GraphQL's most famous operational trap. A query like:

{ orders(first: 50) { id customer { name } } }

naively runs 1 query for the 50 orders, then 50 separate queries for their customers: 1 + 50 = 51. With nested fields it compounds: orders { customer { organization { ... } } } becomes 1 + 50 + 50, and each further nested level can add another 50.
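
The count is easy to reproduce with a fake database that only counts calls. FakeDB and resolve_orders_naive are hypothetical names used for this sketch; each method call stands in for one SQL query.

```python
class FakeDB:
    """Counts queries instead of talking to a real database."""

    def __init__(self):
        self.queries = 0
        self.users = {f"u_{i}": {"id": f"u_{i}", "name": f"User {i}"} for i in range(50)}
        self.orders = [{"id": f"ord_{i}", "customer_id": f"u_{i}"} for i in range(50)]

    def list_orders(self, first):
        self.queries += 1  # one query for the list
        return self.orders[:first]

    def get_user(self, user_id):
        self.queries += 1  # one query per customer lookup
        return self.users[user_id]


def resolve_orders_naive(db, first):
    """Resolve { orders(first) { id customer { name } } } one field at a time."""
    result = []
    for order in db.list_orders(first):
        customer = db.get_user(order["customer_id"])
        result.append({"id": order["id"], "customer": {"name": customer["name"]}})
    return result


db = FakeDB()
rows = resolve_orders_naive(db, 50)
print(db.queries)  # 51
```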

Mitigations:

DataLoader: batch-and-cache loader that collects all customer(id) lookups made within one tick of the event loop, issues a single query for them (WHERE id IN (...)), and caches the results for the rest of the request.
Join-aware resolvers: the root resolver fetches orders with customers in one SQL query; child resolvers read from the already-loaded field.
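Query cost analysis: reject queries whose static cost exceeds a threshold. This caps the damage but does not remove the N+1 pattern itself.

Do not ship GraphQL to production without at least one of the first two; the naive one-resolver-per-field pattern is pathological under load.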

Query cost / depth limiting

Clients can write deeply nested queries that explode server cost:

{ orders { customer { orders { customer { orders { ... } } } } } }

Production servers typically enforce a maximum depth (for example maxDepth: 10), a maximum complexity (a weighted sum over fields), and a maximum list size (a cap on first).
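
A minimal sketch of static depth and cost checks follows. It assumes the query has already been parsed into a nested dictionary of selected fields; the limits, the LIST_FIELDS multipliers, and the cost formula are illustrative assumptions, not a standard. A real server would take the multiplier from the capped first argument.

```python
MAX_DEPTH = 10
MAX_COST = 1000

# Hypothetical assumed list sizes: a list field multiplies its children's cost.
LIST_FIELDS = {"orders": 50, "items": 20}


def depth(selection):
    """Number of nested selection levels."""
    if not selection:
        return 0
    return 1 + max(depth(child) for child in selection.values())


def cost(selection):
    """Each field costs 1, plus its children's cost times its list multiplier."""
    total = 0
    for field, child in selection.items():
        total += 1 + LIST_FIELDS.get(field, 1) * cost(child)
    return total


def check(selection):
    d, c = depth(selection), cost(selection)
    if d > MAX_DEPTH:
        return f"rejected: depth {d} > {MAX_DEPTH}"
    if c > MAX_COST:
        return f"rejected: cost {c} > {MAX_COST}"
    return "ok"


small = {"orders": {"id": {}, "customer": {"name": {}}}}
nested = {"orders": {"customer": {"orders": {"customer": {"orders": {"id": {}}}}}}}

print(check(small))   # ok
print(check(nested))  # rejected: cost 130101 > 1000
```

The nested query is only six levels deep, so a depth limit alone would let it through; the cost limit is what catches it.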

Persisted queries

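Operating hint: instead of accepting arbitrary queries from untrusted clients, accept only queries identified by hash from a pre-registered list (Apollo "persisted queries"). Note that Apollo's automatic persisted queries (APQ) register hashes on first use to save bandwidth, so on their own they are not an allowlist. An allowlist turns GraphQL's runtime flexibility off for public clients in exchange for safety; your own client developers keep that flexibility at build time.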
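
An allowlist check can be sketched as a hash lookup. The request shape here ({"id": ..., "variables": ...}) and the names query_hash, ALLOWLIST, and resolve_operation are assumptions made for this illustration; real clients and servers define their own wire format.

```python
import hashlib


def query_hash(query: str) -> str:
    """SHA-256 of the exact query text, as a hex string."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


# Queries registered at build time (hypothetical example).
REGISTERED = ["query OrderDetail($id: ID!) { order(id: $id) { id status } }"]
ALLOWLIST = {query_hash(q): q for q in REGISTERED}


def resolve_operation(request: dict) -> str:
    """Return the query text for a known hash; refuse raw or unknown queries."""
    if "query" in request:
        raise PermissionError("ad hoc queries are disabled for this client")
    try:
        return ALLOWLIST[request["id"]]
    except KeyError:
        raise LookupError("unknown persisted query") from None


known = {"id": query_hash(REGISTERED[0]), "variables": {"id": "ord_1"}}
print(resolve_operation(known) == REGISTERED[0])  # True
```

Variables still travel with each request, so one registered query serves every order id.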

Common Confusion / Misconception (more)

"GraphQL replaces REST." GraphQL and REST coexist in most real systems. A common shape is: GraphQL for composition-heavy client reads; REST/gRPC for writes, bulk operations, and internal service-to-service.

"Schema-first means we never break clients." Schema-first means you can enforce compatibility via schema diffing, but a breaking change is still breaking. Tools like graphql-inspector can catch them in CI.

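The schema-diffing idea can be sketched as a comparison of two field maps. This is a deliberately conservative toy, not how graphql-inspector works: breaking_changes is a hypothetical function, schemas are plain dictionaries of type name to field name to type string, and every type change is flagged even though some (such as a nullable output field becoming non-null) are safe for clients.

```python
def breaking_changes(old, new):
    """Compare {type: {field: type_string}} maps and list removals and changes."""
    problems = []
    for type_name, old_fields in old.items():
        new_fields = new.get(type_name)
        if new_fields is None:
            problems.append(f"type removed: {type_name}")
            continue
        for field, old_type in old_fields.items():
            if field not in new_fields:
                problems.append(f"field removed: {type_name}.{field}")
            elif new_fields[field] != old_type:
                problems.append(
                    f"type changed: {type_name}.{field} {old_type} -> {new_fields[field]}"
                )
    return problems


old_schema = {"Order": {"id": "ID!", "status": "OrderStatus!", "total": "Money!"}}
new_schema = {"Order": {"id": "ID!", "status": "String!", "createdAt": "DateTime!"}}

changes = breaking_changes(old_schema, new_schema)
for line in changes:
    print(line)
```

Added fields such as createdAt are not reported, since existing queries keep working when a field is added. A CI job could fail the build whenever the returned list is non-empty.
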
"Subscriptions are just WebSocket events." They are, but they are also a contract. Document the event payload shape and delivery guarantees exactly like a webhook.

How To Use It

Pick GraphQL when:

Client needs are heterogeneous and compose many resources.
You can afford DataLoader-style infrastructure on the server.
You can enforce cost/depth limits.
You have typed clients that benefit from the schema (Apollo Client with React, Apollo iOS, etc.).
You do not heavily depend on HTTP caching at the edge.

Skip GraphQL when:

The API is simple and REST would be a short spec.
You need easy rate limiting per operation type.
Your team has no prior GraphQL operational experience and no time to build it.

Check Yourself

Explain the N+1 problem with a concrete query and show how DataLoader solves it.
When would you ship GraphQL to external consumers? What would you require of your infrastructure first?
GraphQL responses typically return 200 OK even for errors. How does that affect client retry behavior compared to REST?

Mini Drill or Application

Take the API you designed in Clusters 2-3 and write:

A GraphQL schema covering the primary entities and their relationships.
Three realistic client queries (a detail view, a list with filters, a composed view pulling two resources).
Three mutations matching your domain actions.
A one-paragraph note on how you would handle N+1 for the queries above.
A one-paragraph note on rate limiting: how would you cost-limit a public version of this API?

Read This Only If Stuck

Geewax: Expressive and simple (applies to GraphQL too)
Geewax: Partial responses and field masks
GraphQL Introduction - canonical external reference
GraphQL Best Practices - short, authoritative
Principled GraphQL (Apollo) - operational guidance