Module 4: API Design & Contract Evolution: Case Studies
These case studies treat APIs as architecture. Every endpoint, event, schema field, error shape, pagination token, and deprecation notice is a contract somebody else's code may depend on.
How To Use These Case Studies
- Identify the consumer and the contract they rely on.
- Separate implementation change from contract change.
- Name the compatibility rule.
- Produce the required artifact.
- Decide how the contract will be tested and deprecated.
Case Study 1: Idempotent Payment Creation
Scenario: A mobile app calls POST /payments and times out. The user taps again. Without an idempotency contract, the server may create two payments.
Source anchor: Stripe's API docs describe idempotency keys as unique client-generated keys used to recognize retries and return the same result for repeated requests. See Stripe idempotent requests.
Module concepts:
- idempotency
- POST side effects
- retry safety
- request identity
- timeout ambiguity
Wrong Approach
"POST is not idempotent, so clients should not retry."
Networks fail. Clients will retry. The API should make safe retry possible for operations where duplicate side effects are unacceptable.
Better Approach
Define an idempotency contract:
POST /payments
Idempotency-Key: 8f6d...
Server rule:
same key + same request body -> return original result
same key + different request body -> reject
key retention -> documented window
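The server rule above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Stripe's implementation: `PaymentService`, `IdempotencyConflict`, the 24-hour `RETENTION_SECONDS`, and the response shape are all hypothetical names and values chosen for the example, and a real service would need a shared store and protection against concurrent requests with the same key.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass
class StoredResult:
    body_hash: str
    response: dict
    stored_at: float


class IdempotencyConflict(Exception):
    """Raised when a key is reused with a different request body."""


class PaymentService:
    # Assumed retention window; the contract only requires that it is documented.
    RETENTION_SECONDS = 24 * 60 * 60

    def __init__(self, clock=time.time):
        self._results = {}          # idempotency key -> StoredResult
        self._clock = clock
        self.payments_created = 0   # counts real side effects

    @staticmethod
    def _hash_body(body: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def create_payment(self, idempotency_key: str, body: dict) -> dict:
        now = self._clock()
        body_hash = self._hash_body(body)

        stored = self._results.get(idempotency_key)
        if stored is not None and now - stored.stored_at > self.RETENTION_SECONDS:
            # Key is outside the retention window: treat it as new.
            del self._results[idempotency_key]
            stored = None

        if stored is not None:
            if stored.body_hash != body_hash:
                # same key + different request body -> reject
                raise IdempotencyConflict(idempotency_key)
            # same key + same request body -> replay original result
            return stored.response

        # First time this key is seen: perform the side effect once.
        self.payments_created += 1
        response = {
            "payment_id": f"pay_{self.payments_created}",
            "amount": body["amount"],
        }
        self._results[idempotency_key] = StoredResult(body_hash, response, now)
        return response
```

An HTTP layer would translate `IdempotencyConflict` into an error response; which status code it uses is part of the contract to document.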
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| no retry | avoids duplicates | poor reliability |
| client retry without key | hides transient faults | duplicate side effects |
| idempotency key | safe retry | key store and body-hash logic |
| natural resource ID via PUT | idempotent by URI | not always a good domain fit |
Failure Mode
Timeout ambiguity turns into duplicate payments, orders, emails, or reservations.
Required Artifact
Write an idempotency contract: key scope, retention, body comparison, response replay, and conflict response.
Project / Capstone Connection
Every capstone API with side effects should include retry-safe idempotency rules.
Case Study 2: Cursor Pagination Instead Of Page Numbers
Scenario: GET /orders?page=3&size=50 works in staging. In production, new orders arrive while a client paginates, causing duplicate or skipped rows.
Source anchor: Microsoft REST API guidance discusses paging large collections and API design consistency. See Microsoft REST API best practices.
Module concepts:
- pagination
- stable ordering
- cursor
- consistency window
- collection contract
Wrong Approach
"Page number plus limit is enough."
Offset pagination is easy but unstable when the collection changes during traversal.
Better Approach
Use cursor pagination with a stable sort:
GET /orders?limit=50&after=eyJjcmVhdGVkX2F0Ijoi...
Contract:
order by created_at desc, id desc
cursor encodes last tuple
new writes sort ahead of the first page, so a traversal already in progress does not see them
client follows next_cursor until null
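The contract above can be illustrated with a small in-memory sketch. It assumes `created_at` values are ISO-8601 strings in a single timezone (so string comparison matches time order) and that the cursor is base64url-encoded JSON; `encode_cursor`, `decode_cursor`, and `list_orders` are hypothetical helper names, and a real service would push the tuple comparison into the database query rather than filter a list.

```python
import base64
import json


def encode_cursor(created_at: str, order_id: int) -> str:
    """Encode the last (created_at, id) tuple of a page as an opaque token."""
    raw = json.dumps(
        {"created_at": created_at, "id": order_id}, separators=(",", ":")
    ).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token: str) -> tuple:
    """Decode a cursor, raising ValueError for anything malformed."""
    try:
        data = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
        return (data["created_at"], data["id"])
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("invalid cursor") from exc


def list_orders(orders: list, limit: int, after: str = None) -> dict:
    # Stable sort: created_at desc, id desc (id breaks timestamp ties).
    ordered = sorted(
        orders, key=lambda o: (o["created_at"], o["id"]), reverse=True
    )
    if after is not None:
        last = decode_cursor(after)
        # Keep only rows strictly after the cursor tuple in sort order.
        ordered = [o for o in ordered if (o["created_at"], o["id"]) < last]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (
        encode_cursor(page[-1]["created_at"], page[-1]["id"]) if has_more else None
    )
    return {"data": page, "next_cursor": next_cursor}
```

Because the cursor names a position in the sort order rather than a row offset, an order inserted during traversal cannot shift later pages. The `ValueError` from `decode_cursor` is where the invalid-cursor response would be produced.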
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| offset/page | simple UX | skips/duplicates under writes |
| cursor | stable traversal | opaque token and sort discipline |
| snapshot token | strongest consistency | server-side state or timestamp semantics |
| search-after | good for large datasets | cannot jump to arbitrary page |
Failure Mode
Clients build reconciliation logic because the API cannot provide stable traversal.
Required Artifact
Write a pagination contract with sort keys, cursor format, mutation behavior, and invalid-cursor response.
Project / Capstone Connection
Capstone list APIs should avoid offset pagination for high-change collections.
Case Study 3: GraphQL Solves Overfetching And Reintroduces N+1
Scenario: A GraphQL endpoint lets clients request exactly the fields they need. A query asks for 50 issues, each issue's author, labels, and latest comment. Resolvers independently query the database, producing N+1 behavior.
Source anchor: GraphQL DataLoader describes batching and caching per request and notes that GraphQL field resolvers can easily create inefficient loading without a batching mechanism. See the GraphQL DataLoader repository.
Module concepts:
- GraphQL
- resolver boundary
- N+1
- batching
- contract vs implementation
Wrong Approach
"GraphQL is more efficient because clients select fields."
Field selection reduces overfetching, not necessarily backend work.
Better Approach
Design resolver loading:
Request:
issues -> { author, labels, latestComment }
Loader plan:
batch authors by user_id
batch labels by issue_id
batch latest comments by issue_id
cache within request
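The difference between naive resolvers and the loader plan above shows up directly in a query count. The sketch below is a simplified, synchronous illustration of the batching idea in Python; it is not the DataLoader library's API, and `BatchLoader`, `FakeDb`, and the two resolver functions are hypothetical names. Only the author relationship is modeled to keep the example short.

```python
class BatchLoader:
    """Per-request loader: one batch call for all keys not yet cached."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn   # takes a list of keys, returns {key: value}
        self._cache = {}            # lives only as long as this loader

    def load_many(self, keys):
        # Deduplicate while preserving order, then skip cached keys.
        missing = [k for k in dict.fromkeys(keys) if k not in self._cache]
        if missing:
            self._cache.update(self._batch_fn(missing))
        return [self._cache.get(k) for k in keys]


class FakeDb:
    """Stand-in database that counts every query it receives."""

    def __init__(self):
        self.query_count = 0
        self.users = {1: "amina", 2: "li"}
        self.issues = [{"id": i, "author_id": 1 + i % 2} for i in range(50)]

    def fetch_issues(self):
        self.query_count += 1
        return list(self.issues)

    def fetch_user(self, user_id):
        self.query_count += 1
        return self.users[user_id]

    def fetch_users(self, user_ids):
        self.query_count += 1
        return {uid: self.users[uid] for uid in user_ids}


def resolve_naive(db):
    # One query for the issues plus one query per issue for its author.
    return [(i["id"], db.fetch_user(i["author_id"])) for i in db.fetch_issues()]


def resolve_batched(db):
    issues = db.fetch_issues()
    authors = BatchLoader(db.fetch_users)  # new loader for each request
    names = authors.load_many([i["author_id"] for i in issues])
    return [(i["id"], name) for i, name in zip(issues, names)]
```

Comparing `FakeDb.query_count` after each resolver is the shape of the query-count test the required artifact asks for: both return the same data, but the naive version issues one query per issue.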
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| REST fixed shape | predictable backend query | possible overfetching |
| GraphQL naive resolvers | client flexibility | N+1 and cost unpredictability |
| DataLoader batching | efficient field resolution | per-request cache design |
| persisted queries/cost limits | operational control | governance overhead |
Failure Mode
The API contract looks elegant while the database receives hundreds of hidden queries.
Required Artifact
Write a resolver query plan and a query-count test for one nested GraphQL request.
Project / Capstone Connection
If a capstone uses GraphQL, include resolver batching and cost controls.
Case Study 4: Breaking Change Hidden As Cleanup
Scenario: An API response includes customer_name. The server team renames it to display_name because the domain language improved. Mobile clients crash after deployment.
Source anchor: Google API Improvement Proposals document resource-oriented API design and compatibility practices for evolving APIs. See Google AIP-180 backward compatibility.
Module concepts:
- backward compatibility
- additive change
- field removal
- deprecation
- consumer contract
Wrong Approach
"The new name is better, so replace the old field."
Better implementation language does not erase existing consumer contracts.
Better Approach
Evolve additively:
{
"customer_name": "Amina Khan",
"display_name": "Amina Khan"
}
Deprecate with evidence:
announce
measure usage
provide migration guide
wait through support window
remove only after consumers are gone
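The additive response and the removal gate can be sketched together. This is an illustrative Python example under stated assumptions: `serialize_customer`, the two client parsers, and `removal_allowed` are hypothetical names, and the count of remaining consumers is assumed to come from whatever usage telemetry the team already collects.

```python
from datetime import date


def serialize_customer(name: str) -> dict:
    # Transition period: emit both the old and the new field.
    return {"customer_name": name, "display_name": name}


def old_client_parse(payload: dict) -> str:
    # Stand-in for a consumer that has not migrated yet.
    return payload["customer_name"]


def new_client_parse(payload: dict) -> str:
    # Stand-in for a consumer that has migrated.
    return payload["display_name"]


def removal_allowed(today: date, window_ends: date, remaining_consumers: int) -> bool:
    """Removal gate: support window has elapsed AND no known consumer remains."""
    return today >= window_ends and remaining_consumers == 0
```

A contract test that runs both parsers against the serializer output would fail the build if either field disappeared before the gate allows it.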
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| rename in place | clean schema | breaks consumers |
| add new field | backward compatible | duplicate fields during transition |
| versioned endpoint | clean break | longer support burden |
| consumer-driven contract tests | catches breakage | needs consumer participation |
Failure Mode
An internal refactor becomes a public incident.
Required Artifact
Write a deprecation plan with announcement, telemetry, migration examples, support window, and removal gate.
Project / Capstone Connection
Capstone APIs should include compatibility rules for response fields.
Case Study 5: REST, gRPC, Or Events For A Fulfillment Integration
Scenario: Checkout must tell fulfillment about paid orders. One team wants REST, one wants gRPC, and one wants events. Each is reasonable under different forces.
Source anchor: Google Cloud's API design and gRPC documentation describe resource-oriented APIs and RPC contracts; AsyncAPI documents event-driven contracts for asynchronous systems. See Google API design guide, gRPC concepts, and AsyncAPI specification.
Module concepts:
- REST
- gRPC
- event contract
- synchronous vs asynchronous integration
- consumer ownership
Wrong Approach
"Pick the integration style the team likes."
The style should match latency, ownership, reliability, and coupling.
Better Approach
Compare operation semantics:
Need immediate answer from fulfillment?
use synchronous API
Need notify multiple subscribers after checkout?
publish event after transaction commits
Need low-latency internal RPC with strong schema?
gRPC may fit
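The three questions above can be written down as a small checklist function, which is sometimes a useful starting point for an ADR. This is only a sketch: `Workflow` and `candidate_styles` are hypothetical names, and the function lists candidates without ranking them, because the ranking depends on forces (ownership, reliability, coupling) that the ADR itself has to weigh.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    needs_immediate_answer: bool          # caller must know accept/reject now
    notifies_multiple_subscribers: bool   # several consumers react afterwards
    internal_low_latency_rpc: bool        # internal call, strong schema wanted


def candidate_styles(workflow: Workflow) -> list:
    """Return the integration styles the questions above point to."""
    candidates = []
    if workflow.needs_immediate_answer:
        candidates.append("synchronous API")
    if workflow.notifies_multiple_subscribers:
        candidates.append("event published after commit")
    if workflow.internal_low_latency_rpc:
        candidates.append("gRPC")
    return candidates
```

A workflow can match more than one question; for example, checkout might need an immediate acceptance from fulfillment and also notify other subscribers afterwards.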
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| REST | broad compatibility | weaker schema/streaming semantics |
| gRPC | strong schema and efficient RPC | client/tooling constraints |
| event | loose temporal coupling | eventual consistency and replay rules |
| webhook | external async callback | delivery and verification burden |
Failure Mode
Checkout waits synchronously for work that could be asynchronous, or publishes events for a workflow that actually needs immediate acceptance/rejection.
Required Artifact
Write an integration-style ADR comparing REST, gRPC, and events for one capstone workflow.
Project / Capstone Connection
Capstone integration choices should be justified by consumer needs and failure modes, not by trends.
Source Map
| Source | Use it for |
|---|---|
| Stripe idempotent requests | idempotency keys and safe retries |
| Microsoft REST API best practices | REST design, idempotency, collections |
| GraphQL DataLoader | batching and caching resolver loads |
| Google AIP-180 | backward compatibility and breaking changes |
| Google API design guide | resource-oriented API design |
| gRPC concepts | RPC contracts and communication model |
| AsyncAPI specification | event-driven API contracts |
Completion Standard
- At least three artifacts are completed.
- At least one API has an idempotency contract.
- At least one list endpoint has a pagination contract.
- At least one breaking-change plan includes telemetry and deprecation.
- At least one integration-style ADR compares REST, gRPC, and events.