Batch Operations, Streaming, and Bulk Patterns
What This Concept Is
Most APIs occasionally need to act on many items at once. Three patterns cover the common cases:
- Batch operations: one request, many items, one response with per-item status. Good for "add 50 users" or "delete all matching tickets."
- Streaming responses: one request, a response that is emitted as it is produced (HTTP chunked, gRPC streaming, Server-Sent Events). Good for "export 10M rows" or "tail live logs."
- Import / export: bulk transfer via a file in object storage. Good for "move millions of records in or out."
Each has a different atomicity contract, a different failure mode, and a different fit. Picking the wrong one is how APIs end up with POST /deleteAllUsers endpoints whose behavior on partial failure nobody can state with confidence.
Why It Matters Here
List endpoints scale to "I want to see the next 50." Batch-class patterns scale to "I want to process 50,000." They come with real contract decisions:
- if the first 10 of 50 succeed and the next 40 fail, is the batch partially committed or fully rolled back?
- if export takes an hour to produce 12GB, how do clients resume on network failure?
- if streaming is interrupted at row 700,000, can the client restart without duplicates?
Without an explicit pattern, each team invents a different answer, and consumers pay.
Concrete Example
Batch: multi-item with per-item results
Non-atomic batch (each item succeeds or fails independently; partial success is fine):
POST /tickets:batchCreate
Content-Type: application/json

{
  "tickets": [
    { "subject": "Printer jammed", "priority": "P3" },
    { "subject": "Email broken", "priority": "P1" },
    { "subject": "", "priority": "P2" }
  ]
}

HTTP/1.1 207 Multi-Status
Content-Type: application/json

{
  "results": [
    { "index": 0, "status": "CREATED", "id": "t_1" },
    { "index": 1, "status": "CREATED", "id": "t_2" },
    { "index": 2, "status": "FAILED",
      "error": { "code": "VALIDATION_FAILED",
                 "message": "subject is required" } }
  ]
}
Document up front: "207 Multi-Status indicates at least one item was processed; per-item status must be inspected."
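A minimal server-side sketch of the non-atomic contract above. It assumes a hypothetical create_ticket helper and in-memory ID assignment; neither comes from a real framework. The point is that every input index gets exactly one result, and one bad item never aborts the rest.

```python
from typing import Any


def create_ticket(ticket: dict[str, Any], next_id: int) -> str:
    """Hypothetical single-item create; raises ValueError on invalid input."""
    if not ticket.get("subject"):
        raise ValueError("subject is required")
    return f"t_{next_id}"


def batch_create(tickets: list[dict[str, Any]]) -> tuple[int, dict[str, Any]]:
    """Process each item independently and report one result per input index."""
    results = []
    created = 0
    for index, ticket in enumerate(tickets):
        try:
            ticket_id = create_ticket(ticket, created + 1)
        except ValueError as exc:
            # A failed item is recorded and skipped; later items still run.
            results.append({
                "index": index,
                "status": "FAILED",
                "error": {"code": "VALIDATION_FAILED", "message": str(exc)},
            })
            continue
        created += 1
        results.append({"index": index, "status": "CREATED", "id": ticket_id})
    return 207, {"results": results}
```

Echoing the input index in every result is what lets a consumer map outcomes back to the items it sent.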
Atomic batch (all or nothing):
POST /transfers:batchCreate

{
  "atomic": true,
  "transfers": [ {...}, {...}, {...} ]
}

HTTP/1.1 400 Bad Request

{ "code": "ATOMIC_BATCH_FAILED",
  "message": "item at index 2 failed validation; no transfers were created",
  "errors": [ { "index": 2, "code": "INSUFFICIENT_FUNDS" } ] }
Spec rule: make atomicity explicit (atomic: true or a separate endpoint). Consumers should never be guessing whether their money transfer is half-committed.
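One way to get all-or-nothing behavior is to apply the whole batch to a scratch copy and commit only if every item passes. The sketch below assumes in-memory balances and hypothetical from / to / amount fields (the example above elides the transfer shape). The 200 success body is also an assumption. A real service would more likely rely on a database transaction.

```python
from typing import Any


def atomic_batch_transfer(
    balances: dict[str, int], transfers: list[dict[str, Any]]
) -> tuple[int, dict[str, Any]]:
    """Apply all transfers or none, using a scratch copy of the balances."""
    scratch = dict(balances)  # failures must leave the real balances untouched
    for index, t in enumerate(transfers):
        if scratch.get(t["from"], 0) < t["amount"]:
            return 400, {
                "code": "ATOMIC_BATCH_FAILED",
                "message": f"item at index {index} failed validation; "
                           "no transfers were created",
                "errors": [{"index": index, "code": "INSUFFICIENT_FUNDS"}],
            }
        scratch[t["from"]] -= t["amount"]
        scratch[t["to"]] = scratch.get(t["to"], 0) + t["amount"]
    # Commit only after every item has passed.
    balances.clear()
    balances.update(scratch)
    return 200, {"created": len(transfers)}
```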
Bound the batch size. max_items: 100 is a reasonable default; reject larger batches with 413 Payload Too Large or a specific BATCH_TOO_LARGE code. Large batches should use the import/export pattern instead.
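The size cap is cheapest to enforce before any item is touched. A small sketch, assuming the limit of 100 suggested above; the function name and message wording are illustrative:

```python
from typing import Optional

MAX_ITEMS = 100  # assumed limit, matching the max_items default suggested above


def check_batch_size(items: list) -> Optional[tuple[int, dict]]:
    """Return an error response for an oversized batch, or None to proceed."""
    if len(items) > MAX_ITEMS:
        return 413, {
            "code": "BATCH_TOO_LARGE",
            "message": f"batch has {len(items)} items; max_items is {MAX_ITEMS}",
        }
    return None
```

Rejecting the whole request up front keeps the contract simple: an oversized batch never ends up partially processed.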
Streaming: large read responses
For exports that do not fit comfortably in one response, use HTTP chunked transfer with a line-delimited JSON body (NDJSON), or gRPC server-streaming:
GET /orders:stream?filter=created_after%20%222026-01-01%22
Accept: application/x-ndjson

HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked

{"id":"ord_1","total":1599,"created_at":"..."}
{"id":"ord_2","total":8999,"created_at":"..."}
{"id":"ord_3","total":2499,"created_at":"..."}
...
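NDJSON is recoverable: each line is a complete JSON value, so a client parsing line by line can checkpoint after every record and resume from the last one it fully received. The API must document how consumers resume after a disconnection (cursor in the URL, last ID in a header).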
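A client-side sketch of line-by-line parsing with a checkpoint. It assumes every record is newline-terminated, that records carry an id field as in the example above, and that a dropped connection surfaces as ConnectionError. The chunk source is any iterable of bytes, so no particular HTTP library is implied.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed object per complete line; hold back partial lines."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Bytes left in the buffer are an incomplete record and are dropped.


def consume(
    chunks: Iterable[bytes], checkpoint: Optional[str] = None
) -> tuple[list[dict], Optional[str]]:
    """Collect records; the checkpoint is the last fully received id."""
    records = []
    try:
        for record in iter_ndjson(chunks):
            records.append(record)
            checkpoint = record["id"]
    except ConnectionError:
        pass  # the caller re-requests using the checkpoint
    return records, checkpoint
```

Because the checkpoint only advances on complete lines, a record cut off mid-line is neither counted nor lost: the client asks for everything after the checkpoint and receives it again in full.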
gRPC version (contract-level):
service OrderService {
  rpc StreamOrders(StreamOrdersRequest) returns (stream Order);
}
Import / export: files in object storage
For really large data, don't stream - use files.
POST /orders:export

{
  "format": "ndjson.gz",
  "filter": "created_after \"2026-01-01\"",
  "destination": { "bucket": "client-acme", "key": "exports/2026-04.ndjson.gz" }
}

HTTP/1.1 202 Accepted
Location: /operations/op_export_17
This is an LRO (Cluster 3 concept 8) that, on completion, returns a file URL. Consumers download the file; the API never streams 12GB through itself.
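From the consumer's side, the export becomes a polling loop against the Location URL. The sketch below assumes a hypothetical operation shape with done, error, and response fields and a file_url in the response; the source only says the LRO returns a file URL on completion. The get_operation callable stands in for whatever HTTP client the consumer uses.

```python
import time
from typing import Callable


def wait_for_operation(
    get_operation: Callable[[str], dict],
    location: str,
    poll_seconds: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the operation until it reports done, then return its response."""
    for _ in range(max_polls):
        op = get_operation(location)
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(op["error"])
            return op["response"]  # assumed to carry the file URL
        sleep(poll_seconds)
    raise TimeoutError(f"{location} did not finish after {max_polls} polls")
```

The sleep function is injectable so the loop can be tested without waiting.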
POST /orders:import

{
  "format": "ndjson.gz",
  "source": { "bucket": "client-acme", "key": "imports/2026-04-in.ndjson.gz" },
  "id_collision": "fail"
}

id_collision accepts "fail", "overwrite", or "skip". It is a contract decision Geewax calls out: consumers need to know how the server treats duplicates.
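A sketch of how the three id_collision values could differ on the server, using an in-memory dict as the store. The function name, the returned counters, and the choice to raise KeyError on "fail" are assumptions made for illustration.

```python
def import_records(
    store: dict[str, dict], records: list[dict], id_collision: str = "fail"
) -> dict[str, int]:
    """Insert records into store, resolving ID clashes per the chosen policy."""
    if id_collision not in {"fail", "overwrite", "skip"}:
        raise ValueError(f"unknown id_collision policy: {id_collision}")
    counts = {"created": 0, "overwritten": 0, "skipped": 0}
    for record in records:
        record_id = record["id"]
        if record_id not in store:
            store[record_id] = record
            counts["created"] += 1
        elif id_collision == "overwrite":
            store[record_id] = record
            counts["overwritten"] += 1
        elif id_collision == "skip":
            counts["skipped"] += 1
        else:
            # "fail": stop at the first collision.
            raise KeyError(f"id collision on {record_id}")
    return counts
```

In this sketch "fail" stops at the first collision and keeps whatever was imported before it. Whether a failed import rolls back is a separate atomicity decision that the contract also has to state.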
Common Confusion / Misconception
"Batch is just a loop server-side." It is when it comes to implementation. It is not when it comes to contract: atomicity, per-item error handling, and size limits are all separate decisions from "add one ticket."
"We should make every endpoint a batch endpoint." No. Only endpoints with a clear bulk use case should be batch. Over-batching adds complexity for consumers who only want one item.
"Streaming is just for large payloads." Streaming is also for incremental results (log tailing, search results arriving as they are scored). The contract is "a sequence of items" whether there are 10 or 10M.
"Import/export is the same as batch." No. Batch carries N items in a single request body and returns per-item results in a single response. Import/export moves the data through a file in object storage and reports the outcome through an LRO. When an import is too big for a single request body, use file-based import instead of batch.
How To Use It
When designing a bulk-style endpoint:
- Pick the pattern: batch (single request, single response) / streaming (single request, streamed response) / import-export (file via storage + LRO).
- State atomicity: all-or-nothing, best-effort, or opt-in.
- Cap the size: document the max items or payload size and the error for exceeding it.
- Define per-item status: the response shape when some items fail and others succeed.
- Define resumability: for streams and exports, how does the consumer restart on failure?
- Define duplicate handling: what happens if the same item appears twice?
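The checklist can be kept as a small record per endpoint so that missing decisions are visible during review. This is only an illustrative sketch: the BulkContract class, its field names, and the rules about which decision applies to which pattern are assumptions, not part of any framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BulkContract:
    """One field per checklist decision; None means not decided yet."""
    pattern: str                           # "batch", "streaming", "import_export"
    atomicity: Optional[str] = None        # "all_or_nothing", "best_effort", "opt_in"
    max_items: Optional[int] = None
    too_large_error: Optional[str] = None
    per_item_status: Optional[str] = None  # description of the response shape
    resume: Optional[str] = None           # e.g. "cursor in URL"
    duplicates: Optional[str] = None       # e.g. "fail", "overwrite", "skip"

    def gaps(self) -> list[str]:
        """List the checklist decisions that still need an answer."""
        missing = []
        if self.atomicity is None:
            missing.append("atomicity")
        if self.max_items is None or self.too_large_error is None:
            missing.append("size cap")
        if self.pattern == "batch" and self.per_item_status is None:
            missing.append("per-item status")
        if self.pattern in ("streaming", "import_export") and self.resume is None:
            missing.append("resumability")
        if self.duplicates is None:
            missing.append("duplicate handling")
        return missing
```

For example, BulkContract(pattern="batch", atomicity="opt_in", max_items=100, too_large_error="BATCH_TOO_LARGE").gaps() reports that per-item status and duplicate handling are still undecided.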
Check Yourself
- Why should an atomic batch be opt-in rather than the default?
- A consumer needs to export 40GB of orders. Batch, streaming, or import/export? Why?
- A batch endpoint returned 207 Multi-Status with some failed items. The consumer wants to retry only the failures. What does your response shape need to include for that to be trivial?
Mini Drill or Application
Design three endpoints for a ticketing system:
- POST /tickets:batchCreate - batch with opt-in atomicity, max_items=100, per-item results.
- GET /tickets:stream - NDJSON stream with cursor-resume on disconnect.
- POST /tickets:export - LRO producing a file; include format, filter, destination bucket shape.
Write request and response examples for each. Note one failure mode per endpoint with its contract response.
Read This Only If Stuck
- Geewax: Batch operations - motivation and overview
- Geewax: Batch - operating across parents, atomicity
- Geewax: Batch Get, Batch Create
- Geewax: Batch Update and tradeoffs
- Geewax: Import and export - motivation
- Geewax: Converting between resources and bytes, consistency
- Geewax: Import/export - failures and retries