Choose a Storage Approach

What This Concept Is

Choosing a storage approach is not "SQL vs NoSQL". It is a four-question conversation you hold with yourself on the board:

What is the access pattern? Point lookups by primary key? Range scans? Secondary-index queries? Joins? Full-text search? Time-series append-and-scan?
What is the consistency contract? Must a read immediately see a prior write? Tolerate staleness? Tolerate conflict resolution?
What is the size and growth? Does the hot set fit in RAM? Does the total fit on one machine? Is growth predictable?
What is the durability and failure target? Synchronous cross-AZ replication? Cross-region? Point-in-time recovery?

The storage choice (SQL, document, key-value, wide-column, search, time-series, object store, graph, or a hybrid of these) falls out of the four answers. The choice is never "because I like it".

A senior mental model: polyglot persistence (Martin Fowler's term). One system uses multiple storage technologies because one technology cannot natively serve all the access patterns at the required cost and latency. The pushback against "one DB for everything" is not dogma -- it is what happens when you honestly enumerate access patterns and discover they have conflicting native shapes (OLTP vs OLAP, transactional vs full-text, hot-KV vs analytics). The cost of polyglot is operational (more tech to run, more failure modes); the cost of monolithic storage is architectural (forcing workloads onto stores that fight them). Your job is to say which cost you are taking and why.

Why It Matters Here

Storage is usually the hardest-to-change decision in a design. Code is ephemeral, deployments roll, services get rewritten. Data is forever.

Pick SQL when you should have picked a wide-column store, and you will pay with manual sharding, hot-partition incidents, and complex migrations.
Pick a document store when you should have picked SQL, and you will reinvent joins, constraints, and schema evolution badly.
Pick one store when you should have picked two, and you will force one component's needs on another's workload.

Cluster 3 (data model) and Cluster 4 (scale) both depend on this being right.

For our social feed system:

Workload	Access pattern	Best fit	Why
User accounts, profiles	Point lookup by user ID; joins with friends/followees; small per-row; strong consistency on account changes	SQL (Postgres/MySQL) with sharding by user ID	Relational structure + transactions for signup/auth is worth more than raw scale
Posts	Append-only writes; read by post ID, or by author ID over a time range	Wide-column (Cassandra) or NoSQL KV sharded by author ID, clustering by `created_at desc`	High write throughput; range scans per author are the native access pattern
Timeline per user	Billions of users × recent items; reads dominated by "give me last 100 items for user X"; write amplification on fan-out	Wide-column sharded by user ID, clustering by `created_at desc`; hot tail cached in Redis	Keyed read of the tail is the only op; SQL joins would be catastrophic
Media blobs	Infrequent write, heavy read, static bytes	Object store (S3) behind a CDN	Purpose-built; SQL or KV would be a waste of money and latency
Search (by text)	Inverted-index queries, scoring, filters	Search engine (Elasticsearch/OpenSearch) indexed from the primary store	No OLTP DB does this well at scale
Metrics / analytics events	Very high-write, append-only, aggregate-only reads	Kafka -> columnar warehouse (ClickHouse, BigQuery, Snowflake)	OLAP shape; OLTP stores collapse under this

This is a polyglot persistence design. Each store is chosen for its native workload; the primary is SQL or the wide-column store, and other stores are updated via event streams.

For a URL shortener, by contrast, a single wide-column or KV store is usually enough, because the access pattern is a single point lookup and writes are modest.

Concrete Example 2: E-Commerce Checkout

Four workloads, four stores:

Workload	Native shape	Chosen store	Rejected alternative
Product catalog	Read-heavy, point lookup by SKU, some faceted search	Document store (MongoDB) with SKU as primary key + search engine for faceted search	Rejected SQL-only because faceted search on LIKE clauses performs terribly
Inventory	Strict consistency required -- must not oversell; high contention on popular SKUs at sale time	SQL (Postgres) with row-level locking, sharded by SKU	Rejected Redis counter because audit/recovery is harder and audit/regulatory requirement demands a transactional trail
Orders	Write-heavy during peak; read by user_id and by order_id; one year of history online	SQL (Postgres) primary, sharded by `customer_id`; one-year retention hot, archived to object store after	Rejected wide-column because reporting and admin queries need ad-hoc SQL
Session / cart	High read-write, ephemeral, TTL-expiring	Redis (single source for cart during session) + async write-through to durable store on checkout	Rejected SQL because TTL-based eviction and sub-ms latency don't match SQL workload

The dominant architectural point: different workloads, different stores, orchestrated by events. The primary store for money-touching operations stays transactional SQL; everything else is chosen for its native access pattern.

Common Confusion / Misconceptions

"NoSQL scales; SQL does not." False. Modern Postgres and MySQL handle millions of writes/sec with sharding and proper schema. "NoSQL scales" shortcut is usually "I do not want to think about schema". Both scale; they scale differently.

"We can use one database for everything." You can; you usually should not. Forcing analytics on an OLTP store will either cripple the OLTP store or cripple the analytics. Forcing full-text search on SQL's LIKE queries is a known incident factory.

"Strongly consistent is always better." No. Strong consistency usually means synchronous cross-AZ writes, which raises P99 and reduces availability. Choose strong consistency where the business requires it (money, identity, auth) and eventual consistency where the business can tolerate it (counters, feeds, search).

"Partitioning key is a detail." Partitioning key is the storage decision. See Cluster 3 concept 8. Picking the wrong partitioning key will rebuild itself as "our hot shard is melting under celebrity load" in Cluster 4.

"Polyglot persistence is over-engineering." Sometimes. For a small system, a single Postgres really is the right answer. Polyglot justifies itself when one workload's requirements are structurally incompatible with another's (full-text + OLTP, high-write time-series + ad-hoc SQL). Name the incompatibility before adding a store.

How To Use It

A seven-line storage sketch per workload:

Workload name (e.g., "Timeline").
Access pattern in one sentence.
Consistency requirement in one phrase ("read-your-write per user", "bounded staleness", "strict serializable").
Rough size today and in 1-2 years.
Chosen store family (SQL / wide-column / KV / document / search / object / time-series / graph).
Partitioning key (if distributed).
One sentence naming the rejected alternative and the reason.

Do this once per workload. Three to five workloads is typical. A good design has at most three primary stores; extras are specialized read models fed by events.

Transfer / Where This Shows Up Later

Cluster 3 concept 8 (data model) refines the partitioning-key decision per store.
Cluster 3 concept 9 (consistency) makes the consistency contract explicit per mutable table.
Cluster 4 concept 10 (scale) audits whether the chosen store survives 10× and 100×.
S8M3 (data patterns) is an entire module on the pros/cons of each store family, with deeper coverage of polyglot architectures.
S9 (cloud) maps store families to managed services (RDS/Aurora for SQL, DynamoDB/Cassandra for KV/wide-column, S3 for blobs, OpenSearch/Elasticsearch for search).
S10 capstone/interviews: the single most-asked follow-up is "why this database?" -- a good seven-line sketch per workload answers it before the question.

Check Yourself

Name one workload where Postgres beats DynamoDB and one where DynamoDB beats Postgres. Why, in one sentence each?
Why does a wide-column store usually require you to know the query shape before the schema, while SQL lets you query flexibly post-hoc?
What would go wrong if a social feed's timeline was stored in a single SQL table with WHERE user_id = ? as the read query, at 1M reads/sec?
Defend a polyglot choice using two workload incompatibilities.

Mini Drill or Application

For each prompt, produce a seven-line storage sketch per workload, in under ten minutes total:

Chat system: user accounts, message store, unread-count per conversation, message search.
Ride-share: users, live driver location, trip history, pricing audit log.
E-commerce: catalog, inventory, orders, reviews, recommendations.

Then explicitly name one workload that would tempt you into the wrong choice if you hurried, and why.

Read This Only If Stuck

System Design Primer: SQL or NoSQL -- short decision table.
System Design Primer: Database RDBMS replication -- required reading for the "why replicas" conversation.
System Design Primer: Database federation and sharding -- essential before 100× sizing conversations.
System Design Primer: NoSQL databases -- KV, document, wide-column, graph in one place.
System Design Primer: Consistency patterns -- strict, eventual, weak.
System Design Primer: CAP theorem -- the partition-behaviour framing most store choices react to.
Fundamentals: Database partitioning -- logical vs physical partitioning.
Martin Fowler -- Polyglot Persistence -- the canonical argument for multi-store architectures.
AWS -- NoSQL design for DynamoDB -- how access-pattern-first modeling works on a real managed store.
Jepsen -- Consistency -- definitive reference for what each consistency level actually guarantees.

What This Concept Is​

Why It Matters Here​

Concrete Example: Social Feed (Polyglot Persistence)​

Concrete Example 2: E-Commerce Checkout​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​