Data Model Design

What This Concept Is

Data model design for a system under scale is four decisions, made in this order:

Entities: what are the nouns of the domain? Each gets a primary key and a schema.
Relationships: which entities reference which? 1:1, 1:N, N:M? Enforced or denormalized?
Indexes: which fields are queried? Which combinations? What is the cardinality and selectivity of each?
Partitioning key: in a distributed store, which field determines which shard a row lives on? This is the single most load-sensitive decision in the design.

At this stage you write table sketches, not DDL. Columns with types, primary key, partitioning key, and the 2-3 indexes that matter. If the problem requires a relational model, draw the ER sketch; if it is wide-column, draw per-partition row layout; if it is document, draw one sample document with its access patterns annotated.

An access-pattern-first discipline, drawn from the AWS DynamoDB modeling guide: enumerate the queries before the schema. In SQL you can get away with normalizing first and adding indexes later, because the query planner rewrites your queries. In wide-column and KV stores, the schema is the index, and retrofitting a new access pattern often means introducing a second denormalized copy of the data kept in sync by events. Saying "we will figure out queries later" is fine for Postgres and lethal for DynamoDB or Cassandra.

Why It Matters Here

Data modeling mistakes compound. They cost rewrites, migrations, hot-partition incidents, and correctness bugs that data auditors find months later.

Wrong entity boundary -> application-level joins and consistency bugs.
Wrong index -> 100× slower reads or 10× slower writes.
Wrong partitioning key -> hot shards, melted nodes, rebalancing nightmares.
Wrong relationship denormalization -> divergent truth across tables.

Cluster 4 (scale) often comes down to "your partitioning key was wrong"; catching it here saves the session.

Social feed, Timeline store (wide-column, e.g., Cassandra):

Table: user_timeline
  partition_key: user_id           -- all of one user's feed lives on one shard
  clustering_key: created_at DESC  -- within the partition, sorted newest-first
  columns:
    post_id            uuid
    author_id          bigint
    post_kind          enum('text','image','retweet')
    summary            text         -- denormalized for list rendering
    author_handle      text         -- denormalized; avoids a join
    created_at         timestamp

Typical read:
  SELECT * FROM user_timeline
   WHERE user_id = :uid
     AND created_at < :cursor
   LIMIT 50;

Annotations:

Entities: User, Post, TimelineEntry. Users and Posts live in other tables; TimelineEntry is a derived entity maintained by fan-out.
Relationships: TimelineEntry references Post (by post_id) and User (by user_id as owner, author_id as denorm). There is intentional duplication across user partitions -- one post becomes N timeline entries.
Indexes: the clustering key created_at DESC is the index for the pagination query. No secondary index is needed for this table.
Partitioning key: user_id distributes read load evenly across users. An alternative, author_id, would have concentrated load on celebrity authors -- exactly the wrong shape.

Contrast: Posts store (same design):

Table: posts
  partition_key: author_id, bucket  -- (author_id, floor(created_at/day))
  clustering_key: created_at DESC
  columns:
    post_id       uuid
    body          text
    media_ref     text
    created_at    timestamp

Here, author_id plus a time bucket avoids unbounded partition growth for prolific authors -- a common hot-partition fix. Without the bucket, an active author's partition grows forever, which wide-column stores handle poorly.

For a URL shortener:

Table: shorts (wide-column or KV)
  partition_key: short_code           -- primary access is by code
  columns:
    long_url         text
    owner_user_id    bigint nullable
    created_at       timestamp
    expires_at       timestamp nullable

Index for "my short links" query:
Table: shorts_by_owner
  partition_key: owner_user_id
  clustering_key: created_at DESC
  columns: short_code

Notice how the secondary access pattern ("list my links") becomes a second table with its own partition key. This is standard in wide-column and NoSQL design: you design one table per access pattern.

Concrete Example 2: Chat Messages

Table: messages
  partition_key: (conversation_id, day_bucket)  -- one day of one chat
  clustering_key: (timestamp ASC, message_id)
  columns:
    sender_id      bigint
    body           text
    attachments    list<uri>
    client_msg_id  uuid     -- for idempotency
    status         enum

Why this shape:

Partitioning by conversation_id alone seems obvious and fails on long-running chats (group chat over years -> unbounded partition).
Adding day_bucket puts each day of a chat in its own partition. Typical reads ("last 50 messages in this chat") hit the current bucket; "load older" moves to the previous bucket. Partition size is bounded by messages_per_day × day_length.
Clustering ASC matches forward-scroll reads; chat UIs typically read oldest-of-the-page forward, then cache client-side.
client_msg_id supports idempotency. When a client retries a send after a network blip, the server deduplicates by this key -- a common exactly-once approximation.

Access patterns this table serves:

Pattern	Read path
Last N messages	`WHERE conversation_id=? AND day_bucket=today ORDER BY timestamp DESC LIMIT 50`
Scroll older	Next bucket back; paginate
Message delivery receipt	Update status column (partition-local, cheap)

Access patterns this table does NOT serve cheaply:

Pattern	Why not
"Search messages by text in a conversation"	Needs a separate search index (OpenSearch, driven by an async consumer)
"All messages sent by user X across all conversations"	Second table `messages_by_sender` partitioned by `sender_id`

The last row is the important one -- NoSQL data modeling is a table-per-access-pattern discipline, and being explicit about which patterns each table does not serve prevents later surprise.

Common Confusion / Misconceptions

"Partitioning key is whatever the primary key is." No. The primary key uniquely identifies a row; the partition key controls where the row lives. In some stores they are the same; in others they differ (composite keys). Confusing the two creates either hot shards or impossible queries.

"Denormalization is bad." It depends on the store. In SQL, normalize first and denormalize only when you must. In wide-column and document stores, denormalization is the native model; joins at read time are the anti-pattern.

"One table covers all access patterns." In SQL with appropriate indexes, often yes. In NoSQL, almost never. Design one (materialized) view per access pattern and keep them updated via events or triggers.

"Indexes are free." Every secondary index is a write-path tax and a storage cost. Three well-chosen indexes beat twelve speculative ones.

"Buckets are a micro-optimization." No. Unbounded partitions are a foot-gun unique to wide-column and KV stores; a time-bucket is a standard cure and you should reach for it reflexively on append-only tables.

How To Use It

Per primary entity:

State the entity in one sentence.
Write the column sketch: field names, types, nullability.
Pick the primary key. Circle it.
Pick the partitioning key. Circle it separately.
List the 2-3 indexes (secondary tables in NoSQL; real indexes in SQL).
Write the 2-3 queries this entity serves, using those indexes.
Ask: is any partition unbounded? If so, add a bucket.

Repeat per entity. Three to five entities is typical.

Transfer / Where This Shows Up Later

Cluster 3 concept 9 (consistency) attaches a concurrency/consistency contract to each of these mutable tables.
Cluster 4 concept 10 (scale) audits the partitioning key at 10× and 100×.
Cluster 4 concept 12 (SPOFs) often finds that an unbounded partition is a soft SPOF.
S8M3 (data patterns) is the deep module on this -- it covers multi-model persistence, CDC, event sourcing.
S9 (cloud) maps partitioning keys to DynamoDB partition keys + sort keys; Aurora/Postgres logical sharding; Spanner interleaved tables.
S10 capstone/interviews: partitioning-key questions are the single most common follow-up after the box diagram; a wrong answer here unwinds the rest of the design.

Check Yourself

In the timeline table above, what goes wrong if we partition by created_at instead of user_id?
Why is (author_id, day_bucket) a safer partitioning key than author_id alone for an append-only post table?
What is the cost of adding a fifth secondary index on a write-heavy table?
Name one access pattern your chat messages table above does not serve and the second table you'd add for it.

Mini Drill or Application

For each workload, produce column sketches with primary key, partitioning key, and 2-3 indexes:

Chat messages per conversation (1 M conversations, 100 msgs/day each).
Orders per merchant (10 K merchants, up to 10 K orders/day each).
IoT sensor readings (1 M devices, 1 reading/sec each).
Per-user audit log (1 B users, a few events/day each).

For each, name one partitioning key that looks tempting and explain why it is wrong.

Read This Only If Stuck

System Design Primer: Database federation and sharding -- primary reference for partition-key choice.
System Design Primer: NoSQL databases -- KV, document, wide-column, graph native access patterns.
System Design Primer: SQL or NoSQL -- decision table.
System Design Primer: Database RDBMS replication -- replication implications of the primary-key choice.
Fundamentals: Database partitioning -- logical vs physical partitioning.
Fundamentals: Connascence -- denormalization is a kind of data-level connascence; track it explicitly.
AWS -- NoSQL design for DynamoDB -- definitive access-pattern-first modeling guide.
Martin Fowler -- Catalog of Patterns of Distributed Systems -- Fixed Partitions, Key-Range Partitions, Segmented Log all apply here.
High Scalability -- case studies where the wrong partition key caused an outage.

What This Concept Is​

Why It Matters Here​

Concrete Example: Social Feed Timeline​

Concrete Example 2: Chat Messages​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​