Module 1: Relational Databases & SQL

Primary text: Database System Concepts (Silberschatz, Korth, Sudarshan)
Selective support: Designing Data-Intensive Applications (Kleppmann) for modern architectural framing; Database Internals (Petrov) for occasional implementation peeks; PostgreSQL and SQLite official docs for exact syntax and semantics

This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to become operationally strong at the relational model, the SQL pipeline, schema design, constraints, and reading basic query plans before moving into storage engines, replication, and transactions.

Scope of This Module

This module is not a SQL cheat sheet. It is where data modeling becomes something you can defend, and where "SQL works" becomes "I can predict what SQL will do and how the engine will run it."

What it covers in depth:

the relational model as tuples over domains, relations as sets, and why the model has lasted for fifty years
keys (primary, candidate, foreign, surrogate vs natural) and why referential integrity is a feature, not a ceremony
relational algebra as the semantic core behind SQL
the SQL logical pipeline (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY) and why running order differs from reading order
JOINs (inner, outer, semi, anti) and set operations (UNION, INTERSECT, EXCEPT)
subqueries, CTEs, and window functions as the three ways SQL composes
functional dependencies and normal forms (1NF, 2NF, 3NF, BCNF), plus when denormalization is the right answer
ER modeling and translation to tables
constraints (NOT NULL, UNIQUE, CHECK, FOREIGN KEY, cascading) and how they protect data from your future self
schema migrations, including online migrations, reversibility, and blast-radius control
views, materialized views, and stored procedures
reading an EXPLAIN / EXPLAIN ANALYZE plan enough to diagnose a slow query
scan vs seek, the N+1 query pattern, and why a correct query can still be a bad query
transactions at the SQL surface: BEGIN, COMMIT, ROLLBACK, isolation levels preview

What it deliberately does not try to finish here:

B-tree and LSM-tree internals (Module 2)
replication, partitioning, and leader election (Modules 3 and 5)
isolation-level implementations and concurrency-control protocols (Module 4)
NoSQL, graph, and document databases beyond a bridging comparison

This is the relational foundation the rest of Semester 6 depends on.

Before You Start

Answer these closed-book before starting the main path:

Why is a relation a set of tuples and not a list, and what consequence does that have for SQL result semantics?
What is the difference between a primary key, a candidate key, and a surrogate key?
If an application issues SELECT ... FROM orders JOIN customers ON ... WHERE ..., in what logical order does the engine evaluate the clauses?
Given Orders(customer_id, total) and a query that returns the top spender per country, where would you reach for GROUP BY, a window function, or a subquery?
If a query is slow, what is the first thing you ask the database to show you before changing any code?

Diagnostic Interpretation

4-5 solid answers

You are ready for the full path.

2-3 solid answers

Continue, but expect extra time in the SQL pipeline and normalization clusters.

0-1 solid answers

Revisit Semester 1 Module 1 (sets, predicate logic) and Semester 2 Module 1 (algorithmic reasoning) briefly. The relational model inherits its structure from both.

What This Module Is For

Relational databases are still the default store for correctness-critical data, and SQL is still the language in which most data questions are asked in industry. Throughout your career you will repeatedly be asked:

can this schema answer every question we have without contortions?
will this query return the right answer, and will it still return it in six months?
why is this query slow, and is it the query, the schema, or the index?
can two transactions step on each other, and what is the contract with the caller?
how do we add a column to a billion-row table without downtime?

This module builds the relational reasoning needed for:

storage engines and indexes (Module 2)
replication and partitioning (Module 3)
transactions and consistency (Module 4)
distributed systems tradeoffs, where "the database" often becomes several cooperating databases (Module 5)
backend engineering, where data correctness usually matters more than framework taste

You are learning to think about data without handwaving.

Concept Map

How To Use This Module

Work in order. The later clusters only make sense if the earlier modeling and query habits are stable.

Cluster 1: The Relational Model

Order	Concept	Type	Focus
1	Relations, Tuples, Attributes, Domains	PRIMARY	Codd's model: tables as sets of tuples over domains
2	Keys: Primary, Candidate, Foreign, Surrogate vs Natural	PRIMARY	Identity, uniqueness, and referential integrity
3	Relational Algebra: Selection, Projection, Join, Union	PRIMARY	The closed algebra that gives SQL its semantics

Cluster mastery check: Can you take an English question, translate it into relational algebra, and then defend why your SQL matches that algebra?

Cluster 2: SQL as a Query Language

Order	Concept	Type	Focus
4	The SQL Logical Pipeline: SELECT/FROM/WHERE/GROUP BY/HAVING/ORDER BY	PRIMARY	Clause evaluation order and why it differs from reading order
5	JOINs and Set Operations: Inner, Outer, Semi, Anti	PRIMARY	Every join as a filter on a Cartesian product, plus UNION/INTERSECT/EXCEPT
6	Subqueries, CTEs, and Window Functions	PRIMARY	The three ways SQL composes queries

Cluster mastery check: Given an English business question, can you write a correct SQL query, say in what order its clauses run, and name at least one equivalent rewrite?

Cluster 3: Schema Design and Normalization

Order	Concept	Type	Focus
7	Functional Dependencies and Normal Forms: 1NF, 2NF, 3NF, BCNF	PRIMARY	FDs, closure, decomposition, and lossless-join design
8	Denormalization: When and Why	SUPPORTING	Read-path tradeoffs, redundancy vs performance
9	ER Modeling and Translation to Tables	PRIMARY	Entities, relationships, cardinalities, and disciplined translation

Cluster mastery check: Given a domain description, can you produce an ER diagram, translate it to 3NF tables, and explain every denormalization you chose to keep?

Cluster 4: Constraints, Integrity, and DDL

Order	Concept	Type	Focus
10	Constraints: NOT NULL, UNIQUE, CHECK, FOREIGN KEY, Cascading	PRIMARY	Integrity enforced by the database rather than hoped for by the app
11	Schema Migrations: Adding/Removing Columns, Online Migrations	PRIMARY	Reversible, forward-compatible, low-blast-radius DDL
12	Views, Materialized Views, and Stored Procedures	SUPPORTING	Abstraction, precomputation, and server-side logic

Cluster mastery check: Can you add, remove, or backfill a column on a large table without downtime, and can you say what every FOREIGN KEY ... ON DELETE clause will do under stress?

Cluster 5: SQL Performance Basics

Order	Concept	Type	Focus
13	Reading an EXPLAIN / EXPLAIN ANALYZE Plan	PRIMARY	Nodes, rows, costs, and the estimated-vs-actual discipline
14	The Cost of a Bad Query: Scan vs Seek, N+1	PRIMARY	Where queries go wrong and how you recognize it early
15	Transactions at the SQL Level: BEGIN, COMMIT, ROLLBACK, Isolation Preview	SUPPORTING	The SQL-surface view of ACID, leaving deep mechanics for Module 4

Cluster mastery check: Given a slow query and its EXPLAIN ANALYZE output, can you point to the expensive node, say why it is expensive, and propose either an index, a rewrite, or a schema change?

Then work these practice pages:

Order	Practice path	Focus
1	Relational Algebra to SQL Workshop	Pipeline order, joins, subqueries, CTEs, and window functions
2	Schema Design Lab	Sample domain design, keys, and ER-to-SQL translation
3	Query Performance Clinic	`EXPLAIN`, scan-vs-seek, and migration-side query drills
4	SQL Code Katas	Timed SQL and schema drills

Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.

Learning Objectives

By the end of this module you should be able to:

Define a relation formally and explain why relational results are sets, not lists.
Distinguish primary, candidate, foreign, surrogate, and natural keys, and justify a choice in a real schema.
Translate an English question into relational algebra and then into SQL, and explain why both match.
State the logical evaluation order of a SQL query and use it to predict results and errors.
Write and read inner, outer, semi, and anti joins, plus UNION, INTERSECT, and EXCEPT, with correct semantics for NULLs and duplicates.
Use subqueries, CTEs, and window functions, and choose the simplest composition that solves a given problem.
Identify functional dependencies, decompose a schema to 3NF or BCNF, and justify any denormalization.
Produce an ER diagram for a non-trivial domain and translate it to tables with complete constraints.
Write DDL with NOT NULL, UNIQUE, CHECK, and FOREIGN KEY (including ON DELETE / ON UPDATE) that prevents invalid states.
Read an EXPLAIN ANALYZE plan, diagnose scan-vs-seek and N+1 issues, and propose targeted fixes.

Outputs

a relational-modeling notebook with at least 3 end-to-end domain designs (ER diagram, schema DDL, written tradeoff notes)
a SQL catalog with at least 25 queries covering joins, subqueries, CTEs, window functions, and set operations, each with its own test data and expected output
a normalization sheet showing at least 3 schemas walked from 1NF through BCNF with decomposition steps
a constraints gallery including at least 8 CREATE TABLE statements that exercise every constraint type and every cascade mode
a migration log of at least 4 online-migration playbooks (add column, drop column, rename, backfill), each with rollback plan
an EXPLAIN journal with at least 10 before/after query-plan comparisons
a mistake log naming at least 10 recurring SQL or design errors (GROUP BY wrong columns, NULL != NULL confusion, implicit cross join, forgotten cascade, 1+N query, etc.)
one short memo connecting Module 1 habits to the indexing, transaction, and distributed-system modules ahead

Completion Standard

You have completed Module 1 when all of these are true:

you can model a new domain as an ER diagram and defend the schema it produces
you can write a non-trivial SQL query and predict its result shape and types before running it
you can explain the logical evaluation order of any SQL statement you write
you can normalize a table to 3NF and say, concretely, which anomalies each normal form eliminated
you can add a constraint to a running system without corrupting existing data
you can read an EXPLAIN ANALYZE plan and propose at least one fix for a slow query
you can run a transaction and say, in words, what would happen to other sessions at READ COMMITTED versus SERIALIZABLE

If a query "returns what you expected on the sample data" but you cannot say why under different data, the module is not complete.

Reading Policy

Concept pages are the main path.
Local book chunks are selective reinforcement, not a second syllabus.
PostgreSQL and SQLite official docs are authoritative for exact syntax and semantics; reach for them when the question is "what does this do" rather than "what does this mean".
Read only if stuck means try the concept page, self-check, and drill first.
Optional deep dive means additional nuance or exercise volume, not required progression.
Because this is the foundation for the rest of Semester 6, written modeling justifications and query explanations are required, not optional enrichment.

Suggested Weekly Flow

Day	Work
1	Concepts 1-3, translate 5 English questions into relational algebra
2	Concepts 4-5, write 8 queries across JOINs and set operations
3	Concept 6, rewrite 4 subquery patterns as CTEs and then as window functions
4	Concepts 7-9, normalize one given schema to 3NF with written justification
5	Concepts 10-12, build one constraint gallery on your own schema
6	Concepts 13-15, run `EXPLAIN ANALYZE` on 5 queries and annotate the output
7	Practice 1 (lab) and Practice 2 (workshop)
8	Practice 3 (clinic), online-migration drill
9	Practice 4 (code katas), first quiz attempt
10	Quiz remediation and mistake-log cleanup

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.

Build Your Own X — elective

The Database (Relational) tutorial builds a SQLite-clone in C: parser, planner, executor, B+ tree storage. The natural deep-dive companion to this module. See Build Your Own X overview.

Rich Learning Pages

Model Artifact Calibration

For database performance evidence, compare your analysis to the query-plan review model artifact.

Scope of This Module​

Before You Start​

Diagnostic Interpretation​

What This Module Is For​

Concept Map​

How To Use This Module​

Cluster 1: The Relational Model​

Cluster 2: SQL as a Query Language​

Cluster 3: Schema Design and Normalization​

Cluster 4: Constraints, Integrity, and DDL​

Cluster 5: SQL Performance Basics​

Learning Objectives​

Outputs​

Completion Standard​

Reading Policy​

Suggested Weekly Flow​

Reference​

Rich Learning Pages​

Model Artifact Calibration​