Semester 6: Databases & Distributed Systems

Year 3 -- Data & Architecture | Phase 6 | Weeks 60-69 | 10 weeks

This semester shifts from system components to data-bearing systems. The goal is not to memorize database terms but to reason about how data is modeled, stored, replicated, protected, and recovered under messy real-world conditions.

You should leave this semester able to explain why a schema works, why an index helps or hurts, why a transaction fails, and why a distributed data system makes one tradeoff instead of another.

Goal

Build working judgment around relational modeling, query behavior, storage engines, replication, transactions, and distributed failure. By the end of the semester, you should be able to move from application-level data access to infrastructure-level reasoning about durability, correctness, and availability.

Prerequisites

You are ready for this semester if you can already:

read and write medium-sized programs comfortably
reason about time and space complexity
explain processes, memory, files, and concurrency at a systems level
describe client/server communication, latency, and partial failure at a networking level
work from code, notes, and diagrams without depending on tutorials for every step

Phase Completion Contract

Explain: normalization, indexing tradeoffs, replication strategies, isolation levels, and distributed failure models in your own words.
Build: a data-backed service or analysis project with schema design, query evidence, and explicit tradeoff documentation.
Diagnose: slow queries, poor data modeling, stale reads, contention, and failure scenarios without hand-waving.
Document: architecture notes, SQL artifacts, query-plan evidence, and consistency decisions in a form another engineer could review.
Do not advance if: you still treat databases as black boxes or cannot connect data-model decisions to operational consequences.

Modules

#	Module	Focus
1	Relational Databases & SQL	Relational modeling, schema quality, joins, aggregates, and query correctness
2	Storage Engines & Indexing	B-trees, LSM trees, pages, compaction, buffer pools, and workload-aware indexing
3	Replication & Partitioning	Leaders and followers, lag, sharding, rebalancing, and operational failure modes
4	Transactions & Consistency	Isolation, anomalies, recovery, consistency guarantees, and correctness under contention
5	Distributed Systems Fundamentals	Time, coordination, failure detectors, quorum intuition, and tradeoff-oriented distributed reasoning

Core Resources

Book	Role
Database System Concepts	Primary foundation for the relational model, SQL, transactions, and storage basics
Database Internals	Mechanistic depth for pages, storage engines, indexing, and replication internals
Designing Data-Intensive Applications	Architecture-level treatment of data systems, consistency, replication, and partitioning
Distributed Systems: Concepts and Design	Distributed systems vocabulary, failure models, and coordination foundations

Use the local semester books/ packs first. Use external references only when a module explicitly needs current implementation docs or a canonical clarification.

Cross-Cutting Tracks Active This Semester

Track	Level	Focus This Semester
A: Testing and Verification	3	Database tests, transaction-focused edge cases, and evidence that queries and invariants hold under change
B: Git, Code Review, and Delivery	4	Clear commits for schema evolution, reviewable migrations, and explicit rollback thinking for data changes
C: Security Engineering	3	Secrets handling, data classification, least privilege, and safe defaults for stored and in-flight data
D: Observability and Reliability	2	Query timing, error rates, lag, contention, and the signals that tell you when the data plane is unhealthy
E: Engineering Fundamentals	4	Tradeoff writing, experiment discipline, and mechanical reasoning instead of database folklore

Weekly Arc

Week	Focus	Modules
1	Relational model, entities, constraints, and schema quality	M1
2	Query formulation, joins, aggregates, and query-reading discipline	M1
3	Pages, buffer pools, and why storage layout affects runtime behavior	M2
4	Index structures, workload fit, and engine tradeoffs	M2
5	Leader-follower replication, lag, failover, and read-path consequences	M3
6	Partitioning, hotspots, rebalancing, and scaling pressure	M3
7	Transactions, anomalies, locking, MVCC, and recovery intuition	M4
8	Consistency models and the cost of stronger guarantees	M4
9	Time, coordination, failure detectors, and distributed reality	M5
10	Integration, project synthesis, checkpoint, and exam preparation	M5 + project

Study Rhythm

Each week should include:

one deep reading session from a primary semester text
one written note that translates the reading into your own operational language
one artifact that produces evidence: schema, SQL, benchmark note, failure analysis, or design matrix
one review block where you revisit prior-semester fundamentals that this semester depends on

This semester benefits from drawing diagrams by hand. Use sequence diagrams, page-layout sketches, transaction timelines, and replication flows instead of relying only on prose.

Weekly Learning Journal Prompts

Which data modeling or consistency choice this week felt unintuitive at first, and what changed your mind?
Where did implementation detail matter more than abstraction: storage layout, query planning, replication lag, or transaction behavior?
What failure mode would hurt users most in your current project, and how would you detect it early?

Semester Deliverables

All module quizzes completed
All exercises and implementation labs completed
Reading notes written for each module
Query plans, schema decisions, or tradeoff notes captured in the repo
Semester project completed
Checkpoint gate passed
Cumulative review completed
Semester exam completed

Capstone Throughline

Every semester must leave behind evidence that can survive into the final capstone defense.

Artifact carried forward: data-backed app and query-plan reviews.
What to preserve: schema decisions, data-access code, query plans, indexing experiments, and the reasoning used to validate persistence tradeoffs.
Module threads: Module 1 (Relational Databases & SQL), Module 2 (Storage Engines & Indexing), Module 3 (Replication & Partitioning), Module 4 (Transactions & Consistency), and Module 5 (Distributed Systems Fundamentals).
Defense prompt: In Semester 10, explain how this semester's artifact changed a capstone decision, reduced a risk, or made the final system easier to defend.

Model Artifact Calibration

Use the query-plan review model artifact before proposing database indexing or query changes.
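The model artifact itself is not reproduced on this page. As a rough illustration of the kind of before/after evidence a query-plan review might capture, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The orders table, the idx_orders_customer index, and the plan_details helper are hypothetical names chosen for this example, not part of the course materials, and the exact plan wording varies across SQLite versions.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one is the plan description.
    return [row[3] for row in rows]


# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL,"
    " total_cents INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(i % 100, i * 10) for i in range(1000)],
)

query = "SELECT total_cents FROM orders WHERE customer_id = ?"

# Evidence before the change: no index on customer_id, so expect a table scan.
before = plan_details(conn, query, (42,))

# Proposed change: add an index on the filtered column.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Evidence after the change: the planner should now search via the index.
after = plan_details(conn, query, (42,))

print("before:", before)
print("after: ", after)
```

Recording both plans next to the proposed change keeps the review grounded in what the planner actually does. On a real workload the same comparison should also include timings and the write-side cost of maintaining the new index.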

Enrichment Pages

Portfolio Artifact | Common Failure Modes | Bridge Review
