Semester 6: Databases & Distributed Systems
Year 3 -- Data & Architecture | Phase 6 | Weeks 60-69 | 10 weeks
This semester shifts from system components to data-bearing systems. The goal is not to memorize database terms, but to reason about how data is modeled, stored, replicated, protected, and recovered when the real world is messy.
You should leave this semester able to explain why a schema works, why an index helps or hurts, why a transaction fails, and why a distributed data system makes one tradeoff instead of another.
Goal
Build working judgment around relational modeling, query behavior, storage engines, replication, transactions, and distributed failure. By the end of the semester, you should be able to move from application-level data access to infrastructure-level reasoning about durability, correctness, and availability.
Prerequisites
You are ready for this semester if you can already:
- read and write medium-sized programs comfortably
- reason about time and space complexity
- explain processes, memory, files, and concurrency at a systems level
- describe client/server communication, latency, and partial failure at a networking level
- work from code, notes, and diagrams without depending on tutorials for every step
Phase Completion Contract
- Explain: normalization, indexing tradeoffs, replication strategies, isolation levels, and distributed failure models in your own words.
- Build: a data-backed service or analysis project with schema design, query evidence, and explicit tradeoff documentation.
- Diagnose: slow queries, poor data modeling, stale reads, contention, and failure scenarios without hand-waving.
- Document: architecture notes, SQL artifacts, query-plan evidence, and consistency decisions in a form another engineer could review.
- Do not advance if: you still treat databases as black boxes or cannot connect data-model decisions to operational consequences.
Modules
| # | Module | Focus |
|---|---|---|
| 1 | Relational Databases & SQL | Relational modeling, schema quality, joins, aggregates, and query correctness |
| 2 | Storage Engines & Indexing | B-trees, LSM trees, pages, compaction, buffer pools, and workload-aware indexing |
| 3 | Replication & Partitioning | Leaders and followers, lag, sharding, rebalancing, and operational failure modes |
| 4 | Transactions & Consistency | Isolation, anomalies, recovery, consistency guarantees, and correctness under contention |
| 5 | Distributed Systems Fundamentals | Time, coordination, failure detectors, quorum intuition, and tradeoff-oriented distributed reasoning |
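The quorum intuition named in Module 5 can be made concrete with a small check. This is a minimal sketch, assuming a leaderless setup with `n` replicas, a write quorum of `w` acknowledgements, and a read quorum of `r` responses. The function names are illustrative and do not come from any of the semester texts.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Return True when every read quorum must share at least one replica
    with every write quorum, i.e. when r + w > n."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must each be between 1 and n")
    return r + w > n


def guaranteed_overlap(n: int, w: int, r: int) -> int:
    """Smallest number of replicas that any read set and any write set
    are guaranteed to have in common (0 means a read may miss the write)."""
    return max(0, r + w - n)
```

With `n=3, w=2, r=2` the sets must share at least one replica, so a read contacts at least one node that acknowledged the latest write. With `n=3, w=1, r=1` they need not overlap, which is one way stale reads arise. Overlap alone does not settle every consistency question; it is only the starting point for the tradeoff discussion.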
Core Resources
| Book | Role |
|---|---|
| Database System Concepts | Primary foundation for the relational model, SQL, transactions, and storage basics |
| Database Internals | Mechanistic depth for pages, storage engines, indexing, and replication internals |
| Designing Data-Intensive Applications | Architecture-level treatment of data systems, consistency, replication, and partitioning |
| Distributed Systems: Concepts and Design | Distributed systems vocabulary, failure models, and coordination foundations |
Use the local semester books/ packs first. Use external references only when a module explicitly needs current implementation docs or a canonical clarification.
Cross-Cutting Tracks Active This Semester
| Track | Level | Focus This Semester |
|---|---|---|
| A: Testing and Verification | 3 | Database tests, transaction-focused edge cases, and evidence that queries and invariants hold under change |
| B: Git, Code Review, and Delivery | 4 | Clear commits for schema evolution, reviewable migrations, and explicit rollback thinking for data changes |
| C: Security Engineering | 3 | Secrets handling, data classification, least privilege, and safe defaults for stored and in-flight data |
| D: Observability and Reliability | 2 | Query timing, error rates, lag, contention, and the signals that tell you when the data plane is unhealthy |
| E: Engineering Fundamentals | 4 | Tradeoff writing, experiment discipline, and mechanical reasoning instead of database folklore |
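The "explicit rollback thinking" in Track B can be practiced on a small scale. The sketch below is one possible shape, assuming SQLite through Python's standard `sqlite3` module, where schema changes are transactional. The helpers `apply_migration` and `column_names` are hypothetical names for illustration, and other engines differ in whether DDL can be rolled back.

```python
import sqlite3


def apply_migration(conn: sqlite3.Connection, statements: list[str]) -> bool:
    """Run all statements in one transaction; undo everything if any fails.

    Assumes the connection was opened with isolation_level=None so that
    BEGIN, COMMIT, and ROLLBACK are issued explicitly by this function.
    """
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # leave the schema exactly as it was
        return False
    conn.execute("COMMIT")
    return True


def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    """List a table's column names in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

A migration that adds a column and then fails on a later statement should leave `column_names` unchanged, which is the kind of evidence a reviewer can check before approving a schema change.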
Weekly Arc
| Week | Focus | Modules |
|---|---|---|
| 1 | Relational model, entities, constraints, and schema quality | M1 |
| 2 | Query formulation, joins, aggregates, and query-reading discipline | M1 |
| 3 | Pages, buffer pools, and why storage layout affects runtime behavior | M2 |
| 4 | Index structures, workload fit, and engine tradeoffs | M2 |
| 5 | Leader-follower replication, lag, failover, and read-path consequences | M3 |
| 6 | Partitioning, hotspots, rebalancing, and scaling pressure | M3 |
| 7 | Transactions, anomalies, locking, MVCC, and recovery intuition | M4 |
| 8 | Consistency models and the cost of stronger guarantees | M4 |
| 9 | Time, coordination, failure detectors, and distributed reality | M5 |
| 10 | Integration, project synthesis, checkpoint, and exam preparation | M5 + project |
Study Rhythm
Each week should include:
- one deep reading session from a primary semester text
- one written note that translates the reading into your own operational language
- one artifact that produces evidence: schema, SQL, benchmark note, failure analysis, or design matrix
- one review block where you revisit prior-semester fundamentals that this semester depends on
This semester benefits from drawing diagrams by hand. Use sequence diagrams, page-layout sketches, transaction timelines, and replication flows instead of relying only on prose.
Weekly Learning Journal Prompts
- Which data modeling or consistency choice this week felt unintuitive at first, and what changed your mind?
- Where did implementation detail matter more than abstraction: storage layout, query planning, replication lag, or transaction behavior?
- What failure mode would hurt users most in your current project, and how would you detect it early?
Semester Deliverables
- All module quizzes completed
- All exercises and implementation labs completed
- Reading notes written for each module
- Query plans, schema decisions, or tradeoff notes captured in the repo
- Semester project completed
- Checkpoint gate passed
- Cumulative review completed
- Semester exam completed
Capstone Throughline
Every semester must leave behind evidence that can survive into the final capstone defense.
- Artifact carried forward: data-backed app and query-plan reviews.
- What to preserve: schema decisions, data-access code, query plans, indexing experiments, and the reasoning used to validate persistence tradeoffs.
- Module threads: Relational Databases & SQL (M1), Storage Engines & Indexing (M2), Replication & Partitioning (M3), Transactions & Consistency (M4), and Distributed Systems Fundamentals (M5).
- Defense prompt: In Semester 10, explain how this semester's artifact changed a capstone decision, reduced a risk, or made the final system easier to defend.
Model Artifact Calibration
Use the query-plan review model artifact before proposing database indexing or query changes.
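As an illustration of what query-plan evidence can look like, the sketch below captures a plan before and after adding an index. It assumes SQLite through Python's standard `sqlite3` module as a stand-in engine. The `orders` table, the index name, and the helper functions are hypothetical, and this is not the course's model artifact itself.

```python
import sqlite3


def plan_for(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


def capture_plan_evidence() -> tuple[list[str], list[str]]:
    """Record the plan for one query before and after creating an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, float(i)) for i in range(1000)],
    )
    query = "SELECT total FROM orders WHERE customer_id = ?"

    before = plan_for(conn, query, (42,))  # no usable index yet: full scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = plan_for(conn, query, (42,))   # planner can now search the index

    conn.close()
    return before, after
```

Saving both plans next to the proposed change lets a reviewer see whether the engine actually uses the index. The exact plan wording varies across SQLite versions, and other engines expose plans through their own `EXPLAIN` variants with different output.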