
Semester 6: Databases & Distributed Systems

Year 3 -- Data & Architecture | Phase 6 | Weeks 60-69 | 10 weeks

This semester shifts from system components to data-bearing systems. The goal is not to memorize database terms, but to reason about how data is modeled, stored, replicated, protected, and recovered when the real world is messy.

You should leave this semester able to explain why a schema works, why an index helps or hurts, why a transaction fails, and why a distributed data system makes one tradeoff instead of another.


Goal

Build working judgment around relational modeling, query behavior, storage engines, replication, transactions, and distributed failure. By the end of the semester, you should be able to move from application-level data access to infrastructure-level reasoning about durability, correctness, and availability.

Prerequisites

You are ready for this semester if you can already:

  • read and write medium-sized programs comfortably
  • reason about time and space complexity
  • explain processes, memory, files, and concurrency at a systems level
  • describe client/server communication, latency, and partial failure at a networking level
  • work from code, notes, and diagrams without depending on tutorials for every step

Phase Completion Contract

  • Explain: normalization, indexing tradeoffs, replication strategies, isolation levels, and distributed failure models in your own words.
  • Build: a data-backed service or analysis project with schema design, query evidence, and explicit tradeoff documentation.
  • Diagnose: slow queries, poor data modeling, stale reads, contention, and failure scenarios without hand-waving.
  • Document: architecture notes, SQL artifacts, query-plan evidence, and consistency decisions in a form another engineer could review.
  • Do not advance if: you still treat databases as black boxes or cannot connect data-model decisions to operational consequences.
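A toy model can make "stale reads" concrete before you meet them in a real system. The sketch below is hypothetical: the LeaderFollower class and its methods are illustrative, not any database's API. It assumes asynchronous replication, where the follower applies the leader's writes in commit order some time after they commit.

```python
"""Toy model of leader-follower replication lag (hypothetical, not a real database API)."""
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)


@dataclass
class LeaderFollower:
    leader: Replica = field(default_factory=Replica)
    follower: Replica = field(default_factory=Replica)
    # Writes committed on the leader but not yet applied on the follower (assumed async replication).
    pending: list = field(default_factory=list)

    def write(self, key, value):
        self.leader.data[key] = value
        self.pending.append((key, value))

    def replicate(self, n=1):
        """Ship up to n pending writes to the follower, in commit order."""
        for _ in range(min(n, len(self.pending))):
            key, value = self.pending.pop(0)
            self.follower.data[key] = value

    def read_leader(self, key):
        return self.leader.data.get(key)

    def read_follower(self, key):
        return self.follower.data.get(key)


cluster = LeaderFollower()
cluster.write("profile:42", "old name")
cluster.replicate()
cluster.write("profile:42", "new name")     # the user updates their profile
print(cluster.read_leader("profile:42"))    # new name
print(cluster.read_follower("profile:42"))  # old name: a stale read caused by lag
cluster.replicate()
print(cluster.read_follower("profile:42"))  # new name once the lag drains
```

A user who writes through the leader and then reads from the follower sees their own change disappear. Asking which users and which read paths can hit this window is the diagnosis the contract expects.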

Modules

# | Module | Focus
1 | Relational Databases & SQL | Relational modeling, schema quality, joins, aggregates, and query correctness
2 | Storage Engines & Indexing | B-trees, LSM trees, pages, compaction, buffer pools, and workload-aware indexing
3 | Replication & Partitioning | Leaders and followers, lag, sharding, rebalancing, and operational failure modes
4 | Transactions & Consistency | Isolation, anomalies, recovery, consistency guarantees, and correctness under contention
5 | Distributed Systems Fundamentals | Time, coordination, failure detectors, quorum intuition, and tradeoff-oriented distributed reasoning
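For the quorum intuition in Module 5, a brute-force check is more convincing than a memorized formula. This minimal sketch assumes a simplified model: N replicas, writes acknowledged by any W of them, and reads served by any R of them. It ignores sloppy quorums, concurrent writes, and partially failed writes. The function name quorums_always_overlap is hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """Brute-force check: does every write quorum of size w share a node with every read quorum of size r?"""
    nodes = range(n)
    for write_set in combinations(nodes, w):
        for read_set in combinations(nodes, r):
            if not set(write_set) & set(read_set):
                return False  # found a read quorum that misses every replica holding the write
    return True


for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    print(n, w, r, quorums_always_overlap(n, w, r), w + r > n)
```

In every row the brute-force result agrees with w + r > n. When the inequality fails, some read quorum can miss every replica that holds the latest write, so that read may return stale data.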

Core Resources

Book | Role
Database System Concepts | Primary foundation for the relational model, SQL, transactions, and storage basics
Database Internals | Mechanistic depth for pages, storage engines, indexing, and replication internals
Designing Data-Intensive Applications | Architecture-level treatment of data systems, consistency, replication, and partitioning
Distributed Systems: Concepts and Design | Distributed systems vocabulary, failure models, and coordination foundations

Use the local semester books/ packs first. Use external references only when a module explicitly needs current implementation docs or a canonical clarification.

Cross-Cutting Tracks Active This Semester

Track | Level | Focus This Semester
A: Testing and Verification | 3 | Database tests, transaction-focused edge cases, and evidence that queries and invariants hold under change
B: Git, Code Review, and Delivery | 4 | Clear commits for schema evolution, reviewable migrations, and explicit rollback thinking for data changes
C: Security Engineering | 3 | Secrets handling, data classification, least privilege, and safe defaults for stored and in-flight data
D: Observability and Reliability | 2 | Query timing, error rates, lag, contention, and the signals that tell you when the data plane is unhealthy
E: Engineering Fundamentals | 4 | Tradeoff writing, experiment discipline, and mechanical reasoning instead of database folklore

Weekly Arc

Week | Focus | Modules
1 | Relational model, entities, constraints, and schema quality | M1
2 | Query formulation, joins, aggregates, and query-reading discipline | M1
3 | Pages, buffer pools, and why storage layout affects runtime behavior | M2
4 | Index structures, workload fit, and engine tradeoffs | M2
5 | Leader-follower replication, lag, failover, and read-path consequences | M3
6 | Partitioning, hotspots, rebalancing, and scaling pressure | M3
7 | Transactions, anomalies, locking, MVCC, and recovery intuition | M4
8 | Consistency models and the cost of stronger guarantees | M4
9 | Time, coordination, failure detectors, and distributed reality | M5
10 | Integration, project synthesis, checkpoint, and exam preparation | M5 + project
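The rebalancing pressure in week 6 shows up clearly if you partition keys with hash(key) mod N. This sketch assumes hypothetical user:<i> keys and a SHA-256-based hash. Python's built-in hash() is randomized per process for strings, which makes it a poor choice for data placement. The sketch measures how many keys change partition when a fifth partition is added.

```python
import hashlib


def stable_hash(key: str) -> int:
    # A fixed digest, so placement is identical across processes and restarts.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def partition_mod_n(key: str, num_partitions: int) -> int:
    return stable_hash(key) % num_partitions


def fraction_moved(keys, old_n, new_n):
    """Fraction of keys whose partition changes when the partition count goes from old_n to new_n."""
    moved = sum(1 for k in keys if partition_mod_n(k, old_n) != partition_mod_n(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 4, 5):.0%} of keys move when going from 4 to 5 partitions")
```

Expect roughly 80% of keys to move. A key stays put only when its hash leaves the same remainder mod 4 and mod 5, which happens for 4 of the 20 residues mod 20. That churn is why Designing Data-Intensive Applications recommends against hash mod N and describes alternatives such as a fixed number of partitions, many more than nodes, where whole partitions move during rebalancing.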

Study Rhythm

Each week should include:

  • one deep reading session from a primary semester text
  • one written note that translates the reading into your own operational language
  • one artifact that produces evidence: schema, SQL, benchmark note, failure analysis, or design matrix
  • one review block where you revisit prior-semester fundamentals that this semester depends on

This semester benefits from drawing diagrams by hand. Use sequence diagrams, page-layout sketches, transaction timelines, and replication flows instead of relying only on prose.
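A transaction timeline can also be replayed in code. This minimal sketch uses a hypothetical run_schedule helper to model two read-modify-write transactions against one counter. The model has no locking and no conflict detection. It shows that one interleaving produces a lost update and the serial order does not.

```python
def run_schedule(schedule, initial=100):
    """Replay an interleaving of steps against a single shared counter.

    schedule: list of (txn, op), where op is "read" or ("write", delta).
    Each write stores the transaction's earlier read plus delta, i.e. an unprotected
    read-modify-write (a deliberately unsafe, hypothetical model).
    """
    stored = initial
    local = {}  # each transaction's private copy of what it read
    for txn, op in schedule:
        if op == "read":
            local[txn] = stored
        else:
            _, delta = op
            stored = local[txn] + delta
    return stored


serial = [("T1", "read"), ("T1", ("write", 10)), ("T2", "read"), ("T2", ("write", 5))]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", ("write", 10)), ("T2", ("write", 5))]
print(run_schedule(serial))       # 115
print(run_schedule(interleaved))  # 105: T1's +10 is lost
```

A real engine can prevent this outcome in several ways: an atomic UPDATE ... SET x = x + 5, explicit row locks, or an isolation level that detects the conflict and aborts one transaction. Which mechanism applies, and at which isolation level, depends on the database.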

Weekly Learning Journal Prompts

  1. Which data modeling or consistency choice this week felt unintuitive at first, and what changed your mind?
  2. Where did implementation detail matter more than abstraction: storage layout, query planning, replication lag, or transaction behavior?
  3. What failure mode would hurt users most in your current project, and how would you detect it early?

Semester Deliverables


Capstone Throughline

Every semester must leave behind evidence that can survive into the final capstone defense.


Model Artifact Calibration

Use the query-plan review model artifact before proposing database indexing or query changes.
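As a small, hypothetical example of query-plan evidence, the sketch below uses SQLite through Python's standard sqlite3 module. Your project's engine may differ; PostgreSQL, for instance, uses EXPLAIN and EXPLAIN ANALYZE and produces different output. The orders table and the idx_orders_customer index are invented for illustration. The sketch captures the plan for the same query before and after adding an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, total_cents INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
    [(i % 500, i * 3 % 10_000) for i in range(20_000)],
)

query = "SELECT SUM(total_cents) FROM orders WHERE customer_id = ?"


def plan(sql, params):
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the human-readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(query, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query, (42,))
print("before:", before)
print("after: ", after)
```

The exact detail text varies between SQLite versions. You should still see a full scan of orders before the index exists and a search using idx_orders_customer afterward. Keeping both plans alongside a proposed index change gives a reviewer evidence instead of a claim.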


Enrichment Pages

Portfolio Artifact | Common Failure Modes | Bridge Review