Learning Resources

The concept pages are the main path. The resources below are selective reinforcement: use them only when a concept page and its drill did not stick, or when you want to go deeper in a specific area before Module 3.

Primary Texts (local chunks)

  • Database Internals by Alex Petrov -- the primary text for this module. See Reference and Selective Reading for exact chunks.
  • Designing Data-Intensive Applications (DDIA) by Martin Kleppmann -- Chapter 3 (Storage and Retrieval) is pivotal; Chapter 7 (Transactions) supplements Cluster 5.
  • Database System Concepts by Silberschatz, Korth, Sudarshan -- classical rigor on indexes, query processing, concurrency, and recovery.
  • Distributed Systems: Concepts and Design by Coulouris et al. -- peripheral for this module; consult only when storage crosses machine boundaries.

High-Signal External Courses

CMU 15-445: Database Systems (Andy Pavlo)

  • Course homepage: https://15445.courses.cs.cmu.edu/
  • Lectures and notes cover storage, indexes, joins, query execution, concurrency, and logging.
  • Recommended lectures: Storage, Tree Indexes, Hash Tables, Query Execution, Joins, Concurrency Control, Multi-Version Concurrency Control, Logging & Recovery.
  • Use when: you want a second explanation of any Cluster 1-5 topic with slides and end-of-lecture problems.

CMU 15-721: Advanced Database Systems

  • Course homepage: https://15721.courses.cs.cmu.edu/
  • Use when: you want implementation-level treatment of vectorized execution, JIT, MVCC implementations, and modern optimizers.

Martin Kleppmann's Distributed Systems Lectures (Cambridge)

High-Signal Documentation

PostgreSQL Docs

RocksDB Docs and Wiki

SQLite and LMDB

Papers Worth Reading

  • Graefe, "Modern B-Tree Techniques." Survey-level treatment.
  • O'Neil et al., "The Log-Structured Merge-Tree (LSM-Tree)." Original LSM paper.
  • Mohan et al., "ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging."
  • Selinger et al., "Access Path Selection in a Relational Database Management System." Founding paper on cost-based optimization (CBO).
  • Boncz et al., "Breaking the Memory Wall in MonetDB." Vectorized execution motivation.
  • Neumann, "Efficiently Compiling Efficient Query Plans for Modern Hardware." JIT execution.

When To Use Which

Goal                          | Go to
Re-read a Cluster 1 concept   | Database Internals chunks on pages/buffer pool
Re-read a Cluster 2 concept   | Database Internals B-Tree chapter; DDIA ch. 3
Re-read a Cluster 3 concept   | DDIA ch. 3 (LSM section); RocksDB wiki
Re-read a Cluster 4 concept   | CMU 15-445 Execution/Joins lectures; PostgreSQL EXPLAIN docs
Re-read a Cluster 5 concept   | Database Internals Concurrency + WAL; ARIES paper
Practice real EXPLAIN         | PostgreSQL docs + a local database
Practice real LSM behavior    | RocksDB wiki + db_bench
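
If no PostgreSQL instance is at hand, the "practice real EXPLAIN" habit can be rehearsed with a rough stand-in. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, which is a different and much terser facility than PostgreSQL's EXPLAIN: it shows the chosen access path but no cost estimates. The table `orders`, the index `idx_orders_customer`, and the helper `query_plan` are hypothetical names invented for illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan steps SQLite reports for `sql` (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan text.
    return [row[-1] for row in rows]


# In-memory database with a hypothetical table, so nothing touches disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Without a secondary index, the only access path is a full table scan.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filter column, the planner can do an index lookup.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, so compare the before and after plans for the change in access path rather than for specific strings. The same before/after exercise carries over to PostgreSQL's EXPLAIN on a local database, where the output additionally includes cost and row estimates.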

Skip-This-Unless List

Avoid these while working on Module 2:

  • Distributed transaction protocols (2PC, Paxos, Raft) -- Module 5
  • CAP and consistency-model theory -- Module 4
  • Storage hardware deep dives (NVMe internals, SSD FTL) -- useful once, not for this module
  • Vendor-specific tuning guides -- worth reading only after you can reason about cost independently