Learning Resources
The concept pages are the main path. These resources are for selective reinforcement -- use them only when a concept page and its drill did not stick, or when you want to go deeper on a specific area before Module 3.
Primary Texts (local chunks)
- Database Internals by Alex Petrov -- the primary text for this module. See Reference and Selective Reading for exact chunks.
- Designing Data-Intensive Applications (DDIA) by Martin Kleppmann -- Chapter 3 (Storage and Retrieval) is pivotal; Chapter 7 (Transactions) supplements Cluster 5.
- Database System Concepts by Silberschatz, Korth, Sudarshan -- classical rigor on indexes, query processing, concurrency, and recovery.
- Distributed Systems: Concepts and Design by Coulouris et al. -- peripheral for this module; only consult when storage crosses machine boundaries.
High-Signal External Courses
CMU 15-445: Database Systems (Andy Pavlo)
- Course homepage: https://15445.courses.cs.cmu.edu/
- Lectures and notes cover storage, indexes, joins, query execution, concurrency, and logging.
- Recommended lectures: Storage, Tree Indexes, Hash Tables, Query Execution, Joins, Concurrency Control, Multi-Version Concurrency Control, Logging & Recovery.
- Use when: you want a second explanation of any Cluster 1-5 topic with slides and end-of-lecture problems.
CMU 15-721: Advanced Database Systems
- Course homepage: https://15721.courses.cs.cmu.edu/
- Use when: you want implementation-level treatment of vectorized execution, JIT, MVCC implementations, and modern optimizers.
Martin Kleppmann's Distributed Systems Lectures (Cambridge)
- Lecture notes: https://www.cl.cam.ac.uk/teaching/2122/ConcDisSys/
- Useful for Module 3 transitions and for isolation-theory framing of Cluster 5.
High-Signal Documentation
PostgreSQL Docs
- Indexes overview: https://www.postgresql.org/library/raw/current/indexes.html
- Index types (B-tree, Hash, GIN, GiST, SP-GiST, BRIN): https://www.postgresql.org/library/raw/current/indexes-types.html
- Index-only scans and covering indexes: https://www.postgresql.org/library/raw/current/indexes-index-only-scans.html
- Using EXPLAIN: https://www.postgresql.org/library/raw/current/using-explain.html
- MVCC: https://www.postgresql.org/library/raw/current/mvcc-intro.html
- WAL: https://www.postgresql.org/library/raw/current/wal-intro.html
RocksDB Docs and Wiki
- RocksDB wiki: https://github.com/facebook/rocksdb/wiki
- Leveled Compaction (read alongside the wiki's Universal Compaction page): https://github.com/facebook/rocksdb/wiki/Leveled-Compaction
- RocksDB Overview: https://github.com/facebook/rocksdb/wiki/RocksDB-Overview
- Bloom Filter: https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter
SQLite and LMDB
- SQLite file format (B-tree page layout in a real engine): https://www.sqlite.org/fileformat.html
- LMDB design: http://www.lmdb.tech/doc/ -- a memory-mapped copy-on-write B+-tree, useful counterpoint to WAL-based engines.
Papers Worth Reading
- Graefe, "Modern B-Tree Techniques." Survey-level treatment.
- O'Neil et al., "The Log-Structured Merge-Tree (LSM-Tree)." Original LSM paper.
- Mohan et al., "ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging."
- Selinger et al., "Access Path Selection in a Relational Database Management System." Founding paper on cost-based optimization (CBO).
- Boncz et al., "Breaking the Memory Wall in MonetDB." Vectorized execution motivation.
- Neumann, "Efficiently Compiling Efficient Query Plans for Modern Hardware." JIT execution.
When To Use Which
| Goal | Go to |
|---|---|
| Re-read a Cluster 1 concept | Database Internals chunks on pages/buffer pool |
| Re-read a Cluster 2 concept | Database Internals B-Tree chapter; DDIA ch. 3 |
| Re-read a Cluster 3 concept | DDIA ch. 3 (LSM section); RocksDB wiki |
| Re-read a Cluster 4 concept | CMU 15-445 Execution/Joins lectures; PostgreSQL EXPLAIN docs |
| Re-read a Cluster 5 concept | Database Internals Concurrency + WAL; ARIES paper |
| Practice real EXPLAIN | PostgreSQL docs + a local database |
| Practice real LSM behavior | RocksDB wiki + db_bench |
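If no PostgreSQL instance is at hand, the "practice real EXPLAIN" habit can be rehearsed with a rough stand-in. The sketch below is an assumption of this page, not part of any listed resource: it uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`, whose output format differs from PostgreSQL's `EXPLAIN`. The `orders` table, the `idx_orders_customer` index, and the `query_plan` helper are hypothetical names chosen for illustration. It only shows the core exercise: read the plan, add an index, read the plan again.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail string is the last column of each row.
    return [row[-1] for row in rows]


# Hypothetical in-memory table, used only for this exercise.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without a secondary index, the only access path is a full table scan.
plan_before = query_plan(conn, sql, (42,))

# After the index exists, the planner can search it instead of scanning.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies across SQLite versions, so compare the two plans for the shift from a scan to an index search rather than matching strings. Once that reading habit feels natural, repeat the exercise against a local PostgreSQL database using the EXPLAIN docs listed above.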
Skip-This-Unless List
Avoid these while working on Module 2:
- Distributed transaction and consensus protocols (2PC, Paxos, Raft) -- Module 5
- CAP and consistency-model theory -- Module 4
- Storage hardware deep dives (NVMe internals, SSD FTL) -- useful once, not for this module
- Specific vendor-tuning guides -- worth reading only after you can reason about cost independently