Build Your Own Database (Key-Value / Redis-like)

"A database is a careful arrangement of files." -- Edmund Burke (paraphrased)

A key-value store is the perfect first database project. The data model is trivial (put(k, v), get(k), delete(k)), but the implementation decisions -- log-structured vs B-tree, write-ahead logging, compaction, indexing -- are the actual database engineering. Build one and you'll never read a Postgres or Redis blog post the same way again.

1. Overview & motivation

Two dominant storage engine designs:

Log-Structured Merge tree (LSM) -- append-only writes, periodic compaction. Used by LevelDB, RocksDB, Cassandra, Scylla.
B+ tree -- in-place updates, balanced tree pages. Used by SQLite, Postgres, MySQL (InnoDB), BoltDB.

A simpler variant in tutorial form is the Bitcask model: append-only log, in-memory hash index. Used by Riak. Most readable starting point.

What you can only learn by building one:

Why writes are easy (append) but reads through an LSM are surprisingly complex (search active memtable + N SSTables in order).
Why compaction is the heart of any log-structured engine -- and the source of its operational complexity.
Why B+ trees make range scans free but random writes expensive.
Why write-ahead logging (WAL) is the answer to "what if we crash mid-write."
Why caching (LRU, ARC) appears in every database.

2. Where this fits in the degree

Phase: Architecture
Semester: 6 (Databases and Distributed Systems)
Modules deepened: Module 2 (storage engines & indexing) -- the project for this module. Module 4 (transactions & consistency) if you add WAL + atomicity.

Cross-phase relevance:

Builds on the Memory Allocator tutorial -- allocators matter for buffer pools.
Sets up the Database (relational) tutorial -- same storage, more on top.
Foundation for the Search Engine tutorial -- inverted indexes use the same on-disk structures.

3. Prerequisites

A systems language: Go, Rust, C++, or C.
File I/O comfort.
Hash tables, sorted maps, B-trees as algorithms (you don't need to have implemented them).
Tolerance for thinking about crash scenarios: "what if we lose power right now?"

4. Theory & research

Required reading

Martin Kleppmann, Designing Data-Intensive Applications, Chapter 3 -- "Storage and Retrieval." The single best 50 pages on storage engines. â start here.
Pavel Emelyanov & Edward Z. Yang, "Let's Build a Simple Database" (cstack.github.io/db_tutorial/) -- sqlite clone in C, walks through file format, B-tree, cursors.

Strongly recommended

Alex Petrov, Database Internals -- Part 1 (Storage Engines). Newer than DDIA, more storage-focused.
James Smith, "Build Your Own Database from Scratch: From B+Tree to SQL in 3000 Lines" -- build-your-own.org/database/. Go. Complete B+ tree + SQL layer.

For Redis specifically

Redis source code -- surprisingly readable. Start at src/dict.c (hash table), src/t_string.c (string commands), src/server.c (event loop).
antirez (Salvatore Sanfilippo), "Redis design and implementation" -- multiple blog posts. The architect's notes.

For LSM trees

The original LSM paper: O'Neil, Cheng, Gawlick, O'Neil (1996), "The Log-Structured Merge-Tree" -- short. Read once for context.
LevelDB source code -- about 30K lines of C++. Mature implementation of an LSM with one-thread compaction.
Mark Callaghan's blog -- comparative LSM engineering writing.

5. Curated tutorial list (from BYO-X)

The BYO-X catalog "Database" entries cover both KV and relational. Here are the KV-flavored ones:

C++: Build Your Own Redis from Scratch -- build-your-own.org/redis -- comprehensive book/site, Go â recommended primary for Redis-like
C#: Build Your Own Database
Clojure: An Archaeology-Inspired Database
Crystal: Why you should build your own NoSQL Database
Go: Build Your Own Database from Scratch: From B+Tree To SQL in 3000 Lines -- build-your-own.org/database â recommended primary for B+ tree path
Go: Code a database in 45 steps: a series of test-driven small coding puzzles
Go: Build Your Own Redis from Scratch
JavaScript: Dagoba: an in-memory graph database -- different model (graph)
Python: DBDB: Dog Bed Database -- from the 500 Lines or Less book â classic
Python: Write your own miniature Redis with Python
Ruby: Build your own fast, persistent KV store in Ruby
Rust: Build your own Redis client and server

6. Recommended primary path

Three excellent paths; pick by interest:

Path A: Redis-like (network-server-shaped)

James Smith, "Build Your Own Redis" (build-your-own.org/redis). Free book. Builds a Redis-compatible server: TCP, event loop, command parsing, hash tables, sorted sets, AOF persistence, replication. ~2,000 lines of Go.

Choose this if you also want to learn server architecture (event loops, non-blocking I/O).

Path B: B+ tree storage engine

James Smith, "Build Your Own Database from Scratch" (build-your-own.org/database). Same author, focused on disk-resident B+ tree + SQL layer. 3,000 lines of Go.

Choose this if you want the storage engine internals.

Path C: Bitcask-style (LSM-precursor)

Read the Bitcask paper (Riak) -- 6 pages. Then implement it. Append-only log + in-memory hash index + compaction.

Smallest scope. Easy to finish in a week. Excellent for understanding why production engines became more complex.

For this degree: Path C -> Path B if you want depth; Path A if you want the Redis shape.

7. Implementation milestones (Bitcask-style, Path C)

Milestone 1: Append-only log with in-memory index

Writes append a record to a log file. The in-memory index maps key -> (file_offset, length). Reads seek + read.

class DB:
    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, length)
        self.fd = open(path, 'a+b')
        self._rebuild_index()

    def put(self, key, value):
        offset = self.fd.tell()
        record = encode(key, value)
        self.fd.write(record)
        self.index[key] = (offset, len(record))

    def get(self, key):
        if key not in self.index: return None
        offset, length = self.index[key]
        self.fd.seek(offset)
        record = self.fd.read(length)
        return decode(record).value

    def delete(self, key):
        if key in self.index:
            del self.index[key]
            self.put(key, TOMBSTONE)

Evidence: 100k puts and gets work. After restart, get still returns the right values (index rebuilt on startup).

Milestone 2: Crash safety with checksums

Each record has a CRC32. On reading, verify the CRC. If a partial record is at the tail (we crashed mid-write), truncate it.

record = checksum + key_size + value_size + key + value

Evidence: Kill the process during a write, restart, verify integrity. No corruption.

Milestone 3: Multiple log files + compaction

When the active log exceeds a threshold (say 1 MB), close it, open a new one. Run a background compaction: for each closed log, rewrite a smaller file containing only the latest version of each key (drop tombstoned/overwritten keys).

log-001 (closed, 1 MB)   â”€â”
log-002 (closed, 1 MB)   â”€â”¼â”€-> compact -> log-merged (200 KB)
log-003 (closed, 1 MB)   â”€â”˜
log-004 (active, 256 KB)

Evidence: After heavy update workload, on-disk size shrinks after compaction.

Milestone 4: Persistent index hint file

Rebuilding the index from logs on startup is O(records). For large DBs, this is slow. Periodically write a "hint" file: a snapshot of the in-memory index. Startup: load hint + replay any newer log entries.

Evidence: Startup time stays constant even as DB grows.

Milestone 5: Concurrent access

Single writer (mutex), many readers. Background compactor as a separate goroutine/thread.

Evidence: Multi-reader stress test: 8 reader threads, 1 writer thread, no data races (run under TSan).

Milestone 6 (optional): LSM tree progression

Once Bitcask is solid, the leap to LSM is:

MemTable (in-memory sorted map for recent writes) replaces the simple hash.
When MemTable is full, flush as an immutable SSTable on disk.
SSTables are sorted, supporting range scans.
Compaction merges N SSTables into 1 larger one.
Reads check MemTable, then each SSTable from newest to oldest.

LevelDB and RocksDB do exactly this with multi-level storage.

Evidence: Range scan for k in db.range("user:", "user:~"): works correctly across multiple SSTables.

Milestone 7 (optional): Network server (Redis path)

Add TCP server with RESP (Redis serialization protocol) parsing. Now redis-cli connects to your server.

RESP commands:
*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n   â† SET foo bar
+OK\r\n                                       â† response

Evidence: redis-cli -p YOUR_PORT SET foo bar; GET foo returns bar.

8. Tests & evidence

Test	How
Basic put/get/delete	Trivially. Edge cases: empty value, large value, deleted key.
Persistence	Write, close DB, reopen, read.
Crash recovery	Kill mid-write; restart; verify no corruption.
Compaction correctness	Pre-compact and post-compact reads return identical values.
Range scan (LSM)	Multi-SSTable range scan returns in sorted order.
Performance	Random write throughput; random read throughput; range scan throughput. Plot vs DB size.
Stress	1M writes, then 10M reads at random keys, with crashes injected.

The strongest evidence: a benchmark plot comparing your engine to LevelDB or BoltDB on the same workload. You'll be slower; that's fine. Document by how much and why.

9. Common pitfalls

fsync confusion. write() returns when the OS buffer has the data, not when it's on disk. Without fsync on commit, a power loss loses recent writes. Decide: durability per commit (slow but safe) or batched (fast but lossy).
Index integrity. If your in-memory index gets out of sync with the log, reads return wrong values. Always update the log first, then the index.
Tombstones in compaction. A tombstone means "this key was deleted." After compaction, tombstones must be retained until all older versions have also been compacted away.
Reading from a being-written file. Two file handles (one for writing, one for reading) is cleanest.
No CRC. Without checksums, a corrupted byte silently returns wrong data forever.
Single-file design with many keys. Performance plateaus once the file grows huge. Real engines split into many files.
Treating compaction as optional. Without it, on-disk size grows without bound.
Locking the world during compaction. Compaction must run concurrently with reads and writes. This is the operational tension that defines every LSM engine.

10. Extensions

Snapshots. Point-in-time consistent reads, useful for backups.
TTL (expiration). Records with an expiration timestamp.
Compression. Snappy, LZ4, or zstd on SSTables.
Bloom filters. For each SSTable, precompute a bloom filter so most "key not present" reads skip the file.
Range scans + bounded iterators. Standard for any LSM.
Transactions. Atomic batches of writes (write group commit).
Replication. A simple primary-replica setup. Leads naturally to the Kafka-like tutorial.

11. Module integration

Module	What the KV store deepens
Sem 6 Module 2 -- Storage engines & indexing	The whole project.
Sem 6 Module 4 -- Transactions & consistency	Adding WAL + atomic batches turns this into a transactional store.
Memory Allocator tutorial	Buffer pools and slab allocation matter for performance.
Search Engine tutorial	Inverted indexes use SSTable-like sorted on-disk structures.
Database (relational) tutorial	Same storage engine; add SQL parsing and a query planner on top.
Sem 6 Module 3 -- Replication	After single-node works, add a follower. Step toward Kafka-like.

12. Portfolio framing

What to publish:

Source organized as engine/, wal/, compaction/, server/ (if Redis path).
A benchmark report comparing your engine to LevelDB/BoltDB/Redis on the same workload.
A README with:
- Design decisions (LSM vs B-tree vs Bitcask).
- The crash-recovery test transcript.
- Limitations (no replication, single writer, etc.).
A docs/architecture.md with on-disk format diagrams.

Reviewer entry points:

engine/put.go -- the write path.
engine/get.go -- the read path (more complex than you'd think after compaction).
compaction/merge.go -- the operationally-critical code.
bench/results.md -- the comparison.

A working storage engine is one of the strongest portfolio pieces in a systems/architecture engineer's collection. "I implemented an LSM-like KV store that handles 100k writes/sec on this benchmark" is concrete and reads well.

13. Local source backbone

Use these local chunks to turn the KV/Redis path into a full storage-and-server progression:

Build Your Own Database From Scratch (build-your-own/database-james-smith)
Build Your Own Database From Scratch in Go, 2E (build-your-own/database-go-2e-james-smith)
Build Your Own Redis with C/C++ (build-your-own/redis-james-smith)

Local chunks	Use them for	Add to this project
Database `001`-`002`	Files versus databases, atomic rename, durable files, LSM overview	Add a crash-safety lab before any tree work.
Database `003`-`005`	B-tree/B+ tree intuition, node encoding, insertion, deletion	Add page-layout diagrams and invariant tests.
Database `006`-`008`	Persisting pages, master page, allocation, free list	Add a page allocator and free-list inspection tool.
Database `009`-`011`	Point queries, range queries, secondary indexes	Add range-scan and secondary-index milestones.
Database `012`-`014`	Transactions, concurrent transactions, SQL parser bridge	Treat as the handoff into the relational database tutorial.
Redis `001`-`010`	TCP byte stream, sockets, non-blocking I/O, poll/event loop, batching	Add a Redis-compatible network front end after the storage engine works.
Redis `011`-`020`	Command processing, response serialization, hash tables, resizing, binary trees, sorted sets	Add RESP command tests and in-memory data-structure benchmarks.
Redis `021`-`027`	Non-blocking I/O, timers, TTL commands, thread pool/semaphore work	Add TTL expiry, background jobs, and event-loop latency measurements.

Extra checkpoints from the book chunks

Durability checkpoint: kill the process during append, page write, and compaction; document recovery behavior.
Page checkpoint: print one on-disk page as decoded fields, not just bytes.
Protocol checkpoint: support a tiny RESP subset and verify it with a real Redis client where practical.
Latency checkpoint: measure p50/p95 command latency before and after TTL/background work.

14. Deep project spec

Project contract

Build a key-value storage engine with a documented persistence model. The minimum contract is put, get, delete, range scan if using ordered storage, crash recovery, and a benchmarkable workload. The project must state whether it is append-log, LSM-like, B+ tree/page-based, or in-memory with persistence.

Source-backed reading map

Source ID	Use for	Required output
`build-your-own/database-james-smith`	page layout, B+ tree, persistence, free list, indexes	storage format, decoded page dump, recovery tests
`build-your-own/database-go-2e-james-smith`	Go implementation path and SQL bridge	implementation checkpoints and range/index tests
`build-your-own/redis-james-smith`	in-memory comparison and server extension	optional network front end and latency report

Milestone map

Milestone	Deliverable	Tests	Failure case
Storage contract	data model and file format	spec checklist	unsupported key/value size
Write path	durable `put`/`delete`	append/page write fixtures	torn or partial write
Read path	`get` and tombstone handling	model comparison	deleted key reappears
Index structure	B+ tree, hash index, or LSM memtable/SSTable	invariant tests	split/merge corruption
Range scan	ordered iteration if supported	sorted output fixtures	scan crosses page/table boundary
Recovery	restart and repair behavior	kill/restart tests	corrupted tail/page detected
Compaction/free list	reclaim space	before/after size and correctness	live key lost during compaction

Test matrix

Test type	Required examples
Unit	encoding, page/node invariants, comparator behavior
Model	randomized operations compared to an in-memory map
Crash	kill during write, rename, compaction, or page flush
Golden	decoded on-disk page/log fixture
Benchmark	read-heavy, write-heavy, mixed workload with p50/p95

Design notes required

format.md: file/page/log layout and versioning.
recovery.md: what is durable, what may be lost, and how corruption is handled.
index-choice.md: why this structure fits the target workload.
benchmark.md: workload, data size, machine, and interpretation of results.

Portfolio evidence

Publish the storage contract, decoded page/log examples, crash-recovery transcript, randomized model-test summary, and one benchmark table with limitations.

Source

This tutorial draws from the BYO-X catalog "Database" section. DDIA Chapter 3 and James Smith's "Build Your Own Redis" / "Build Your Own Database" are the canonical primary sources.

1. Overview & motivation​

2. Where this fits in the degree​

3. Prerequisites​

4. Theory & research​

Required reading​

Strongly recommended​

For Redis specifically​

For LSM trees​

5. Curated tutorial list (from BYO-X)​

6. Recommended primary path​

Path A: Redis-like (network-server-shaped)​

Path B: B+ tree storage engine​

Path C: Bitcask-style (LSM-precursor)​

7. Implementation milestones (Bitcask-style, Path C)​

Milestone 1: Append-only log with in-memory index​

Milestone 2: Crash safety with checksums​

Milestone 3: Multiple log files + compaction​

Milestone 4: Persistent index hint file​

Milestone 5: Concurrent access​

Milestone 6 (optional): LSM tree progression​

Milestone 7 (optional): Network server (Redis path)​

8. Tests & evidence​

9. Common pitfalls​

10. Extensions​

11. Module integration​

12. Portfolio framing​

13. Local source backbone​

Extra checkpoints from the book chunks​

14. Deep project spec​

Project contract​

Source-backed reading map​

Milestone map​

Test matrix​

Design notes required​

Portfolio evidence​

Source​