Skip to main content

Build Your Own Database (Key-Value / Redis-like)

"A database is a careful arrangement of files." -- Edmund Burke (paraphrased)

A key-value store is the perfect first database project. The data model is trivial (put(k, v), get(k), delete(k)), but the implementation decisions -- log-structured vs B-tree, write-ahead logging, compaction, indexing -- are the actual database engineering. Build one and you'll never read a Postgres or Redis blog post the same way again.


1. Overview & motivation

Two dominant storage engine designs:

  1. Log-Structured Merge tree (LSM) -- append-only writes, periodic compaction. Used by LevelDB, RocksDB, Cassandra, Scylla.
  2. B+ tree -- in-place updates, balanced tree pages. Used by SQLite, Postgres, MySQL (InnoDB), BoltDB.

A simpler variant in tutorial form is the Bitcask model: append-only log, in-memory hash index. Used by Riak. Most readable starting point.

What you can only learn by building one:

  • Why writes are easy (append) but reads through an LSM are surprisingly complex (search active memtable + N SSTables in order).
  • Why compaction is the heart of any log-structured engine -- and the source of its operational complexity.
  • Why B+ trees make range scans free but random writes expensive.
  • Why write-ahead logging (WAL) is the answer to "what if we crash mid-write."
  • Why caching (LRU, ARC) appears in every database.

2. Where this fits in the degree

  • Phase: Architecture
  • Semester: 6 (Databases and Distributed Systems)
  • Modules deepened: Module 2 (storage engines & indexing) -- the project for this module. Module 4 (transactions & consistency) if you add WAL + atomicity.

Cross-phase relevance:


3. Prerequisites

  • A systems language: Go, Rust, C++, or C.
  • File I/O comfort.
  • Hash tables, sorted maps, B-trees as algorithms (you don't need to have implemented them).
  • Tolerance for thinking about crash scenarios: "what if we lose power right now?"

4. Theory & research

Required reading

  • Martin Kleppmann, Designing Data-Intensive Applications, Chapter 3 -- "Storage and Retrieval." The single best 50 pages on storage engines. ⭐ start here.
  • Pavel Emelyanov & Edward Z. Yang, "Let's Build a Simple Database" (cstack.github.io/db_tutorial/) -- sqlite clone in C, walks through file format, B-tree, cursors.
  • Alex Petrov, Database Internals -- Part 1 (Storage Engines). Newer than DDIA, more storage-focused.
  • James Smith, "Build Your Own Database from Scratch: From B+Tree to SQL in 3000 Lines" -- build-your-own.org/database/. Go. Complete B+ tree + SQL layer.

For Redis specifically

  • Redis source code -- surprisingly readable. Start at src/dict.c (hash table), src/t_string.c (string commands), src/server.c (event loop).
  • antirez (Salvatore Sanfilippo), "Redis design and implementation" -- multiple blog posts. The architect's notes.

For LSM trees

  • The original LSM paper: O'Neil, Cheng, Gawlick, O'Neil (1996), "The Log-Structured Merge-Tree" -- short. Read once for context.
  • LevelDB source code -- about 30K lines of C++. Mature implementation of an LSM with one-thread compaction.
  • Mark Callaghan's blog -- comparative LSM engineering writing.

5. Curated tutorial list (from BYO-X)

The BYO-X catalog "Database" entries cover both KV and relational. Here are the KV-flavored ones:

  • C++: Build Your Own Redis from Scratch -- build-your-own.org/redis -- comprehensive book/site, Go ⭐ recommended primary for Redis-like
  • C#: Build Your Own Database
  • Clojure: An Archaeology-Inspired Database
  • Crystal: Why you should build your own NoSQL Database
  • Go: Build Your Own Database from Scratch: From B+Tree To SQL in 3000 Lines -- build-your-own.org/database ⭐ recommended primary for B+ tree path
  • Go: Code a database in 45 steps: a series of test-driven small coding puzzles
  • Go: Build Your Own Redis from Scratch
  • JavaScript: Dagoba: an in-memory graph database -- different model (graph)
  • Python: DBDB: Dog Bed Database -- from the 500 Lines or Less book ⭐ classic
  • Python: Write your own miniature Redis with Python
  • Ruby: Build your own fast, persistent KV store in Ruby
  • Rust: Build your own Redis client and server

Three excellent paths; pick by interest:

Path A: Redis-like (network-server-shaped)

James Smith, "Build Your Own Redis" (build-your-own.org/redis). Free book. Builds a Redis-compatible server: TCP, event loop, command parsing, hash tables, sorted sets, AOF persistence, replication. ~2,000 lines of Go.

Choose this if you also want to learn server architecture (event loops, non-blocking I/O).

Path B: B+ tree storage engine

James Smith, "Build Your Own Database from Scratch" (build-your-own.org/database). Same author, focused on disk-resident B+ tree + SQL layer. 3,000 lines of Go.

Choose this if you want the storage engine internals.

Path C: Bitcask-style (LSM-precursor)

Read the Bitcask paper (Riak) -- 6 pages. Then implement it. Append-only log + in-memory hash index + compaction.

Smallest scope. Easy to finish in a week. Excellent for understanding why production engines became more complex.

For this degree: Path C -> Path B if you want depth; Path A if you want the Redis shape.


7. Implementation milestones (Bitcask-style, Path C)

Milestone 1: Append-only log with in-memory index

Writes append a record to a log file. The in-memory index maps key -> (file_offset, length). Reads seek + read.

class DB:
def __init__(self, path):
self.path = path
self.index = {} # key -> (offset, length)
self.fd = open(path, 'a+b')
self._rebuild_index()

def put(self, key, value):
offset = self.fd.tell()
record = encode(key, value)
self.fd.write(record)
self.index[key] = (offset, len(record))

def get(self, key):
if key not in self.index: return None
offset, length = self.index[key]
self.fd.seek(offset)
record = self.fd.read(length)
return decode(record).value

def delete(self, key):
if key in self.index:
del self.index[key]
self.put(key, TOMBSTONE)

Evidence: 100k puts and gets work. After restart, get still returns the right values (index rebuilt on startup).

Milestone 2: Crash safety with checksums

Each record has a CRC32. On reading, verify the CRC. If a partial record is at the tail (we crashed mid-write), truncate it.

record = checksum + key_size + value_size + key + value

Evidence: Kill the process during a write, restart, verify integrity. No corruption.

Milestone 3: Multiple log files + compaction

When the active log exceeds a threshold (say 1 MB), close it, open a new one. Run a background compaction: for each closed log, rewrite a smaller file containing only the latest version of each key (drop tombstoned/overwritten keys).

log-001 (closed, 1 MB)   ─┐
log-002 (closed, 1 MB) ─┼─-> compact -> log-merged (200 KB)
log-003 (closed, 1 MB) ─┘
log-004 (active, 256 KB)

Evidence: After heavy update workload, on-disk size shrinks after compaction.

Milestone 4: Persistent index hint file

Rebuilding the index from logs on startup is O(records). For large DBs, this is slow. Periodically write a "hint" file: a snapshot of the in-memory index. Startup: load hint + replay any newer log entries.

Evidence: Startup time stays constant even as DB grows.

Milestone 5: Concurrent access

Single writer (mutex), many readers. Background compactor as a separate goroutine/thread.

Evidence: Multi-reader stress test: 8 reader threads, 1 writer thread, no data races (run under TSan).

Milestone 6 (optional): LSM tree progression

Once Bitcask is solid, the leap to LSM is:

  • MemTable (in-memory sorted map for recent writes) replaces the simple hash.
  • When MemTable is full, flush as an immutable SSTable on disk.
  • SSTables are sorted, supporting range scans.
  • Compaction merges N SSTables into 1 larger one.
  • Reads check MemTable, then each SSTable from newest to oldest.

LevelDB and RocksDB do exactly this with multi-level storage.

Evidence: Range scan for k in db.range("user:", "user:~"): works correctly across multiple SSTables.

Milestone 7 (optional): Network server (Redis path)

Add TCP server with RESP (Redis serialization protocol) parsing. Now redis-cli connects to your server.

RESP commands:
*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n ← SET foo bar
+OK\r\n ← response

Evidence: redis-cli -p YOUR_PORT SET foo bar; GET foo returns bar.


8. Tests & evidence

TestHow
Basic put/get/deleteTrivially. Edge cases: empty value, large value, deleted key.
PersistenceWrite, close DB, reopen, read.
Crash recoveryKill mid-write; restart; verify no corruption.
Compaction correctnessPre-compact and post-compact reads return identical values.
Range scan (LSM)Multi-SSTable range scan returns in sorted order.
PerformanceRandom write throughput; random read throughput; range scan throughput. Plot vs DB size.
Stress1M writes, then 10M reads at random keys, with crashes injected.

The strongest evidence: a benchmark plot comparing your engine to LevelDB or BoltDB on the same workload. You'll be slower; that's fine. Document by how much and why.


9. Common pitfalls

  • fsync confusion. write() returns when the OS buffer has the data, not when it's on disk. Without fsync on commit, a power loss loses recent writes. Decide: durability per commit (slow but safe) or batched (fast but lossy).
  • Index integrity. If your in-memory index gets out of sync with the log, reads return wrong values. Always update the log first, then the index.
  • Tombstones in compaction. A tombstone means "this key was deleted." After compaction, tombstones must be retained until all older versions have also been compacted away.
  • Reading from a being-written file. Two file handles (one for writing, one for reading) is cleanest.
  • No CRC. Without checksums, a corrupted byte silently returns wrong data forever.
  • Single-file design with many keys. Performance plateaus once the file grows huge. Real engines split into many files.
  • Treating compaction as optional. Without it, on-disk size grows without bound.
  • Locking the world during compaction. Compaction must run concurrently with reads and writes. This is the operational tension that defines every LSM engine.

10. Extensions

  • Snapshots. Point-in-time consistent reads, useful for backups.
  • TTL (expiration). Records with an expiration timestamp.
  • Compression. Snappy, LZ4, or zstd on SSTables.
  • Bloom filters. For each SSTable, precompute a bloom filter so most "key not present" reads skip the file.
  • Range scans + bounded iterators. Standard for any LSM.
  • Transactions. Atomic batches of writes (write group commit).
  • Replication. A simple primary-replica setup. Leads naturally to the Kafka-like tutorial.

11. Module integration

ModuleWhat the KV store deepens
Sem 6 Module 2 -- Storage engines & indexingThe whole project.
Sem 6 Module 4 -- Transactions & consistencyAdding WAL + atomic batches turns this into a transactional store.
Memory Allocator tutorialBuffer pools and slab allocation matter for performance.
Search Engine tutorialInverted indexes use SSTable-like sorted on-disk structures.
Database (relational) tutorialSame storage engine; add SQL parsing and a query planner on top.
Sem 6 Module 3 -- ReplicationAfter single-node works, add a follower. Step toward Kafka-like.

12. Portfolio framing

What to publish:

  • Source organized as engine/, wal/, compaction/, server/ (if Redis path).
  • A benchmark report comparing your engine to LevelDB/BoltDB/Redis on the same workload.
  • A README with:
    • Design decisions (LSM vs B-tree vs Bitcask).
    • The crash-recovery test transcript.
    • Limitations (no replication, single writer, etc.).
  • A docs/architecture.md with on-disk format diagrams.

Reviewer entry points:

  • engine/put.go -- the write path.
  • engine/get.go -- the read path (more complex than you'd think after compaction).
  • compaction/merge.go -- the operationally-critical code.
  • bench/results.md -- the comparison.

A working storage engine is one of the strongest portfolio pieces in a systems/architecture engineer's collection. "I implemented an LSM-like KV store that handles 100k writes/sec on this benchmark" is concrete and reads well.


13. Local source backbone

Use these local chunks to turn the KV/Redis path into a full storage-and-server progression:

  • Build Your Own Database From Scratch (build-your-own/database-james-smith)
  • Build Your Own Database From Scratch in Go, 2E (build-your-own/database-go-2e-james-smith)
  • Build Your Own Redis with C/C++ (build-your-own/redis-james-smith)
Local chunksUse them forAdd to this project
Database 001-002Files versus databases, atomic rename, durable files, LSM overviewAdd a crash-safety lab before any tree work.
Database 003-005B-tree/B+ tree intuition, node encoding, insertion, deletionAdd page-layout diagrams and invariant tests.
Database 006-008Persisting pages, master page, allocation, free listAdd a page allocator and free-list inspection tool.
Database 009-011Point queries, range queries, secondary indexesAdd range-scan and secondary-index milestones.
Database 012-014Transactions, concurrent transactions, SQL parser bridgeTreat as the handoff into the relational database tutorial.
Redis 001-010TCP byte stream, sockets, non-blocking I/O, poll/event loop, batchingAdd a Redis-compatible network front end after the storage engine works.
Redis 011-020Command processing, response serialization, hash tables, resizing, binary trees, sorted setsAdd RESP command tests and in-memory data-structure benchmarks.
Redis 021-027Non-blocking I/O, timers, TTL commands, thread pool/semaphore workAdd TTL expiry, background jobs, and event-loop latency measurements.

Extra checkpoints from the book chunks

  1. Durability checkpoint: kill the process during append, page write, and compaction; document recovery behavior.
  2. Page checkpoint: print one on-disk page as decoded fields, not just bytes.
  3. Protocol checkpoint: support a tiny RESP subset and verify it with a real Redis client where practical.
  4. Latency checkpoint: measure p50/p95 command latency before and after TTL/background work.

14. Deep project spec

Project contract

Build a key-value storage engine with a documented persistence model. The minimum contract is put, get, delete, range scan if using ordered storage, crash recovery, and a benchmarkable workload. The project must state whether it is append-log, LSM-like, B+ tree/page-based, or in-memory with persistence.

Source-backed reading map

Source IDUse forRequired output
build-your-own/database-james-smithpage layout, B+ tree, persistence, free list, indexesstorage format, decoded page dump, recovery tests
build-your-own/database-go-2e-james-smithGo implementation path and SQL bridgeimplementation checkpoints and range/index tests
build-your-own/redis-james-smithin-memory comparison and server extensionoptional network front end and latency report

Milestone map

MilestoneDeliverableTestsFailure case
Storage contractdata model and file formatspec checklistunsupported key/value size
Write pathdurable put/deleteappend/page write fixturestorn or partial write
Read pathget and tombstone handlingmodel comparisondeleted key reappears
Index structureB+ tree, hash index, or LSM memtable/SSTableinvariant testssplit/merge corruption
Range scanordered iteration if supportedsorted output fixturesscan crosses page/table boundary
Recoveryrestart and repair behaviorkill/restart testscorrupted tail/page detected
Compaction/free listreclaim spacebefore/after size and correctnesslive key lost during compaction

Test matrix

Test typeRequired examples
Unitencoding, page/node invariants, comparator behavior
Modelrandomized operations compared to an in-memory map
Crashkill during write, rename, compaction, or page flush
Goldendecoded on-disk page/log fixture
Benchmarkread-heavy, write-heavy, mixed workload with p50/p95

Design notes required

  • format.md: file/page/log layout and versioning.
  • recovery.md: what is durable, what may be lost, and how corruption is handled.
  • index-choice.md: why this structure fits the target workload.
  • benchmark.md: workload, data size, machine, and interpretation of results.

Portfolio evidence

Publish the storage contract, decoded page/log examples, crash-recovery transcript, randomized model-test summary, and one benchmark table with limitations.


Source

This tutorial draws from the BYO-X catalog "Database" section. DDIA Chapter 3 and James Smith's "Build Your Own Redis" / "Build Your Own Database" are the canonical primary sources.