Skip to main content

Build Your Own Redis Server

A Redis-like server is where data structures, TCP servers, event loops, timers, and persistence meet. It is not just "a hashmap over sockets." A useful build teaches protocol framing, non-blocking I/O, incremental rehashing, expiration, background work, and the operational cost of keeping data in memory.


1. Overview & motivation

Build a small Redis-compatible server with:

  • a TCP listener and connection loop
  • a framed command protocol
  • string keys and binary-safe values
  • hash-table storage
  • TTL expiration
  • sorted-set or ordered-index extension
  • append-only persistence or snapshotting
  • latency and memory measurements

This project sits between Database (Key-Value) and Web Server: the storage model is simple, but the server behavior is real.


2. Where this fits in the degree

  • Phase: Architecture
  • Semester: 6 (Databases and Distributed Systems), with systems carry-over from Semester 5 networking
  • Modules deepened: Module 2 (storage engines and indexing), Module 3 (replication if extended), Module 4 (durability and consistency if persistence is added)

Cross-phase relevance:

  • Builds on the Systems Network Stack / TCP project or normal OS sockets.
  • Uses hash tables, balanced trees, heaps, and queues from Foundations.
  • Complements Database (Key-Value): Redis optimizes in-memory access and operational latency; disk-first KV stores optimize durability and storage layout.

3. Local source backbone

Primary local source:

  • Build Your Own Redis with C/C++ (build-your-own/redis-james-smith)

Supporting local sources:

  • Build Your Own Database From Scratch (build-your-own/database-james-smith)
  • Build Your Own Web Server From Scratch in Node.js (build-your-own/web-server-node-james-smith)
Local chunksUse them forProject milestone
Redis 001-006TCP byte streams, sockets, protocol boundaries, endian issuesTCP server and binary-safe request framing
Redis 007-010poll, readiness APIs, batching, system-call debuggingEvent loop and request batching
Redis 011-015Command processing, response serialization, progressive hash-table resizingGET, SET, DEL, hash-table core
Redis 016-020Binary formats, balanced trees, sorted-set APIOptional sorted sets or range queries
Redis 021-024Non-blocking I/O, timers, timeout selection, TTL commandsExpiration engine and latency checks
Redis 025-027Client code, semaphores, worker threads, joining background workBackground persistence or slow-command worker pool

4. Implementation milestones

Milestone 1: TCP server and framed protocol

Accept multiple clients. Read complete commands from a byte stream without assuming one recv equals one command.

Evidence: fragmented client writes still produce one correct command.

Milestone 2: Basic commands and response serialization

Implement PING, GET, SET, DEL, and EXISTS. Responses must be deterministic and easy to test.

Evidence: command fixtures cover valid commands, malformed commands, unknown commands, and binary-safe values.

Milestone 3: Hash table with progressive resizing

Build your own chaining hash table or open-addressed table. Do not pause the whole server for a giant resize; move buckets incrementally.

Evidence: load-factor test shows correctness before, during, and after rehashing.

Milestone 4: Event loop and non-blocking I/O

Use poll, epoll, kqueue, IOCP, or a language runtime event loop. Track readable and writable clients separately.

Evidence: one slow client cannot block other clients.

Milestone 5: TTL and timer heap

Add expiration with EXPIRE, TTL, and lazy plus active cleanup.

Evidence: expired keys disappear without scanning the entire database on every request.

Milestone 6: Persistence

Add append-only persistence or snapshotting. Keep the format small and inspectable.

Evidence: restart restores keys; truncating the last partial record does not corrupt the database.

Milestone 7: Sorted-set extension

Implement an ordered structure for ZADD, ZRANGE, and ZRANK.

Evidence: rank and range queries remain correct after insert, update, and delete.


5. Tests & evidence

TestEvidence
Protocol framingcommands split across writes still parse correctly
Concurrent clientsmany clients can read/write without head-of-line blocking
Hash-table resizingrandomized map-model comparison during rehash
TTL correctnessexpiration, update-before-expiry, delete-before-expiry
Persistencewrite, kill, restart, verify values
Latencyp50/p95 under read-heavy, write-heavy, and expiry-heavy workloads

6. Design requirements

Protocol contract

Your server must define where one command ends and the next begins. The central mistake in socket projects is treating TCP as message-oriented. It is a byte stream: a client can send half a command, three commands at once, or one command split across many packets.

Minimum protocol decisions:

  • Framing: length-prefix, RESP subset, or another explicit frame format.
  • Binary safety: keys and values may contain spaces, newlines, or zero bytes if your protocol claims binary support.
  • Error shape: malformed frame, unknown command, wrong arity, invalid integer, and unsupported type should produce different errors.
  • Pipelining: clients may send multiple commands before reading responses.
  • Backpressure: if a client stops reading, the server must not grow an unbounded output buffer.

Event-loop contract

Keep connection state explicit:

Client {
fd
read_buffer
parsed_commands
write_queue
last_active_at
close_after_write
}

The event loop should be able to answer:

  • Which clients are readable?
  • Which clients have pending writes?
  • Which timer expires next?
  • Which background job completed?
  • Which connection should be closed?

Do not hide all of this inside one giant handle_client() function. Redis is interesting because request parsing, command execution, timers, and writes interleave.

Data-structure contract

Start with strings. Then choose one richer structure:

  • Hash table: string key -> object pointer.
  • TTL index: min-heap or ordered set keyed by expiration time.
  • Sorted set: hash table for member lookup plus balanced tree or skip list for score ordering.

For every data structure, define invariants. Example for TTL:

  • A key without expiration is absent from the TTL index.
  • A key with expiration has exactly one TTL entry.
  • Updating expiration replaces the old entry or marks it stale.
  • Expired keys are removed before they are returned by GET.

Durability contract

If you add persistence, state what guarantee you provide:

  • No durability: pure in-memory server; restart loses data.
  • Append-only file: every mutating command is logged before or after execution.
  • Snapshot: periodic full dump; data since last snapshot may be lost.
  • Hybrid: snapshot plus append-only tail.

The README must name the fsync policy. fsync every write is safer and slower; periodic fsync is faster and can lose recent writes.


7. Implementation rubric

LevelEvidence
Minimum passSingle-client server, framed commands, PING, GET, SET, DEL, deterministic tests
Solid passMulti-client event loop, pipelining, hash-table resizing, malformed input tests
Strong passTTL engine, slow-client handling, latency report, restartable persistence
Portfolio-readyRESP-compatible subset, redis-cli smoke test, p95 latency under load, design note with tradeoffs

8. Common failure modes

  • Assuming command boundaries match socket reads. This fails immediately under real clients.
  • Blocking writes. One slow client stalls every other client.
  • Global resize pauses. A big hash-table resize creates latency spikes.
  • TTL by full scan. Scanning all keys each request is simple and wrong under load.
  • Persistence without checksums or truncation logic. A crash during write can leave a partial tail record.
  • No memory policy. An in-memory database must say what happens when memory grows beyond budget.

9. Portfolio framing

Publish it as a server-systems project:

  • README with protocol subset, supported commands, and limitations
  • benchmark report with latency and memory numbers
  • design note on event loop, storage, TTL, and persistence choices
  • tests that a reviewer can run without a real Redis dependency

Reviewer entry points:

  • server or event_loop module
  • command parser and serializer
  • hash-table implementation
  • TTL scheduler
  • persistence/recovery code

10. Deep project spec

Project contract

Build a Redis-like in-memory data server with an explicit protocol subset. The minimum contract is TCP server, framed command parser, PING, GET, SET, DEL, EXISTS, binary-safe values if claimed, multi-client event loop, TTL, and a stated persistence policy. Sorted sets, RESP compatibility, and worker threads are extensions.

Source-backed reading map

Source IDUse forRequired output
build-your-own/redis-james-smithTCP stream handling, event loop, commands, hash table, TTL, persistence/background workprotocol fixtures, event-loop tests, latency report
build-your-own/database-james-smithdurability and storage contrastpersistence/recovery design note
build-your-own/web-server-node-james-smithserver/backpressure comparisonslow-client and buffering tests

Milestone map

MilestoneDeliverableTestsFailure case
Protocolcommand framing and serializersplit-write fixturesmalformed frame
Core commandsPING, GET, SET, DEL, EXISTScommand fixtureswrong arity
Hash tablestorage with resizingmodel comparisonresize during lookup
Event loopmulti-client read/write stateslow-client testunbounded output buffer
TTLexpiration index and lazy/active cleanuptime-controlled testsstale TTL entry
Persistenceappend-only or snapshotrestart teststruncated tail record
Extensionsorted set or RESP/redis-cli smokecompatibility fixturesunsupported type error

Test matrix

Test typeRequired examples
Goldenrequest bytes to response bytes
Modelcommand sequence compared to in-memory reference map
Concurrencymany clients, pipelining, slow reader
TimeTTL expiration, update-before-expiry, delete-before-expiry
Recoverywrite, kill, restart, verify values
Benchmarkp50/p95 under read-heavy, write-heavy, TTL-heavy workloads

Design notes required

  • protocol.md: framing, errors, binary-safety, pipelining.
  • event-loop.md: client state machine and backpressure policy.
  • storage.md: hash-table invariants, TTL index, optional sorted set.
  • durability.md: persistence format, fsync policy, recovery rules.

Portfolio evidence

Publish protocol examples, event-loop diagram, latency table, persistence recovery transcript, and one slow-client or TTL bug fixed by a regression test.


Source

This project is based on the local James Smith Redis chunks and complements the existing key-value database and web-server Build Your Own projects.