Skip to main content

Build Your Own Kafka-like Distributed Log

"Kafka is just a distributed, partitioned, replicated, totally-ordered log."

Building a Kafka-style distributed log is the cleanest way to learn the mechanics of replicated systems: partitioning, leader election, replication protocols, consumer offsets, retention. Plan for several weeks; the reward is a working pub/sub system you understand from disk format up.


1. Overview & motivation

A distributed log has:

  • Topics — named streams of records.
  • Partitions — each topic is split across N partitions for parallelism.
  • Replication — each partition has 1 leader + N-1 followers.
  • Producers — write records to partitions (chosen by hash or round-robin).
  • Consumers — read records sequentially; track their position (offset).
  • Brokers — server processes holding partition data.

What you can only learn by building one:

  • Why single-writer-per-partition is the design that makes everything else tractable.
  • Why the log abstraction (immutable, append-only, ordered) is so powerful — it shows up in databases (WAL), filesystems (LSM SSTables), version control (commits), event sourcing, blockchains.
  • Why consumer offsets stored externally unlock so much (replay, multiple consumers, time travel).
  • Why leader election is the hard part of any replicated system — and a perfect lead-in to Raft.

2. Where this fits in the degree

  • Phase: Architecture
  • Semester: 6 (Databases and Distributed Systems)
  • Modules deepened: Module 3 (replication & partitioning) — the project for this module. Module 5 (distributed fundamentals) — leader election, failure detection.

Cross-phase relevance:


3. Prerequisites

  • Complete the Database (KV) tutorial — you need to be comfortable with on-disk file formats and WAL-style append-only logs.
  • Strong systems language: Go, Java, Rust.
  • TCP socket programming.

4. Theory & research

Required reading

  • Travis Jeffery, "Distributed Services with Go: Your Guide to Reliable, Scalable, and Maintainable Systems" — book that builds a Kafka-like service step-by-step. Excellent practical resource. ⭐ recommended primary.
  • Kreps, Narkhede, Rao, "Kafka: a Distributed Messaging System for Log Processing" (NetDB 2011) — the original paper. 6 pages. Read once.
  • Confluent's Kafka design docs — well-written internals.

For consensus and replication mechanics

  • Diego Ongaro & John Ousterhout, "In Search of an Understandable Consensus Algorithm" (Raft paper, 2014) — the canonical Raft reference. See the Consensus tutorial.
  • Apache Kafka's KRaft protocol — Kafka's recent move from ZooKeeper-based to Raft-based metadata management.

5. Curated tutorial list (from BYO-X)

The BYO-X catalog has one entry under "Distributed Systems":

  • Java: Building Your Own Kafka-like System From Scratch: A Step-by-Step Guide

Additional canonical references

  • Travis Jeffery, "Distributed Services with Go" (book) — full project, ~300 pages.
  • CodeCrafters "Build Your Own Kafka" course — paid, but high-quality.
  • JetBrains, "Building a Kafka-style log from scratch in Rust" — various community tutorials.

Travis Jeffery, "Distributed Services with Go" (book).

Builds a complete distributed commit log. Chapters cover:

  1. Single-server commit log.
  2. gRPC API.
  3. Replication (with Raft via Hashicorp's library).
  4. Service discovery (Serf).
  5. Client / observability / deployment.

The strength: it builds an operationally complete system, not just a toy. Realistic shape, real protocols.

For a more from-scratch path: pick the Java tutorial (BYO-X) for high-level shape, then implement in your language while reading the original Kafka paper for the design rationale.


7. Implementation milestones

Milestone 1: Single-broker single-partition log

A file on disk. Records are length-prefixed. Position by byte offset.

type Log struct {
mu sync.Mutex
file *os.File
offset uint64
}

func (l *Log) Append(record []byte) (offset uint64, err error) {
l.mu.Lock()
defer l.mu.Unlock()
off := l.offset
binary.Write(l.file, binary.BigEndian, uint32(len(record)))
l.file.Write(record)
l.offset += uint64(4 + len(record))
return off, nil
}

func (l *Log) Read(offset uint64) (record []byte, err error) {
// seek to offset, read length prefix, read record
}

Evidence: Append 1M records. Read them back at random offsets.

Milestone 2: Segments and retention

Split the log into segments (files of bounded size). Old segments can be deleted (retention policy) or compacted (compact-on-key, like Kafka's compacted topics).

Maintain an index file per segment mapping offset → file position for fast seek.

000000.log    (records, 0 to 99)
000000.index (offset -> byte position)
000100.log (records, 100 to 199)
000100.index
000200.log (active, currently being written)
000200.index

Evidence: Records can be read efficiently regardless of segment they're in. Old segments deleted after configured retention time.

Milestone 3: Network protocol (RPC)

Define an RPC interface. Either:

  • gRPC with protobuf schemas (Travis Jeffery's path).
  • A custom binary protocol (Kafka's path; more educational).
service Log {
rpc Produce(ProduceRequest) returns (ProduceResponse);
rpc Consume(ConsumeRequest) returns (stream Record);
}

Evidence: A client can produce and consume records over the network.

Milestone 4: Multiple partitions

A topic has N partitions. Producer chooses partition by hash(key) or round-robin. Each partition is an independent log.

Evidence: Records with the same key go to the same partition (ordering guarantee). Records with different keys parallelize across partitions.

Milestone 5: Consumer groups and offsets

A consumer group is a set of consumers cooperating to consume a topic. Each partition is assigned to one consumer in the group. Consumers persist their committed offset.

Topic: orders (3 partitions)
Consumer Group: order-processors
consumer-1: partitions [0, 1]
consumer-2: partition [2]

Each consumer commits its offset after processing.

Evidence: Multiple consumers in a group each process a subset of partitions. After restart, they resume from their committed offset.

Milestone 6: Replication

Each partition has a leader + N-1 followers. Followers replicate from the leader.

Producer sends to leader → leader appends to local log → leader pushes to followers → followers append to their local logs → leader acks the producer once N replicas have the record.

Choose your replication mode:

  • Sync — wait for all in-sync replicas. Safest, slowest.
  • Async — leader acks immediately, replicates in background. Fastest, risk of loss.
  • Quorum (Kafka's acks=all, min.insync.replicas=N) — wait for at least N replicas.

Evidence: Killing a follower doesn't lose writes. Killing the leader causes failover (next milestone).

Milestone 7: Leader election

When the leader fails, one follower must become the new leader. This requires consensus on which follower has the most data.

Options:

  • Hashicorp's Raft library (Go) — Travis Jeffery uses this. Quickest path.
  • ZooKeeper — what original Kafka used. Adds an external dependency.
  • Implement Raft yourself — see the Consensus tutorial. The educational path.

Evidence: Kill the leader. Within seconds, a new leader is elected. Clients reconnect to the new leader. No data loss for already-acked writes.

Milestone 8: Cluster membership and discovery

Brokers find each other. Use a gossip protocol (Hashicorp Serf), DNS, or static configuration.

Milestone 9 (optional): Compaction

For "compacted topics" — keep only the latest value per key. Useful for changelog topics.


8. Tests & evidence

TestHow
Single broker correctnessProduce and consume in order; offsets stable
PersistenceRestart broker; messages preserved
Partition parallelismTwo keys → two partitions → both processed concurrently
Consumer offsetsRestart consumer; resumes from committed offset
Replication consistencyAfter replication, all replicas have identical logs
Leader failoverKill leader; new leader elected; no data loss
Network partitionHalf the cluster isolated; only majority side accepts writes
Throughput100k msg/sec on a 3-broker cluster (target; achievable)

The strongest single evidence: a chaos test transcript — kill brokers randomly, verify no data loss, all producers and consumers eventually make progress.


9. Common pitfalls

  • Single-writer-per-partition violation. The whole design depends on a single writer per partition. If two brokers both think they're leader (split brain), data diverges. Leader election must be correct.
  • Offset semantics. Kafka's offsets are log offsets, not message IDs. Be precise.
  • Consumer rebalancing. When a consumer joins or leaves, partitions must be reassigned. Naive rebalancing causes thundering herd. Use Kafka's "incremental cooperative rebalancing" pattern.
  • Truncating uncommitted entries. When a new leader is elected, followers may have to truncate ahead-of-the-new-leader's-position. This is the source of many bugs.
  • No backpressure. A fast producer can overwhelm a slow consumer. Either implement backpressure or accept lag.
  • fsync on every write. Slow. Group commits.
  • Skipping the log compaction edge cases. Compaction in the presence of in-flight readers requires careful design.

10. Extensions

  • Exactly-once semantics — Kafka calls this EOS. Idempotent producers + transactional writes.
  • Stream processing — Kafka Streams. Build a tiny version on top.
  • MirrorMaker — replicate between clusters. Useful for DR.
  • Schema registry — enforce schemas on records.
  • Connect framework — pluggable source and sink connectors.

A real Kafka cluster has been refined over a decade. Don't try to reproduce all of it.


11. Module integration

ModuleWhat the distributed log deepens
Sem 6 Module 2 — Storage & indexingSegment + index file format is a classic on-disk design.
Sem 6 Module 3 — Replication & partitioningThe whole project.
Sem 6 Module 5 — Distributed fundamentalsLeader election, failure detection, consensus.
Database (KV) tutorialLSM and the commit log share the same "append + compact" pattern.
Consensus / Raft tutorialLeader election is what Raft solves. Build this and you'll feel why Raft exists.

12. Portfolio framing

What to publish:

  • Source organized as log/, server/, replication/, consensus/.
  • A chaos-test report — what failure scenarios you ran, what survived.
  • A README with:
    • Architecture diagram (brokers, partitions, replication).
    • Throughput and latency benchmark numbers.
    • List of what's implemented vs Kafka.
    • One operational tradeoff you made (e.g., chose async replication and document why).

Reviewer entry points:

  • log/segment.go — on-disk format.
  • replication/leader.go — replication path.
  • consensus/raft.go — leader election.
  • tests/chaos.md — the chaos test plan and results.
  • README must include: architecture diagram, benchmark numbers, scope honesty.

A working distributed log is a flagship portfolio piece. Sets you up to talk seriously about Kafka, Pulsar, replicated systems in interviews.


Source

This tutorial draws from the BYO-X catalog "Distributed Systems" entry. Jay Kreps' "The Log" essay, the Kafka paper, and Travis Jeffery's Distributed Services with Go are the canonical primary sources.