Build Your Own Kafka-like Distributed Log
"Kafka is just a distributed, partitioned, replicated, totally-ordered log."
Building a Kafka-style distributed log is the cleanest way to learn the mechanics of replicated systems: partitioning, leader election, replication protocols, consumer offsets, retention. Plan for several weeks; the reward is a working pub/sub system you understand from disk format up.
1. Overview & motivation
A distributed log has:
- Topics — named streams of records.
- Partitions — each topic is split across N partitions for parallelism.
- Replication — each partition has 1 leader + N-1 followers.
- Producers — write records to partitions (chosen by hash or round-robin).
- Consumers — read records sequentially; track their position (offset).
- Brokers — server processes holding partition data.
What you can only learn by building one:
- Why single-writer-per-partition is the design that makes everything else tractable.
- Why the log abstraction (immutable, append-only, ordered) is so powerful — it shows up in databases (WAL), filesystems (LSM SSTables), version control (commits), event sourcing, blockchains.
- Why consumer offsets stored externally unlock so much (replay, multiple consumers, time travel).
- Why leader election is the hard part of any replicated system — and a perfect lead-in to Raft.
2. Where this fits in the degree
- Phase: Architecture
- Semester: 6 (Databases and Distributed Systems)
- Modules deepened: Module 3 (replication & partitioning) — the project for this module. Module 5 (distributed fundamentals) — leader election, failure detection.
Cross-phase relevance:
- Sequel to the Database (KV) tutorial — partition KV → distributed log.
- Sets up the Consensus / Raft tutorial — by now you'll see why consensus matters.
- Connects to the BitTorrent tutorial — both are partitioned-and-replicated systems.
3. Prerequisites
- Complete the Database (KV) tutorial — you need to be comfortable with on-disk file formats and WAL-style append-only logs.
- Strong systems language: Go, Java, Rust.
- TCP socket programming.
4. Theory & research
Required reading
- Jay Kreps, "The Log: What every software engineer should know about real-time data's unifying abstraction" (engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying) — the foundational essay by Kafka's co-creator. ⭐ start here.
- Apache Kafka documentation — kafka.apache.org/documentation. Read the "Design" section.
Strongly recommended
- Travis Jeffery, "Distributed Services with Go: Your Guide to Reliable, Scalable, and Maintainable Systems" — book that builds a Kafka-like service step-by-step. Excellent practical resource. ⭐ recommended primary.
- Kreps, Narkhede, Rao, "Kafka: a Distributed Messaging System for Log Processing" (NetDB 2011) — the original paper. 6 pages. Read once.
- Confluent's Kafka design docs — well-written internals.
For consensus and replication mechanics
- Diego Ongaro & John Ousterhout, "In Search of an Understandable Consensus Algorithm" (Raft paper, 2014) — the canonical Raft reference. See the Consensus tutorial.
- Apache Kafka's KRaft protocol — Kafka's recent move from ZooKeeper-based to Raft-based metadata management.
5. Curated tutorial list (from BYO-X)
The BYO-X catalog has one entry under "Distributed Systems":
- Java: Building Your Own Kafka-like System From Scratch: A Step-by-Step Guide
Additional canonical references
- Travis Jeffery, "Distributed Services with Go" (book) — full project, ~300 pages.
- CodeCrafters "Build Your Own Kafka" course — paid, but high-quality.
- JetBrains, "Building a Kafka-style log from scratch in Rust" — various community tutorials.
6. Recommended primary path
Travis Jeffery, "Distributed Services with Go" (book).
Builds a complete distributed commit log. Chapters cover:
- Single-server commit log.
- gRPC API.
- Replication (with Raft via Hashicorp's library).
- Service discovery (Serf).
- Client / observability / deployment.
The strength: it builds an operationally complete system, not just a toy. Realistic shape, real protocols.
For a more from-scratch path: pick the Java tutorial (BYO-X) for high-level shape, then implement in your language while reading the original Kafka paper for the design rationale.
7. Implementation milestones
Milestone 1: Single-broker single-partition log
A file on disk. Records are length-prefixed. Position by byte offset.
type Log struct {
mu sync.Mutex
file *os.File
offset uint64
}
func (l *Log) Append(record []byte) (offset uint64, err error) {
l.mu.Lock()
defer l.mu.Unlock()
off := l.offset
binary.Write(l.file, binary.BigEndian, uint32(len(record)))
l.file.Write(record)
l.offset += uint64(4 + len(record))
return off, nil
}
func (l *Log) Read(offset uint64) (record []byte, err error) {
// seek to offset, read length prefix, read record
}
Evidence: Append 1M records. Read them back at random offsets.
Milestone 2: Segments and retention
Split the log into segments (files of bounded size). Old segments can be deleted (retention policy) or compacted (compact-on-key, like Kafka's compacted topics).
Maintain an index file per segment mapping offset → file position for fast seek.
000000.log (records, 0 to 99)
000000.index (offset -> byte position)
000100.log (records, 100 to 199)
000100.index
000200.log (active, currently being written)
000200.index
Evidence: Records can be read efficiently regardless of segment they're in. Old segments deleted after configured retention time.
Milestone 3: Network protocol (RPC)
Define an RPC interface. Either:
- gRPC with protobuf schemas (Travis Jeffery's path).
- A custom binary protocol (Kafka's path; more educational).
service Log {
rpc Produce(ProduceRequest) returns (ProduceResponse);
rpc Consume(ConsumeRequest) returns (stream Record);
}
Evidence: A client can produce and consume records over the network.
Milestone 4: Multiple partitions
A topic has N partitions. Producer chooses partition by hash(key) or round-robin. Each partition is an independent log.
Evidence: Records with the same key go to the same partition (ordering guarantee). Records with different keys parallelize across partitions.
Milestone 5: Consumer groups and offsets
A consumer group is a set of consumers cooperating to consume a topic. Each partition is assigned to one consumer in the group. Consumers persist their committed offset.
Topic: orders (3 partitions)
Consumer Group: order-processors
consumer-1: partitions [0, 1]
consumer-2: partition [2]
Each consumer commits its offset after processing.
Evidence: Multiple consumers in a group each process a subset of partitions. After restart, they resume from their committed offset.
Milestone 6: Replication
Each partition has a leader + N-1 followers. Followers replicate from the leader.
Producer sends to leader → leader appends to local log → leader pushes to followers → followers append to their local logs → leader acks the producer once N replicas have the record.
Choose your replication mode:
- Sync — wait for all in-sync replicas. Safest, slowest.
- Async — leader acks immediately, replicates in background. Fastest, risk of loss.
- Quorum (Kafka's
acks=all,min.insync.replicas=N) — wait for at least N replicas.
Evidence: Killing a follower doesn't lose writes. Killing the leader causes failover (next milestone).
Milestone 7: Leader election
When the leader fails, one follower must become the new leader. This requires consensus on which follower has the most data.
Options:
- Hashicorp's Raft library (Go) — Travis Jeffery uses this. Quickest path.
- ZooKeeper — what original Kafka used. Adds an external dependency.
- Implement Raft yourself — see the Consensus tutorial. The educational path.
Evidence: Kill the leader. Within seconds, a new leader is elected. Clients reconnect to the new leader. No data loss for already-acked writes.
Milestone 8: Cluster membership and discovery
Brokers find each other. Use a gossip protocol (Hashicorp Serf), DNS, or static configuration.
Milestone 9 (optional): Compaction
For "compacted topics" — keep only the latest value per key. Useful for changelog topics.
8. Tests & evidence
| Test | How |
|---|---|
| Single broker correctness | Produce and consume in order; offsets stable |
| Persistence | Restart broker; messages preserved |
| Partition parallelism | Two keys → two partitions → both processed concurrently |
| Consumer offsets | Restart consumer; resumes from committed offset |
| Replication consistency | After replication, all replicas have identical logs |
| Leader failover | Kill leader; new leader elected; no data loss |
| Network partition | Half the cluster isolated; only majority side accepts writes |
| Throughput | 100k msg/sec on a 3-broker cluster (target; achievable) |
The strongest single evidence: a chaos test transcript — kill brokers randomly, verify no data loss, all producers and consumers eventually make progress.
9. Common pitfalls
- Single-writer-per-partition violation. The whole design depends on a single writer per partition. If two brokers both think they're leader (split brain), data diverges. Leader election must be correct.
- Offset semantics. Kafka's offsets are log offsets, not message IDs. Be precise.
- Consumer rebalancing. When a consumer joins or leaves, partitions must be reassigned. Naive rebalancing causes thundering herd. Use Kafka's "incremental cooperative rebalancing" pattern.
- Truncating uncommitted entries. When a new leader is elected, followers may have to truncate ahead-of-the-new-leader's-position. This is the source of many bugs.
- No backpressure. A fast producer can overwhelm a slow consumer. Either implement backpressure or accept lag.
fsyncon every write. Slow. Group commits.- Skipping the log compaction edge cases. Compaction in the presence of in-flight readers requires careful design.
10. Extensions
- Exactly-once semantics — Kafka calls this EOS. Idempotent producers + transactional writes.
- Stream processing — Kafka Streams. Build a tiny version on top.
- MirrorMaker — replicate between clusters. Useful for DR.
- Schema registry — enforce schemas on records.
- Connect framework — pluggable source and sink connectors.
A real Kafka cluster has been refined over a decade. Don't try to reproduce all of it.
11. Module integration
| Module | What the distributed log deepens |
|---|---|
| Sem 6 Module 2 — Storage & indexing | Segment + index file format is a classic on-disk design. |
| Sem 6 Module 3 — Replication & partitioning | The whole project. |
| Sem 6 Module 5 — Distributed fundamentals | Leader election, failure detection, consensus. |
| Database (KV) tutorial | LSM and the commit log share the same "append + compact" pattern. |
| Consensus / Raft tutorial | Leader election is what Raft solves. Build this and you'll feel why Raft exists. |
12. Portfolio framing
What to publish:
- Source organized as
log/,server/,replication/,consensus/. - A chaos-test report — what failure scenarios you ran, what survived.
- A README with:
- Architecture diagram (brokers, partitions, replication).
- Throughput and latency benchmark numbers.
- List of what's implemented vs Kafka.
- One operational tradeoff you made (e.g., chose async replication and document why).
Reviewer entry points:
log/segment.go— on-disk format.replication/leader.go— replication path.consensus/raft.go— leader election.tests/chaos.md— the chaos test plan and results.- README must include: architecture diagram, benchmark numbers, scope honesty.
A working distributed log is a flagship portfolio piece. Sets you up to talk seriously about Kafka, Pulsar, replicated systems in interviews.
Source
This tutorial draws from the BYO-X catalog "Distributed Systems" entry. Jay Kreps' "The Log" essay, the Kafka paper, and Travis Jeffery's Distributed Services with Go are the canonical primary sources.