Build Your Own BitTorrent Client
"BitTorrent is one of the most elegant protocols ever designed. Two thousand lines of code can download every Linux ISO ever published." — somewhat optimistic, but the spirit is right
A BitTorrent client is the ideal "real protocol" project: the spec is freely published, the traffic is unencrypted by default and legal to generate (download Linux ISOs), peers are abundant, and the algorithms (choking, rarest-first piece selection) are intellectually satisfying.
1. Overview & motivation
BitTorrent is a peer-to-peer file-sharing protocol. A torrent file (.torrent) describes:
- The file (or files) to download.
- A list of trackers: coordination servers that tell clients which peers are participating in the swarm.
- The file split into fixed-size pieces (commonly 256 KB or larger), each with a SHA-1 hash.
To download:
- Parse the .torrent file (Bencode format).
- Ask the tracker for a peer list.
- Connect to peers via TCP.
- Handshake.
- Exchange bitfield messages so each side knows what the other has.
- Request pieces. Receive blocks (16 KB chunks of pieces).
- Verify SHA-1 of each completed piece.
- Save to disk.
What you can only learn by building one:
- Why rarest-first piece selection is non-obvious but effective: it keeps rare pieces replicated, which keeps the swarm healthy.
- Why choking and tit-for-tat discourage freeloaders.
- Why decentralization is hard in practice — trackers are central, magnet links use DHT.
- Why Bencode is a great example of "design a serialization format in 4 rules."
2. Where this fits in the degree
- Phase: Systems
- Semester: 5 (OS and Networking)
- Modules deepened: Module 5 (network protocols & sockets), Module 3 (concurrency — clients juggle many peers simultaneously).
Cross-phase relevance:
- Sits naturally after the Network Stack tutorial — once TCP is mundane, BitTorrent is just bytes over TCP.
- Conceptual cousin of the Kafka-like distributed log — both are about chunking, ordering, and partial transfer.
3. Prerequisites
- Solid in your chosen language (Go, Python, Node.js, C# all work).
- Comfortable with TCP sockets and bit manipulation.
- SHA-1: hashlib.sha1(...), used as a black box.
- HTTP requests (for tracker communication).
You do not need the Network Stack tutorial — using OS sockets is fine. The Network Stack tutorial deepens why sockets work; this tutorial uses them as an API.
4. Theory & research
Required reading
- BitTorrent Protocol Specification v1.0 (BEP 3) — bittorrent.org/beps/bep_0003.html. The canonical spec. 6 pages. ⭐
- "The BitTorrent Protocol Specification" (unofficial, comprehensive) — wiki.theory.org/index.php/BitTorrentSpecification. The actually-useful reference.
Strongly recommended
- Jesse Li, "Building a BitTorrent client from the ground up in Go" — blog.jse.li/posts/torrent/. Excellent end-to-end walkthrough, ~1,500 lines of Go. ⭐ recommended primary.
- Markus Eliasson, "A BitTorrent client in Python 3.5" — markuseliasson.se/article/bittorrent-in-python/.
- Bram Cohen, "Incentives Build Robustness in BitTorrent" (2003) — the original paper, 5 pages. Read it once.
For the algorithm
- Cohen, "The BitTorrent Protocol Specification" (BEP 3, listed above): the official description, including the choking algorithm.
- A. Legout, G. Urvoy-Keller, P. Michiardi, "Understanding BitTorrent: An Experimental Perspective" — measurement study showing why the algorithm works.
What to skip on first pass
- Magnet links, DHT (Distributed Hash Table). Use plain .torrent files.
- µTP (BitTorrent's UDP transport). TCP is fine.
- Encryption (PE/MSE). Doesn't matter for tutorial purposes.
5. Curated tutorial list (from BYO-X)
- C#: Building a BitTorrent client from scratch in C# — Sander van Vliet's blog series
- Go: Building a BitTorrent client from the ground up in Go — Jesse Li, blog.jse.li ⭐ recommended primary
- Nim: Writing a Bencode Parser
- Node.js: Write your own bittorrent client — Allen Kim
- Python: A BitTorrent client in Python 3.5 — Markus Eliasson, markuseliasson.se
6. Recommended primary path
Jesse Li, "Building a BitTorrent client from the ground up in Go".
Single comprehensive blog post. Each section corresponds to one of the milestones below. Source is on GitHub (github.com/veggiedefender/torrent-client). 1,500 lines of well-organized Go.
You will finish with a working CLI that downloads a real Linux distro torrent.
For Python: Markus Eliasson's tutorial is the equivalent, with somewhat more emphasis on asyncio.
7. Implementation milestones
Milestone 1: Bencode parser
Bencode is BitTorrent's serialization format. It has four types:
Integers: i42e
Strings: 4:spam
Lists: l4:spami42ee → ["spam", 42]
Dictionaries: d3:bar4:spame → {"bar": "spam"}
Write a recursive descent parser. Trivial.
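```python
def decode(data: bytes, i: int = 0):
    """Decode one Bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i+1]  # slice, so we compare bytes with bytes (data[i] would be an int)
    if c == b'i':  # integer: i<digits>e
        end = data.index(b'e', i)
        return int(data[i+1:end]), end + 1
    if c.isdigit():  # byte string: <length>:<raw bytes>
        colon = data.index(b':', i)
        length = int(data[i:colon])
        return data[colon+1:colon+1+length], colon + 1 + length
    if c == b'l':  # list: l<values>e
        result = []
        i += 1
        while data[i:i+1] != b'e':
            value, i = decode(data, i)
            result.append(value)
        return result, i + 1
    if c == b'd':  # dictionary: d<key><value>...e
        result = {}
        i += 1
        while data[i:i+1] != b'e':
            key, i = decode(data, i)
            value, i = decode(data, i)
            result[key] = value
        return result, i + 1
    raise ValueError(f"invalid Bencode at offset {i}")
```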
Evidence: Parse any .torrent file (Debian, Ubuntu, Arch Linux netinstall). Pretty-print the metadata dictionary.
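The infohash step in Milestone 2 and the round-trip test in section 8 both need the reverse direction. Below is a minimal encoder sketch. It assumes values are int, bytes, str, list, or dict. As a convenience it encodes str as UTF-8, although real torrent data is already bytes. Dictionary keys are sorted by their raw bytes, as the spec requires.

```python
def bencode_encode(value) -> bytes:
    """Encode int, bytes, str, list, or dict into Bencode bytes."""
    if isinstance(value, bool):
        raise TypeError("Bencode has no boolean type")  # bool is an int subclass in Python
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # text is stored as a UTF-8 byte string
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode_encode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = []
        for k, v in value.items():
            key = k.encode("utf-8") if isinstance(k, str) else k
            items.append((key, v))
        items.sort(key=lambda kv: kv[0])  # keys sorted as raw byte strings
        body = b"".join(bencode_encode(k) + bencode_encode(v) for k, v in items)
        return b"d" + body + b"e"
    raise TypeError(f"cannot Bencode {type(value).__name__}")
```

Encoding {b'bar': b'spam'} yields b'd3:bar4:spame', which is the dictionary example above.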
Milestone 2: Parse the torrent file
A .torrent is one Bencoded dictionary with:
- announce: tracker URL.
- info: sub-dictionary with name, piece length, pieces (the concatenation of all piece SHA-1s), and either length (single-file) or files (multi-file).
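The infohash is the SHA-1 of the Bencoded info dict. This is the torrent's identity. Hash the info dict's bytes exactly as they appear in the file. Re-encoding the dict only works if your encoder reproduces those bytes exactly, with keys sorted and nothing added or dropped.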
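```python
import hashlib
info_dict = torrent[b'info']                  # torrent: the decoded .torrent dictionary
info_bytes = bencode_encode(info_dict)        # your encoder; must reproduce the original bytes
infohash = hashlib.sha1(info_bytes).digest()  # 20 raw bytes; .hex() for display
```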
Evidence: Print the infohash in hex and compare it with the hash reported by transmission-show foo.torrent. They must match.
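Before you can download anything, the info dict still needs unpacking. The sketch below assumes the decoder returns bytes keys, as torrent[b'info'] implies. The name piece_hashes_and_length is a hypothetical helper. The function also checks that the number of hashes matches the total size, which catches most parsing mistakes early.

```python
def piece_hashes_and_length(info: dict) -> tuple[list[bytes], int]:
    """Split `pieces` into 20-byte SHA-1 digests and compute the total payload size."""
    pieces = info[b"pieces"]
    if len(pieces) % 20 != 0:
        raise ValueError("pieces length is not a multiple of 20")
    hashes = [pieces[i:i + 20] for i in range(0, len(pieces), 20)]
    if b"length" in info:                      # single-file torrent
        total = info[b"length"]
    else:                                      # multi-file: sum every file's length
        total = sum(f[b"length"] for f in info[b"files"])
    expected = -(-total // info[b"piece length"])  # ceiling division
    if len(hashes) != expected:
        raise ValueError(f"expected {expected} piece hashes, found {len(hashes)}")
    return hashes, total
```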
Milestone 3: Tracker request
HTTP GET to announce URL with query params:
- info_hash: URL-encoded 20-byte infohash
- peer_id: your random 20-byte client ID
- port: 6881 (the port you listen on)
- uploaded, downloaded, left: byte counts so far (left is how much you still need)
- compact=1: get a compact peer list
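Python's standard urllib.parse module can build this request. The sketch below uses build_tracker_url, a hypothetical helper name, and assumes the announce URL is plain HTTP. It passes quote instead of the default quote_plus, so every raw byte is sent as %XX rather than being turned into '+'.

```python
from urllib.parse import quote, urlencode

def build_tracker_url(announce: str, infohash: bytes, peer_id: bytes, port: int,
                      left: int, uploaded: int = 0, downloaded: int = 0) -> str:
    """Build the HTTP GET URL for an announce request asking for a compact peer list."""
    params = {
        "info_hash": infohash,   # raw 20 bytes; urlencode percent-encodes them
        "peer_id": peer_id,      # 20 bytes chosen by this client
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "compact": 1,
    }
    query = urlencode(params, quote_via=quote)  # quote, not quote_plus: bytes become %XX
    separator = "&" if "?" in announce else "?"  # announce URLs may already carry a query
    return announce + separator + query
```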
The response is Bencoded. Its peers field is a single byte string of concatenated 6-byte entries, each a 4-byte IPv4 address followed by a 2-byte big-endian port.
Evidence: Get a list of 50–200 peer IP:port pairs for a popular torrent.
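Decoding the compact peers field takes only a few lines with socket and struct. The sketch below handles IPv4 entries only; BEP 7 adds IPv6 peers in a separate peers6 field, which this does not cover. The helper name parse_compact_peers is hypothetical.

```python
import socket
import struct

def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode a compact `peers` string: 6 bytes per peer (IPv4 address, big-endian port)."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length is not a multiple of 6")
    peers = []
    for off in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[off:off + 4])              # 4 bytes -> dotted quad
        (port,) = struct.unpack(">H", blob[off + 4:off + 6])  # unsigned 16-bit, big-endian
        peers.append((ip, port))
    return peers
```

For example, bytes 192, 168, 1, 10, 0x1A, 0xE1 decode to ("192.168.1.10", 6881).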
Milestone 4: Peer handshake
TCP connect to peer. Send:
pstrlen (1 byte = 19)
pstr ("BitTorrent protocol", 19 bytes)
reserved (8 zero bytes)
infohash (20 bytes)
peer_id (20 bytes)
The peer replies with a handshake in the same layout, carrying its own peer_id. Verify that the infohash matches, and close the connection if it does not.
Evidence: Open handshakes with 10+ peers. Most will succeed; some will drop.
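The five fields above add up to a fixed 68-byte message. The sketch below uses hypothetical helper names, build_handshake and parse_handshake, and leaves out socket I/O. It also ignores the reserved bytes, which extension-aware clients use to signal features.

```python
PSTR = b"BitTorrent protocol"

def build_handshake(infohash: bytes, peer_id: bytes) -> bytes:
    """Return the 68-byte handshake: pstrlen, pstr, 8 reserved bytes, infohash, peer_id."""
    if len(infohash) != 20 or len(peer_id) != 20:
        raise ValueError("infohash and peer_id must be 20 bytes each")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + infohash + peer_id

def parse_handshake(data: bytes, expected_infohash: bytes) -> bytes:
    """Validate a peer's 68-byte handshake and return its peer_id."""
    if len(data) != 68 or data[0] != 19 or data[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    infohash, peer_id = data[28:48], data[48:68]  # bytes 20..28 are the reserved field
    if infohash != expected_infohash:
        raise ValueError("infohash mismatch")
    return peer_id
```

In a client you would send build_handshake(...) with sock.sendall. You would then loop on recv until exactly 68 bytes have arrived before calling parse_handshake.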
Milestone 5: Bitfield & interested
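After the handshake, a peer may send a bitfield message, which must be its first message if it is sent at all. It holds one bit per piece indicating which pieces the peer has; the high bit of the first byte stands for piece 0. A peer that has nothing yet may skip it.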
Send interested. Wait for unchoke. (A peer may keep you choked; move on if so.)
Evidence: Receive a bitfield from at least one peer. Send interested. Receive unchoke.
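Every message after the handshake is framed as a 4-byte big-endian length, a 1-byte message ID, and a payload; a zero length is a keep-alive. BEP 3 defines IDs 0 to 8: choke, unchoke, interested, not interested, have, bitfield, request, piece, cancel. The sketch below works on already-received frames, leaves out socket reading, and uses hypothetical helper names.

```python
import struct

# Message IDs 0..8 as defined in BEP 3.
CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED, HAVE, BITFIELD, REQUEST, PIECE, CANCEL = range(9)

def encode_message(msg_id: int, payload: bytes = b"") -> bytes:
    """Frame a message as <4-byte big-endian length><1-byte id><payload>."""
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload

def decode_message(frame: bytes):
    """Split one complete frame into (id, payload); (None, b"") means keep-alive."""
    (length,) = struct.unpack(">I", frame[:4])
    if length == 0:
        return None, b""
    return frame[4], frame[5:4 + length]

def has_piece(bitfield: bytes, index: int) -> bool:
    """The high bit of byte 0 is piece 0, the next bit is piece 1, and so on."""
    byte_index, offset = divmod(index, 8)
    if byte_index >= len(bitfield):
        return False
    return bool((bitfield[byte_index] >> (7 - offset)) & 1)
```

Sending interested is then sock.sendall(encode_message(INTERESTED)), which produces the five bytes 00 00 00 01 02.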
Milestone 6: Request and receive pieces
Pieces are split into 16 KB blocks. Request blocks; receive piece messages. All integer fields are 4-byte big-endian.
request: <len=13><id=6><index><begin><length>
piece: <len=9+X><id=7><index><begin><block>
Assemble blocks into pieces. Verify the SHA-1 of each piece against its 20-byte hash from the pieces string. Discard and re-request mismatches.
Evidence: Download a single piece (e.g., piece 0) and verify it.
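The two message layouts above map directly onto struct format strings. This sketch covers the download side. It assumes 16 KB blocks and that received blocks are collected in a dict keyed by their begin offset. block_ranges, build_request, parse_piece and assemble_and_verify are all hypothetical helper names.

```python
import hashlib
import struct

BLOCK_SIZE = 16 * 1024  # 16 KB blocks

def block_ranges(piece_length: int) -> list[tuple[int, int]]:
    """(begin, length) for each block of a piece; the last block may be shorter."""
    return [(begin, min(BLOCK_SIZE, piece_length - begin))
            for begin in range(0, piece_length, BLOCK_SIZE)]

def build_request(index: int, begin: int, length: int) -> bytes:
    """request: <len=13><id=6><index><begin><length>, all big-endian."""
    return struct.pack(">IBIII", 13, 6, index, begin, length)

def parse_piece(payload: bytes) -> tuple[int, int, bytes]:
    """Payload of a piece message (id 7): <index><begin><block>."""
    index, begin = struct.unpack(">II", payload[:8])
    return index, begin, payload[8:]

def assemble_and_verify(piece_length: int, blocks: dict[int, bytes], expected_hash: bytes) -> bytes:
    """Copy blocks into place by their begin offset and check the piece's SHA-1."""
    buf = bytearray(piece_length)
    for begin, data in blocks.items():
        buf[begin:begin + len(data)] = data
    if hashlib.sha1(buf).digest() != expected_hash:
        raise ValueError("piece hash mismatch; discard and re-request")
    return bytes(buf)
```

Any missing block leaves zeros in the buffer, so the hash check fails and the piece goes back to be re-requested.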
Milestone 7: Piece queue and worker model
Run N peer workers concurrently. Each worker pulls a piece index from a shared queue, requests it, verifies it, and pushes the result.
A naive sequential download works but is slow. Concurrency is what makes BitTorrent fast.
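```go
type pieceWork struct { index, length int; hash [20]byte }
type pieceResult struct { index int; buf []byte }

// Each peer worker runs this loop. peerHasPiece, downloadPiece and checkHash are your
// own helpers. workQueue must be buffered (capacity = number of pieces) so that
// requeueing never blocks.
for work := range workQueue {
	if !peerHasPiece(work.index) {
		workQueue <- work // put it back for a peer that has it
		continue
	}
	buf, err := downloadPiece(work)
	if err != nil || !checkHash(buf, work.hash) {
		workQueue <- work // requeue on error or hash mismatch
		continue
	}
	resultQueue <- pieceResult{index: work.index, buf: buf}
}
```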
Evidence: Download an entire small torrent (the Debian netinst ISO is roughly 600 MB; a ~5 MB torrent is faster for testing). Verify every piece, then compare the downloaded file's SHA-256 against the checksum the distribution publishes.
Milestone 8 (optional): Seeding
Once you have pieces, serve them to other peers who request them. Reverse the role.
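Seeding mostly means answering request messages with piece messages. The sketch below assumes verified pieces are held in memory as a dict. A real client would read them from disk and would also respect its own choking decisions. The name answer_request is hypothetical, and the 16 KB cap reflects the common convention of refusing larger requests.

```python
import struct
from typing import Optional

MAX_BLOCK = 16 * 1024  # requests larger than this are commonly refused

def answer_request(payload: bytes, pieces: dict[int, bytes]) -> Optional[bytes]:
    """Turn a request payload (<index><begin><length>) into a framed piece message.

    `pieces` maps piece index -> verified piece bytes. Returns None when the request
    cannot be served (piece missing, range out of bounds, or block too large).
    """
    index, begin, length = struct.unpack(">III", payload)
    piece = pieces.get(index)
    if piece is None or length > MAX_BLOCK or begin + length > len(piece):
        return None
    block = piece[begin:begin + length]
    body = struct.pack(">BII", 7, index, begin) + block  # id 7 = piece
    return struct.pack(">I", len(body)) + body           # length prefix = 9 + len(block)
```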
Milestone 9 (optional): DHT and magnet links
Use the Kademlia-based DHT (BEP 5) to find peers without a tracker. This is far more complex than the rest of the project; skip it on a first pass.
8. Tests & evidence
| Test | How |
|---|---|
| Bencode round-trip | Parse + re-encode any .torrent → identical bytes |
| Infohash | Matches transmission-show output |
| Tracker | Returns a non-empty peer list |
| Handshake | Successful with at least one peer |
| Piece download | One piece downloads and SHA-verifies |
| Full download | A small torrent downloads completely; SHA matches |
| Resilience | Connections drop mid-download; client recovers and finishes |
| Concurrency | N parallel peer connections without races |
The strongest evidence: the downloaded file's SHA-256 matches the checksum published by the distribution.
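A v1 torrent normally carries only per-piece SHA-1s, so the whole-file check compares against the distribution's published checksums, such as a SHA256SUMS file. The function below is a minimal sketch of a streaming hash; file_sha256 is a hypothetical helper name. It reads the file in 1 MiB chunks so a multi-GB ISO never has to fit in memory.

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the hex SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string should equal the one sha256sum prints for the same file.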
9. Common pitfalls
- Bencode key ordering. Dictionary keys must be sorted, compared as raw byte strings, when encoding. Forgetting this changes the infohash.
- URL-encoding the infohash for the tracker. The 20 raw bytes contain non-printable characters. Percent-encode each byte that is not an unreserved character (letters, digits, -, ., _, ~) as %XX.
- Compact vs non-compact peer list. Specify compact=1 and decode 6-byte entries. Some trackers always return compact.
- Pieces are not blocks. Pieces are typically 256 KB or larger; blocks are 16 KB. Pieces are SHA-verified; blocks are not. Get the abstraction right.
- Off-by-one in piece length. The last piece is usually smaller than the others: last_piece_length = total_length % piece_length, or the full piece_length when that remainder is 0.
- Holding connections forever. Peers drop silently. Use timeouts on every read.
- Disk I/O on the hot path. Writing each block to disk synchronously is slow. Buffer pieces in memory; write a piece at a time.
- Choking ignored. If a peer says they're choking you, requesting more pieces is wasted. Wait for unchoke or move to another peer.
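Ceiling division makes the last-piece rule easy to get right. The small sketch below uses two hypothetical helper names, num_pieces and piece_size.

```python
def num_pieces(total_length: int, piece_length: int) -> int:
    """Number of pieces, rounding up so a partial final piece is counted."""
    return -(-total_length // piece_length)  # ceiling division on ints

def piece_size(index: int, total_length: int, piece_length: int) -> int:
    """Every piece is piece_length bytes except possibly the last one."""
    count = num_pieces(total_length, piece_length)
    if not 0 <= index < count:
        raise IndexError(f"piece {index} out of range 0..{count - 1}")
    if index < count - 1:
        return piece_length
    return total_length % piece_length or piece_length  # remainder 0 means a full piece
```

A piece's byte offset in the concatenated file data is index * piece_length. That is also where to write it when you flush a whole piece to disk.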
10. Extensions
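- Multi-file torrents. The info dict has a files field instead of length. Pieces run across file boundaries: treat the files as one concatenated byte stream and map each piece's byte range onto the files it overlaps.
- Magnet links. These usually need the DHT (Kademlia, BEP 5) to find peers, plus the metadata-exchange extension (BEP 9) to fetch the info dict from them.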
- Endgame mode. When few pieces remain, request them from every peer at once.
- Pipelining. Send multiple block requests at a time (5–10) to keep the pipe full.
- Rarest-first piece selection. Choose pieces in order of rarity among your peers, not sequentially. This is the policy Cohen's paper describes.
- µTP (uTorrent Transport Protocol, BEP 29). A UDP-based, congestion-controlled transport. A good follow-up project.
- WebTorrent. Browser-friendly variant using WebRTC.
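Rarest-first only needs a count of how many connected peers advertise each piece. The sketch below assumes you keep one set of piece indices per peer, updated from bitfield and have messages. The name rarest_first is hypothetical, and the random tie-break is a common refinement rather than something stated above.

```python
import random
from collections import Counter
from typing import Optional

def rarest_first(needed: set[int], peer_pieces: list[set[int]],
                 rng: Optional[random.Random] = None) -> list[int]:
    """Order still-needed pieces from rarest to most common among connected peers.

    Pieces that no connected peer has are omitted. Ties are broken randomly so
    that different clients do not all chase the same piece.
    """
    if rng is None:
        rng = random.Random()
    counts = Counter(i for pieces in peer_pieces for i in pieces if i in needed)
    candidates = list(counts)
    rng.shuffle(candidates)                              # random order within ties
    return sorted(candidates, key=lambda i: counts[i])   # stable sort keeps the shuffle
```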
11. Module integration
| Module | What the BitTorrent client deepens |
|---|---|
| Sem 5 Module 3 — Concurrency | The piece-worker pool is a textbook producer/consumer with retry. |
| Sem 5 Module 5 — Network protocols | A real peer-to-peer protocol, spoken with peers in the wild. |
| Network Stack tutorial | TCP is the substrate. Knowing TCP from the inside makes BitTorrent clearer. |
| Kafka-like tutorial | Both are partition-and-replicate systems. Pieces ≈ partitions. |
| Blockchain tutorial | P2P broadcast vs P2P fetch — different patterns, same world. |
12. Portfolio framing
What to publish:
- Source: bencode/, torrent/, tracker/, peer/, download/.
- README with the SHA verification demo: show that your downloaded file's hash matches the expected value.
- A list of features: tracker, peer protocol, multi-peer concurrency. A list of skipped features: DHT, magnet, seeding, encryption.
What to keep private:
- The torrents themselves. Use Linux distros and similar legitimate content. Never include or imply copyrighted material.
Reviewer entry points:
- peer/handshake.go: the protocol entry point.
- download/manager.go: the worker pool.
- README must include: legitimate-content disclaimer, SHA verification result, list of features.
A working BitTorrent client is a satisfying portfolio piece because it works in the real world — peers respond, files download, hashes verify. The "it talks to real software" demonstration matters.
Source
This tutorial draws from the BYO-X catalog "BitTorrent Client" section. Jesse Li's blog post and BEP 3 are the canonical primary sources.