
Build Your Own BitTorrent Client

"BitTorrent is one of the most elegant protocols ever designed. Two thousand lines of code can download every Linux ISO ever published." — somewhat optimistic, but the spirit is right

A BitTorrent client is the ideal "real protocol" project: the spec is freely published, the traffic is unencrypted, downloading Linux ISOs is perfectly legal, peers are abundant, and the algorithms (choking, rarest-first piece selection) are intellectually satisfying.


1. Overview & motivation

BitTorrent is a peer-to-peer file-sharing protocol. A torrent file (.torrent) describes:

  • The file (or files) to download.
  • One or more trackers: coordination servers that tell clients which other peers are in the swarm.
  • The file split into fixed-size pieces (typically a power of two between 256 KB and a few MB), each with a SHA-1 hash.

To download:

  1. Parse the .torrent file (Bencode format).
  2. Ask the tracker for a peer list.
  3. Connect to peers via TCP.
  4. Handshake.
  5. Exchange bitfield messages so each side knows what the other has.
  6. Request pieces. Receive blocks (16 KB chunks of pieces).
  7. Verify SHA-1 of each completed piece.
  8. Save to disk.

What you can only learn by building one:

  • Why rarest-first piece selection is non-obvious but effective: it keeps rare pieces replicated, which keeps the swarm healthy.
  • Why choking and tit-for-tat discourage freeloaders.
  • Why decentralization is hard in practice: trackers are central points of failure, and removing them requires a DHT, which is what magnet links rely on.
  • Why Bencode is a great example of "design a serialization format in 4 rules."

2. Where this fits in the degree

  • Phase: Systems
  • Semester: 5 (OS and Networking)
  • Modules deepened: Module 5 (network protocols & sockets), Module 3 (concurrency — clients juggle many peers simultaneously).

Cross-phase relevance:


3. Prerequisites

  • Solid in your chosen language (Go, Python, Node.js, C# all work).
  • Comfortable with TCP sockets and bit manipulation.
  • SHA-1: hashlib.sha1(...), used as a black box.
  • HTTP requests (for tracker communication).

You do not need the Network Stack tutorial — using OS sockets is fine. The Network Stack tutorial deepens why sockets work; this tutorial uses them as an API.


4. Theory & research

Required reading

  • Jesse Li, "Building a BitTorrent client from the ground up in Go", blog.jse.li/posts/torrent/. Excellent end-to-end walkthrough, ~1,500 lines of Go. ⭐ recommended primary.
  • Markus Eliasson, "A BitTorrent client in Python 3.5", markuseliasson.se/article/bittorrent-in-python/.
  • Bram Cohen, "Incentives Build Robustness in BitTorrent" (2003) — the original paper, 5 pages. Read it once.

For the algorithm

  • Cohen, "The BitTorrent Protocol Specification" (BEP 3): the official spec.
  • A. Legout, G. Urvoy-Keller, P. Michiardi, "Understanding BitTorrent: An Experimental Perspective" — measurement study showing why the algorithm works.

What to skip on first pass

  • Magnet links, DHT (Distributed Hash Table). Use plain .torrent files.
  • µTP (BitTorrent's UDP transport). TCP is fine.
  • Encryption (PE/MSE). Doesn't matter for tutorial purposes.

5. Curated tutorial list (from BYO-X)


Jesse Li, "Building a BitTorrent client from the ground up in Go".

Single comprehensive blog post. Each section corresponds to one of the milestones below. Source is on GitHub (github.com/veggiedefender/torrent-client). 1,500 lines of well-organized Go.

You will finish with a working CLI that downloads a real Linux distro torrent.

For Python: Markus Eliasson's tutorial is the equivalent, with somewhat more emphasis on asyncio.


7. Implementation milestones

Milestone 1: Bencode parser

Bencode is BitTorrent's serialization. Four types:

Integers:      i42e            → 42
Strings:       4:spam          → "spam"
Lists:         l4:spami42ee    → ["spam", 42]
Dictionaries:  d3:bar4:spame   → {"bar": "spam"}

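Write a recursive descent parser. Work on bytes rather than str: the pieces field is raw binary, so decoding the file as text would corrupt it.

    def decode(data: bytes, i: int = 0):
        """Decode one Bencoded value starting at data[i]; return (value, next_index)."""
        c = data[i:i + 1]                  # 1-byte slice, so comparisons stay bytes
        if c == b'i':                      # integer: i<digits>e
            end = data.index(b'e', i)
            return int(data[i + 1:end]), end + 1
        if c.isdigit():                    # byte string: <length>:<bytes>
            colon = data.index(b':', i)
            length = int(data[i:colon])
            start = colon + 1
            return data[start:start + length], start + length
        if c == b'l':                      # list: l<values>e
            result, i = [], i + 1
            while data[i:i + 1] != b'e':
                value, i = decode(data, i)
                result.append(value)
            return result, i + 1
        if c == b'd':                      # dictionary: d<key><value>...e
            result, i = {}, i + 1
            while data[i:i + 1] != b'e':
                key, i = decode(data, i)
                value, i = decode(data, i)
                result[key] = value
            return result, i + 1
        raise ValueError(f"invalid Bencode at offset {i}")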

Evidence: Parse any real .torrent file (Debian netinst, Ubuntu, Arch Linux). Pretty-print the metadata dictionary.
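Milestone 2 and the round-trip test in section 8 both need an encoder, which the text calls bencode_encode but never shows. A minimal sketch, assuming the bytes-based decoder above (strings and keys come back as bytes). Accepting str and encoding it as UTF-8 is a convenience of this sketch, not part of the spec:

```python
def bencode_encode(value) -> bytes:
    """Encode an int, bytes/str, list, or dict as Bencode."""
    if isinstance(value, int):
        return b"i%de" % value                      # i<digits>e
    if isinstance(value, str):
        value = value.encode("utf-8")               # convenience: treat str as UTF-8 bytes
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)       # <length>:<bytes>
    if isinstance(value, list):
        return b"l" + b"".join(bencode_encode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = []
        for k, v in value.items():
            kb = k.encode("utf-8") if isinstance(k, str) else k
            items.append((kb, v))
        # The spec requires keys sorted by their raw bytes; getting this
        # wrong silently changes the infohash.
        items.sort(key=lambda kv: kv[0])
        body = b"".join(bencode_encode(k) + bencode_encode(v) for k, v in items)
        return b"d" + body + b"e"
    raise TypeError(f"cannot Bencode {type(value).__name__}")
```

With both halves in place, the round-trip check is `bencode_encode(decode(raw)[0]) == raw` for a canonically encoded .torrent file.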

Milestone 2: Parse the torrent file

A .torrent is one Bencoded dictionary with:

  • announce — tracker URL.
  • info — sub-dictionary with name, piece length, pieces (the concatenation of all piece SHA-1s), length (single-file) or files (multi-file).

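The infohash is the SHA-1 of the Bencoded info dict; it is the torrent's identity. Re-encoding the decoded dict yields the right hash only if your encoder reproduces the original bytes exactly. Hashing the raw byte span of the info value taken straight from the file avoids that risk.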

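    import hashlib

    info_dict = torrent[b'info']                   # torrent, _ = decode(raw_file_bytes)
    info_bytes = bencode_encode(info_dict)         # must reproduce the original bytes exactly
    infohash = hashlib.sha1(info_bytes).digest()   # 20 raw bytes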

Evidence: Print infohash.hex() and compare it with the hash shown by transmission-show foo.torrent. They must match.
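Once the info dict is decoded, the pieces string still has to be cut into 20-byte hashes and paired with piece lengths. A sketch for single-file torrents only (multi-file torrents use files instead of length); the name piece_table and its (index, length, hash) tuples are illustrative choices, not from any library:

```python
def piece_table(info: dict):
    """Return [(index, length, expected_sha1), ...] for a single-file torrent.

    info is the decoded info dict with bytes keys, as produced by a
    bytes-based Bencode decoder.
    """
    piece_length = info[b"piece length"]
    total_length = info[b"length"]
    hashes = info[b"pieces"]
    if len(hashes) % 20:
        raise ValueError("pieces field is not a multiple of 20 bytes")
    count = len(hashes) // 20
    expected = -(-total_length // piece_length)   # ceiling division
    if count != expected:
        raise ValueError(f"{count} hashes for {expected} pieces")
    table = []
    for i in range(count):
        start = i * piece_length
        length = min(piece_length, total_length - start)   # last piece may be short
        table.append((i, length, hashes[20 * i:20 * i + 20]))
    return table
```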

Milestone 3: Tracker request

HTTP GET to announce URL with query params:

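  • info_hash: URL-encoded 20-byte infohash
  • peer_id: your random 20-byte client ID
  • port: the TCP port you listen on (conventionally 6881)
  • uploaded, downloaded, left: byte counts so far; on a fresh download, uploaded and downloaded are 0 and left is the total length
  • compact=1: get a compact peer list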

The response is Bencoded. With compact=1, the peers field is a byte string of 6-byte entries: a 4-byte IPv4 address followed by a 2-byte big-endian port.
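A sketch of both halves of the tracker exchange using only Python's standard library: building the announce URL and decoding the compact peers string. announce_url and parse_compact_peers are hypothetical helper names, and uploaded/downloaded are hardcoded to 0, which assumes a fresh download:

```python
import socket
import struct
from urllib.parse import quote, urlencode


def announce_url(announce: str, info_hash: bytes, peer_id: bytes, port: int, left: int) -> str:
    """Build the tracker GET URL; bytes values are percent-encoded byte by byte."""
    params = {
        "info_hash": info_hash,
        "peer_id": peer_id,
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,
        "compact": 1,
    }
    sep = "&" if "?" in announce else "?"
    # quote (not the default quote_plus) so no byte is ever turned into '+'
    return announce + sep + urlencode(params, quote_via=quote)


def parse_compact_peers(blob: bytes):
    """Split a compact peers string into (ip, port) pairs."""
    if len(blob) % 6:
        raise ValueError("compact peer list length is not a multiple of 6")
    peers = []
    for off in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[off:off + 4])              # 4-byte IPv4 address
        (port,) = struct.unpack(">H", blob[off + 4:off + 6])  # 2-byte big-endian port
        peers.append((ip, port))
    return peers
```

To contact a real tracker you would fetch the URL (for example with urllib.request.urlopen(url, timeout=15)), Bencode-decode the body, and pass its b'peers' value to parse_compact_peers. A response containing b'failure reason' instead means the tracker rejected the request.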

Evidence: Get a list of 50–200 peer IP:port pairs for a popular torrent.

Milestone 4: Peer handshake

TCP connect to peer. Send:

pstrlen (1 byte = 19)
pstr ("BitTorrent protocol", 19 bytes)
reserved (8 zero bytes)
infohash (20 bytes)
peer_id (20 bytes)

Receive the same back from the peer. Verify the infohash matches.
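The 68-byte handshake has a fixed layout, so building and checking it is plain byte slicing. A sketch with hypothetical helper names; the 8 reserved bytes are sent as zeros and ignored on receipt, so this client advertises no extensions:

```python
PSTR = b"BitTorrent protocol"
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20   # 68 bytes


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    # pstrlen | pstr | reserved (8 zero bytes) | infohash | peer_id
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def parse_handshake(msg: bytes, expected_info_hash: bytes) -> bytes:
    """Validate a peer's handshake and return its peer_id."""
    if len(msg) != HANDSHAKE_LEN or msg[0] != len(PSTR) or msg[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    # msg[20:28] holds the reserved/extension bits; ignored here.
    if msg[28:48] != expected_info_hash:
        raise ValueError("infohash mismatch")
    return msg[48:68]
```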

Evidence: Complete handshakes with 10+ peers. Most will succeed; some will time out or drop the connection.

Milestone 5: Bitfield & interested

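After the handshake, a peer usually sends a bitfield message: one bit per piece, with the high bit of the first byte standing for piece 0, indicating which pieces it has. A peer with no pieces may skip it, and pieces acquired later are announced with have messages.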

Send interested, then wait for unchoke. A peer may keep you choked indefinitely; if it does, move on to another peer.
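Every message after the handshake uses the same framing: a 4-byte big-endian length, a 1-byte message ID, then the payload; a length of 0 is a keep-alive. A sketch of the framing plus the bitfield lookup, assuming incoming bytes are accumulated in a buffer as they arrive (the helper names are hypothetical):

```python
import struct

# Message IDs from the peer wire protocol
CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED, HAVE, BITFIELD = range(6)


def serialize_message(msg_id: int, payload: bytes = b"") -> bytes:
    """Frame a message as <4-byte big-endian length><1-byte id><payload>."""
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def parse_message(buf: bytes):
    """Split one framed message off the front of buf.

    Returns (msg_id, payload, rest), with msg_id None for a keep-alive,
    or None if buf does not yet hold a complete message.
    """
    if len(buf) < 4:
        return None
    (length,) = struct.unpack(">I", buf[:4])
    if len(buf) < 4 + length:
        return None
    if length == 0:
        return None, b"", buf[4:]
    return buf[4], buf[5:4 + length], buf[4 + length:]


def has_piece(bitfield: bytes, index: int) -> bool:
    """Piece 0 is the high bit of the first byte."""
    byte, bit = divmod(index, 8)
    if byte >= len(bitfield):
        return False
    return bool((bitfield[byte] >> (7 - bit)) & 1)
```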

Evidence: Receive a bitfield from at least one peer. Send interested. Receive unchoke.

Milestone 6: Request and receive pieces

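Pieces are split into 16 KB (16,384-byte) blocks; the last block of a piece may be shorter. Request blocks with request messages and receive them in piece messages. index, begin and length are 4-byte big-endian integers:

request: <len=13><id=6><index><begin><length>
piece: <len=9+X><id=7><index><begin><block>

Assemble blocks into pieces. Verify the SHA-1 of each completed piece against its 20-byte entry in the pieces field. Discard mismatches and re-request them.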
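A sketch of the block arithmetic for one piece: generate one request per 16 KB block (the last one shorter), copy each incoming piece payload into a buffer at its begin offset, and hash the result. The helper names are hypothetical, and the sketch assumes payloads arrive with the 4-byte length and 1-byte ID already stripped:

```python
import hashlib
import struct

BLOCK_SIZE = 16 * 1024


def request_messages(index: int, piece_length: int):
    """One framed request message per block of the piece."""
    msgs = []
    for begin in range(0, piece_length, BLOCK_SIZE):
        length = min(BLOCK_SIZE, piece_length - begin)
        # <len=13><id=6><index><begin><length>
        msgs.append(struct.pack(">IBIII", 13, 6, index, begin, length))
    return msgs


def store_block(buf: bytearray, index: int, payload: bytes) -> int:
    """Copy a piece-message payload (<index><begin><block>) into buf."""
    got_index, begin = struct.unpack(">II", payload[:8])
    block = payload[8:]
    if got_index != index or begin + len(block) > len(buf):
        raise ValueError("unexpected block")
    buf[begin:begin + len(block)] = block
    return len(block)


def verify_piece(buf: bytes, expected_sha1: bytes) -> bool:
    return hashlib.sha1(buf).digest() == expected_sha1
```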

Evidence: Download a single piece (e.g., piece 0) and verify it.

Milestone 7: Piece queue and worker model

Run N peer workers concurrently. Each worker pulls a piece index from a shared queue, requests it, verifies it, and pushes the result.

A naive sequential download works but is slow. Concurrency is what makes BitTorrent fast.

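    type pieceWork struct {
    	index, length int
    	hash          [20]byte
    }

    type pieceResult struct {
    	index int
    	buf   []byte
    }

    // Each peer worker runs this loop. workQueue must be buffered with room for
    // every piece; otherwise putting work back can block forever.
    func startWorker(workQueue chan pieceWork, resultQueue chan<- pieceResult) {
    	for work := range workQueue {
    		if !peerHasPiece(work.index) {
    			workQueue <- work // this peer lacks the piece; leave it for another
    			continue
    		}
    		buf, err := downloadPiece(work)
    		if err != nil || !checkHash(buf, work.hash) {
    			workQueue <- work // failed or corrupt: requeue
    			continue
    		}
    		resultQueue <- pieceResult{index: work.index, buf: buf}
    	}
    }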
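For readers following along in Python, the same producer/consumer-with-retry pattern can be sketched with threads and queue.Queue. Peers are modelled here as (has_piece, fetch) pairs of plain functions, an assumption standing in for real connections; a piece that no peer has would be requeued forever, which a real client must detect:

```python
import hashlib
import queue
import threading
import time


def download_all(pieces, peers):
    """Download every piece with one worker thread per peer.

    pieces: list of (index, expected_sha1).
    peers:  list of (has_piece, fetch) pairs (hypothetical peer model);
            fetch(index) returns bytes or raises OSError.
    Returns {index: verified_bytes}.
    """
    work_q = queue.Queue()
    results = queue.Queue()
    for item in pieces:
        work_q.put(item)

    def worker(has_piece, fetch):
        while True:
            item = work_q.get()
            if item is None:                 # sentinel: everything is done
                return
            index, expected = item
            if not has_piece(index):
                work_q.put(item)             # leave it for a peer that has it
                time.sleep(0.001)            # avoid spinning hot on the queue
                continue
            try:
                data = fetch(index)
            except OSError:
                work_q.put(item)             # connection trouble: requeue
                continue
            if hashlib.sha1(data).digest() != expected:
                work_q.put(item)             # corrupt piece: requeue
                continue
            results.put((index, data))

    threads = [threading.Thread(target=worker, args=p, daemon=True) for p in peers]
    for t in threads:
        t.start()
    done = dict(results.get() for _ in pieces)   # block until every piece arrives
    for _ in threads:
        work_q.put(None)                     # one sentinel per worker
    for t in threads:
        t.join()
    return done
```

Each piece exists exactly once at any moment (in the queue, in a worker's hands, or in the results), so once every result is in, the queue is empty and the sentinels stop each worker cleanly.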

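Evidence: Download an entire small torrent (the Debian netinst ISO is ~600 MB; a torrent of a few MB is faster for testing). Verify every piece, then compare the downloaded file's checksum with the one the publisher lists (for example, Debian's SHA256SUMS). A classic (v1) .torrent carries per-piece SHA-1 hashes, not a whole-file checksum.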

Milestone 8 (optional): Seeding

Once you have pieces, serve them to other peers who request them. Reverse the role.
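Serving a block is the request/piece exchange from Milestone 6 run in reverse. A sketch of answering one request, assuming verified pieces are held in an in-memory dict (a real client would read from disk) and ignoring choke state, which a seeder must also track. BEP 3 notes that implementations close connections requesting more than 16 KB; this sketch merely ignores such requests:

```python
import struct

PIECE = 7
MAX_BLOCK = 16 * 1024


def answer_request(payload, have):
    """Build the reply to one request message.

    payload: the 12-byte request payload <index><begin><length>.
    have:    dict mapping piece index -> verified piece bytes.
    Returns a complete framed piece message, or None to ignore the request.
    """
    if len(payload) != 12:
        return None
    index, begin, length = struct.unpack(">III", payload)
    piece = have.get(index)
    if piece is None or not 0 < length <= MAX_BLOCK or begin + length > len(piece):
        return None
    block = piece[begin:begin + length]
    # <len=9+X><id=7><index><begin><block>
    return struct.pack(">IBII", 9 + len(block), PIECE, index, begin) + block
```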

A further optional step: use the Kademlia-based DHT (BEP 5) to find peers without a tracker. It is far more complex than the rest of the project; skip it on a first pass.


8. Tests & evidence

Test                 How
Bencode round-trip   Parse + re-encode any .torrent → identical bytes
Infohash             Matches transmission-show output
Tracker              Returns a non-empty peer list
Handshake            Succeeds with at least one peer
Piece download       One piece downloads and SHA-verifies
Full download        A small torrent downloads completely; the file checksum matches the published one
Resilience           Connections drop mid-download; the client recovers and finishes
Concurrency          N parallel peer connections without data races

The strongest evidence: the downloaded file's checksum matches the value the publisher lists.


9. Common pitfalls

  • Bencode key ordering. When encoding, dictionary keys must be sorted by their raw bytes. Forgetting this changes the infohash.
  • URL-encoding the infohash for the tracker. The 20 raw bytes contain non-printable characters. URL-encode each non-alphanumeric byte as %XX.
  • Compact vs non-compact peer list. Specify compact=1 and decode 6-byte entries. Some trackers always return compact.
  • Pieces are not blocks. Pieces are typically 256 KB or larger; blocks are 16 KB. Pieces are SHA-verified; blocks are not. Get the abstraction right.
  • Off-by-one in piece length. The last piece is usually smaller than the others: last_piece_length = total_length % piece_length, or piece_length when that remainder is 0.
  • Holding connections forever. Peers drop silently. Use timeouts on every read.
  • Disk I/O on the hot path. Writing each block to disk synchronously is slow. Buffer pieces in memory; write a piece at a time.
  • Choking ignored. If a peer says they're choking you, requesting more pieces is wasted. Wait for unchoke or move to another peer.
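The timeout advice is easiest to follow if every read goes through one helper. A sketch, assuming blocking sockets; settimeout bounds each individual recv call rather than the whole read, so a peer trickling one byte at a time can still stall it (recv_exact is a hypothetical name):

```python
import socket


def recv_exact(sock: socket.socket, n: int, timeout: float = 30.0) -> bytes:
    """Read exactly n bytes or raise instead of hanging.

    Raises socket.timeout if any single recv waits longer than `timeout`,
    and ConnectionError if the peer closes the connection early.
    """
    sock.settimeout(timeout)
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```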

10. Extensions

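  • Multi-file torrents. The info dict has a files field (a list of length/path entries) instead of length. Pieces are computed over all files concatenated in order, so one piece can span a file boundary.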
  • Magnet links. Requires DHT (Kademlia) and metadata-exchange extension (BEP 9).
  • Endgame mode. When few pieces remain, request them from every peer at once.
  • Pipelining. Send multiple block requests at a time (5–10) to keep the pipe full.
  • Rarest-first piece selection. Choose pieces in order of rarity, not sequentially. The BitTorrent design.
  • µTP (uTorrent Transport Protocol). UDP-based congestion-controlled transport. Tomorrow's project.
  • WebTorrent. Browser-friendly variant using WebRTC.
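Rarest-first is simple to state: among the pieces you still need, prefer the one the fewest connected peers have. A sketch that ranks pieces that way, modelling each peer's bitfield as a Python set of piece indices (an assumption for readability) and breaking ties randomly so peers do not all chase the same piece:

```python
import random
from collections import Counter


def rarest_first(needed, peer_pieces, rng=random):
    """Order the needed pieces from rarest to most common.

    needed:      set of piece indices still missing.
    peer_pieces: iterable of sets, one per connected peer.
    Pieces no connected peer has are left out.
    """
    counts = Counter()
    for have in peer_pieces:
        counts.update(have & needed)        # count only pieces we still need
    order = list(counts)
    rng.shuffle(order)                      # random tie-breaking
    order.sort(key=lambda i: counts[i])     # stable sort keeps shuffled ties
    return order
```

Recomputing the order as bitfield and have messages arrive keeps it current; a real client would usually maintain the counts incrementally instead.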

11. Module integration

Module                               What the BitTorrent client deepens
Sem 5 Module 3 (Concurrency)         The piece-worker pool is a textbook producer/consumer with retry.
Sem 5 Module 5 (Network protocols)   A real peer-to-peer protocol, spoken with peers in the wild.
Network Stack tutorial               TCP is the substrate. Knowing TCP from the inside makes BitTorrent clearer.
Kafka-like tutorial                  Both are partition-and-replicate systems. Pieces ≈ partitions.
Blockchain tutorial                  P2P broadcast vs P2P fetch: different patterns, same world.

12. Portfolio framing

What to publish:

  • Source: bencode/, torrent/, tracker/, peer/, download/.
  • README with the SHA verification demo — show that your downloaded file's hash matches the expected.
  • A list of features: tracker, peer protocol, multi-peer concurrency. A list of skipped features: DHT, magnet, seeding, encryption.

What to keep private:

  • The torrents themselves. Use Linux distros and similar legitimate content. Never include or imply copyrighted material.

Reviewer entry points:

  • peer/handshake.go — the protocol entry point.
  • download/manager.go — the worker pool.
  • README must include: legitimate-content disclaimer, SHA verification result, list of features.

A working BitTorrent client is a satisfying portfolio piece because it works in the real world — peers respond, files download, hashes verify. The "it talks to real software" demonstration matters.


Source

This tutorial draws from the BYO-X catalog "BitTorrent Client" section. Jesse Li's blog post and BEP 3 are the canonical primary sources.