Build Your Own Blockchain
A blockchain is a distributed, append-only, hash-chained log with a Sybil-resistant agreement protocol on top.
Building a blockchain is the easiest way to make cryptographic hashing, Merkle trees, peer-to-peer networking, and consensus concrete. The interesting part is not "crypto" in the financial sense -- it is the data structure and the protocol that makes a decentralized ledger work.
1. Overview & motivation
A blockchain has four layers:
- Block format -- a header (previous hash, Merkle root, timestamp, nonce) plus a list of transactions.
- Hash chain -- each block contains the hash of the previous, so tampering invalidates everything after.
- Proof of work / proof of stake -- Sybil resistance: making it costly to propose blocks.
- Peer-to-peer gossip -- nodes flood new blocks and transactions to neighbours.
What you can only learn by building one:
- Why Merkle trees make light clients possible (verify membership with O(log n) hashes).
- Why the longest-chain rule is the elegant heart of Nakamoto consensus.
- Why mining difficulty has to adjust dynamically and how.
- Why double-spend prevention falls out for free if you have hash-chained ordering.
- Why blockchains do not solve most engineering problems -- most of what they offer can be done with Postgres + signatures.
2. Where this fits in the degree
- Phase: Architecture
- Semester: 6 (Databases and Distributed Systems)
- Modules deepened: Module 3 (replication: blockchain is a single-leader-by-PoW replicated log), Module 5 (distributed fundamentals: Nakamoto consensus as an alternative to Paxos/Raft). Also touches Sem 2 Module 5 (Merkle tree as an advanced structure) as a Foundations carry-over.
Cross-phase relevance:
- Builds intuition for content-addressable storage (Git uses similar ideas -- see Git tutorial)
- Direct contrast with the Consensus / Raft tutorial -- different consensus families for different threat models
3. Prerequisites
- Hashing: SHA-256 as a black-box function bytes -> 32 bytes.
- Public-key cryptography: at the API level only -- signing and verifying.
- HTTP or sockets: enough to send JSON between two processes (the Web Server tutorial or Network Stack tutorial gives more than enough).
You do not need to understand the math behind SHA-256 or ECDSA. Use libraries (hashlib, cryptography in Python; crypto/sha256 in Go).
4. Theory & research
Required reading
- Satoshi Nakamoto, "Bitcoin: A Peer-to-Peer Electronic Cash System" (2008) -- bitcoin.org/bitcoin.pdf. 9 pages. The single document. Read it once before writing any code.
- Daniel van Flymen, "Learn Blockchains by Building One" (hackernoon.com/learn-blockchains-by-building-one) -- the canonical Python tutorial. Three hours to follow end to end.
Recommended
- Andreas Antonopoulos, Mastering Bitcoin (2nd edition) -- Chapters 7-10. Free on GitHub.
- Princeton's "Bitcoin and Cryptocurrency Technologies" (coursera.org) -- free course with strong CS foundations.
Useful Merkle-tree references
- Ralph Merkle's original 1987 paper -- for completeness.
- The Certificate Transparency RFC (RFC 6962) -- Merkle trees as used by real systems.
What to skip (for the first pass)
- Ethereum / smart contracts / Solidity. Bitcoin's data structure is the right teaching target.
- Cryptocurrency speculation, "tokenomics", and similar non-CS material.
5. Curated tutorial list (from BYO-X)
- ATS: Functional Blockchain
- Crystal: Write your own blockchain and PoW algorithm using Crystal
- Go: Building Blockchain in Go -- Jeiwan
- Go: Code your own blockchain in less than 200 lines of Go -- Mycoralhealth
- Java: Creating Your First Blockchain with Java -- Kass
- JavaScript: A cryptocurrency implementation in less than 1500 lines of code
- JavaScript: Build your own Blockchain in JavaScript, Learn & Build a JavaScript Blockchain, Creating a blockchain with JavaScript
- JavaScript: How To Launch Your Own Production-Ready Cryptocurrency
- JavaScript: Writing a Blockchain in Node.js
- Kotlin: Let's implement a cryptocurrency in Kotlin
- Python: Learn Blockchains by Building One -- Daniel van Flymen -- recommended primary
- Python: Build your own blockchain: a Python tutorial
- Python: A Practical Introduction to Blockchain with Python
- Python: Let's Build the Tiniest Blockchain
- Ruby: Programming Blockchains Step-by-Step (Manuscripts Book Edition)
- Scala: How to build a simple actor-based blockchain
- TypeScript: Naivecoin: a tutorial for building a cryptocurrency -- lhartikk/naivecoin -- substantial, ~500 lines
- TypeScript: NaivecoinStake -- proof of stake variant
- Rust: Building A Blockchain in Rust & Substrate
6. Recommended primary path
Beginner: van Flymen's "Learn Blockchains by Building One" (Python, ~200 lines). One sitting. You will have a working blockchain with mining, transactions, and a simple HTTP API by the end.
Intermediate: Jeiwan's "Building Blockchain in Go" (7-part series). Adds persistence (BoltDB), Merkle trees, P2P networking, and CLI. This is the version most worth completing for portfolio.
Advanced: lhartikk's "Naivecoin" (TypeScript). The most complete tutorial in the catalog. Includes proper UTXO model, wallets, transactions, and a web UI.
For this degree: Jeiwan's Go series, because the Go ecosystem makes the networking and persistence parts natural, and the series is comprehensive without being overwhelming.
7. Implementation milestones
Milestone 1: Block and chain (Day 1, ~50 lines)
A block has: index, timestamp, data, previous hash, current hash. The hash is SHA-256(everything else). A chain is a list of blocks where each block's previous_hash matches the previous block's hash.
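
    import hashlib
    import json
    import time

    class Block:
        def __init__(self, index, prev_hash, data, timestamp=None):
            self.index = index
            # Explicit None check so a caller-supplied timestamp of 0 is kept.
            self.timestamp = time.time() if timestamp is None else timestamp
            self.data = data
            self.prev_hash = prev_hash
            self.nonce = 0
            self.hash = self.compute_hash()

        def compute_hash(self):
            # Hash every field except the hash itself; sort_keys makes the
            # serialization reproducible across nodes.
            block_str = json.dumps(
                {"i": self.index, "t": self.timestamp, "d": self.data,
                 "p": self.prev_hash, "n": self.nonce},
                sort_keys=True,
            )
            return hashlib.sha256(block_str.encode()).hexdigest()

    def genesis():
        # Fixed timestamp so every node derives the same genesis hash.
        return Block(0, "0" * 64, "genesis", timestamp=0)
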
Evidence: Build a chain of 3 blocks. Tamper with block 1's data. Show that block 1's hash and every subsequent prev_hash no longer validate.
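The tamper test above can be sketched as a small validator. This is a minimal, self-contained illustration rather than the tutorial's own code: it represents blocks as plain dicts, and the names `block_hash`, `make_block`, and `first_invalid` are hypothetical helpers introduced here. Fixed integer timestamps are an assumption made so the run is reproducible.

```python
import hashlib
import json


def block_hash(index, timestamp, data, prev_hash, nonce=0):
    """Hash every field except the hash itself, with sorted keys."""
    payload = json.dumps(
        {"i": index, "t": timestamp, "d": data, "p": prev_hash, "n": nonce},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def make_block(index, data, prev_hash, timestamp):
    """Return a block as a plain dict (hypothetical layout for this sketch)."""
    block = {"index": index, "timestamp": timestamp, "data": data,
             "prev_hash": prev_hash, "nonce": 0}
    block["hash"] = block_hash(index, timestamp, data, prev_hash)
    return block


def first_invalid(chain):
    """Return the index of the first block that fails validation, or None."""
    for i, block in enumerate(chain):
        recomputed = block_hash(block["index"], block["timestamp"],
                                block["data"], block["prev_hash"], block["nonce"])
        if block["hash"] != recomputed:
            return i  # stored hash no longer matches the block contents
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return i  # link to the previous block is broken
    return None


# Build a three-block chain.
chain = [make_block(0, "genesis", "0" * 64, 0)]
for i, data in enumerate(["alice->bob:5", "bob->carol:2"], start=1):
    chain.append(make_block(i, data, chain[-1]["hash"], i))
assert first_invalid(chain) is None

# Tamper with block 1 without recomputing anything: its stored hash is stale.
chain[1]["data"] = "alice->bob:500"
assert first_invalid(chain) == 1

# Recomputing block 1's hash only moves the break to block 2's prev_hash.
chain[1]["hash"] = block_hash(1, 1, "alice->bob:500", chain[1]["prev_hash"])
assert first_invalid(chain) == 2
```

The last two assertions show why an attacker would have to rewrite every later block as well: fixing one hash simply pushes the inconsistency one link further along.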
Milestone 2: Proof of work (Day 2)
A block is valid only if its hash starts with N zero hex digits (that is, 4N zero bits), where N is the difficulty. To produce such a hash, vary the nonce until you find one.

    def mine(block, difficulty):
        # Difficulty counts leading zero hex digits of the SHA-256 hex digest.
        target = "0" * difficulty
        while not block.hash.startswith(target):
            block.nonce += 1
            block.hash = block.compute_hash()
        return block

Evidence: Mine 5 blocks at difficulty 4. Record nonce values and time per block.
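The `mine()` function above compares hex digits, so each difficulty step multiplies the work by 16. A finer-grained alternative, sketched here as an assumption rather than as part of the tutorial's code, treats the digest as a 256-bit integer and compares it against a numeric target, which allows difficulty to be set in single bits. The names `pow_hash`, `mine_bits`, and `leading_zero_bits` and the header bytes are hypothetical.

```python
import hashlib
import time


def pow_hash(header: bytes, nonce: int) -> int:
    """Interpret SHA-256(header || nonce) as a 256-bit integer."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine_bits(header: bytes, difficulty_bits: int) -> int:
    """Search nonces until the hash has at least difficulty_bits leading zero bits."""
    target = 1 << (256 - difficulty_bits)  # valid hashes are strictly below this
    nonce = 0
    while pow_hash(header, nonce) >= target:
        nonce += 1
    return nonce


def leading_zero_bits(value: int) -> int:
    """Count leading zero bits of a value viewed as a 256-bit number."""
    return 256 - value.bit_length()


start = time.perf_counter()
nonce = mine_bits(b"block-1-header", 16)  # about 2**16 tries on average
elapsed = time.perf_counter() - start
print(f"nonce={nonce} elapsed={elapsed:.3f}s")
```

Sixteen bits here is the same expected work as four hex digits in `mine()`. Recording `nonce` and `elapsed` over several headers gives the per-block timing the evidence step asks for.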
Milestone 3: Transactions and Merkle tree
A block now contains a list of transactions. The header stores the Merkle root -- a hash tree over the transactions. This lets light clients verify "transaction X is in block Y" with O(log n) hashes.

    def merkle_root(txs):
        if not txs:
            return ""
        # Leaves are the hashes of the individual transactions.
        level = [hashlib.sha256(t.encode()).hexdigest() for t in txs]
        while len(level) > 1:
            # Odd number of nodes: duplicate the last one so every node has a pair.
            if len(level) % 2:
                level.append(level[-1])
            level = [hashlib.sha256((a + b).encode()).hexdigest()
                     for a, b in zip(level[::2], level[1::2])]
        return level[0]

Evidence: Build the Merkle tree, produce a Merkle proof for one transaction, and write a verifier that checks the proof without the full block.
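One possible shape for the proof and verifier is sketched below. It follows the same conventions as `merkle_root` above (hex-string concatenation, last node duplicated on odd levels) and assumes a non-empty transaction list. The helper names `h`, `merkle_levels`, `merkle_proof`, and `verify_proof` are hypothetical.

```python
import hashlib


def h(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def merkle_levels(txs):
    """Return all tree levels, leaves first, padding odd levels with their last node."""
    level = [h(t) for t in txs]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
            levels[-1] = level  # keep the padded level so sibling lookups work
        level = [h(a + b) for a, b in zip(level[::2], level[1::2])]
        levels.append(level)
    return levels


def merkle_proof(txs, index):
    """Return [(sibling_hash, sibling_is_left), ...] from the leaf up to the root."""
    proof = []
    for level in merkle_levels(txs)[:-1]:
        sibling = index ^ 1  # partner within the pair: even <-> odd
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_proof(tx, proof, root):
    """Recompute the root from one transaction and its sibling path."""
    node = h(tx)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


txs = ["a->b:1", "b->c:2", "c->d:3", "d->e:4", "e->a:5"]
root = merkle_levels(txs)[-1][0]
proof = merkle_proof(txs, 4)

assert verify_proof("e->a:5", proof, root)        # genuine transaction
assert not verify_proof("e->a:500", proof, root)  # altered transaction
assert len(proof) == 3                            # three hashes for five leaves
```

The verifier only needs the transaction, the proof, and the root from the block header, which is the light-client property: the other four transactions never leave the full node.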
Milestone 4: Wallets and signatures
Each user has an ECDSA key pair (secp256k1 is Bitcoin's curve; for a tutorial, any curve works). Transactions are signed by the sender. Receivers / validators verify the signature.
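
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec

    private_key = ec.generate_private_key(ec.SECP256K1())
    public_key = private_key.public_key()
    transaction_bytes = b'{"from": "alice", "to": "bob", "amount": 5}'  # example payload
    signature = private_key.sign(transaction_bytes, ec.ECDSA(hashes.SHA256()))
    # verify() returns None on success and raises
    # cryptography.exceptions.InvalidSignature on failure.
    public_key.verify(signature, transaction_bytes, ec.ECDSA(hashes.SHA256()))
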
Evidence: Show that altering a signed transaction fails verification.
Milestone 5: Networking (P2P gossip)
Each node has a list of peers. When a node mines a new block, it broadcasts it. When a node receives a block, it validates it and rebroadcasts.
A simple HTTP-based version works:
- POST /blocks -- receive a new block
- GET /blocks -- return the current chain
- POST /peers -- register a new peer
- POST /resolve -- apply longest-chain rule among known peers
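Before wiring up HTTP, the validate-and-rebroadcast behaviour can be tried in memory. The sketch below is an assumption-laden stand-in: the hypothetical `Node` class replaces HTTP calls with direct method calls, identifies blocks by a hash string only, and skips block validation entirely. What it does show is the duplicate check that stops a flood from looping forever.

```python
class Node:
    """In-memory stand-in for an HTTP node; peers are direct object references."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.seen = set()    # block hashes already processed
        self.received = []   # order in which new blocks arrived

    def connect(self, other):
        """Register each node as the other's peer, ignoring duplicates."""
        if other is not self and other not in self.peers:
            self.peers.append(other)
            other.connect(self)

    def receive(self, block_hash, sender=None):
        """Process a block once, then flood it to every peer except the sender."""
        if block_hash in self.seen:
            return  # duplicate: dropping it here is what ends the flood
        self.seen.add(block_hash)
        self.received.append(block_hash)
        for peer in self.peers:
            if peer is not sender:
                peer.receive(block_hash, sender=self)


a, b, c = Node("a"), Node("b"), Node("c")
a.connect(b)
b.connect(c)
c.connect(a)          # triangle, so every block can arrive by two routes
a.receive("block-1")  # node a "mines" a block and floods it

assert a.received == b.received == c.received == ["block-1"]
```

In the HTTP version the same `seen` check would sit at the top of the POST /blocks handler, before validation and rebroadcast.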
Evidence: Run three nodes on three ports. Mine on node 1. Show that nodes 2 and 3 receive the block. Then mine simultaneously on 2 and 3 to create a fork, and show that longest-chain wins.
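The fork resolution in this milestone can be sketched as a pure function, independent of the networking. This is a simplified illustration: proof of work is left out, blocks are plain dicts, and `block_hash`, `extend`, `is_valid`, and `resolve` are hypothetical names. Note that Bitcoin itself selects the chain with the most cumulative work rather than the most blocks; counting blocks is the toy simplification used here.

```python
import hashlib
import json


def block_hash(block):
    """Hash every field except the stored hash, with sorted keys."""
    body = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def extend(chain, data):
    """Return a new chain with one block appended (no PoW in this sketch)."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "data": data, "prev_hash": prev}
    block["hash"] = block_hash(block)
    return chain + [block]


def is_valid(chain, genesis_hash):
    """Check the agreed genesis block, every stored hash, and every link."""
    if not chain or chain[0]["hash"] != genesis_hash:
        return False
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


def resolve(local, peer_chains, genesis_hash):
    """Adopt a peer chain only if it is valid and strictly longer than ours."""
    best = local
    for candidate in peer_chains:
        if len(candidate) > len(best) and is_valid(candidate, genesis_hash):
            best = candidate
    return best


base = extend(extend([], "genesis"), "tx-1")
genesis_hash = base[0]["hash"]
fork_a = extend(base, "mined on node 2")
fork_b = extend(extend(base, "mined on node 3"), "node 3 again")
forged = fork_b + [{"index": 4, "data": "forged", "prev_hash": "bad", "hash": "bad"}]

assert resolve(fork_a, [fork_b], genesis_hash) is fork_b  # longer valid chain wins
assert resolve(fork_a, [forged], genesis_hash) is fork_a  # longer but invalid: ignored
```

Requiring a strictly longer chain means that two equal-length forks leave each node on the tip it saw first, so the fork persists until one side mines the next block.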
Milestone 6: Difficulty adjustment
Bitcoin's rule: every 2016 blocks, the network adjusts difficulty so that average block time stays ~10 minutes. In your toy version: adjust every 10 blocks so block time stays at ~5 seconds.
Evidence: Plot block time over 50 blocks under varying mining hash rate. Difficulty should track.
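One way to implement the adjustment is sketched below. It assumes difficulty is expressed as an integer target (a valid hash must be below it) rather than as a count of hex digits, because a digit count can only change the work in steps of 16. The function name `retarget` and the constants are hypothetical; the interval and block time are the toy values from the text, and the factor-of-4 clamp mirrors the limit Bitcoin applies per adjustment.

```python
RETARGET_INTERVAL = 10       # blocks per adjustment window (toy value)
TARGET_BLOCK_TIME = 5.0      # desired seconds per block (toy value)
MAX_TARGET = (1 << 256) - 1  # easiest possible target


def retarget(old_target, window_timestamps):
    """Scale the target by actual/expected elapsed time over one window.

    window_timestamps: block timestamps in seconds, oldest first, covering
    the whole window (N + 1 timestamps for N block intervals).
    """
    intervals = len(window_timestamps) - 1
    if intervals < 1:
        raise ValueError("need at least two timestamps")
    expected = TARGET_BLOCK_TIME * intervals
    actual = window_timestamps[-1] - window_timestamps[0]
    # Clamp the correction to a factor of 4 either way so that one odd
    # window cannot swing the difficulty wildly.
    actual = min(max(actual, expected / 4), expected * 4)
    # Integer arithmetic in milliseconds avoids float rounding on big targets.
    new_target = old_target * int(actual * 1000) // int(expected * 1000)
    return max(1, min(new_target, MAX_TARGET))


old = 1 << 240
fast = [i * 2.5 for i in range(RETARGET_INTERVAL + 1)]       # blocks twice too fast
slow = [i * 10.0 for i in range(RETARGET_INTERVAL + 1)]      # blocks twice too slow
stalled = [i * 100.0 for i in range(RETARGET_INTERVAL + 1)]  # 20x too slow

assert retarget(old, fast) == old // 2     # smaller target means harder
assert retarget(old, slow) == old * 2      # larger target means easier
assert retarget(old, stalled) == old * 4   # correction clamped at 4x
```

A smaller target means fewer hashes qualify, so halving the target doubles the expected mining work.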
Milestone 7 (optional): UTXO model
Bitcoin doesn't have "accounts" -- it has unspent transaction outputs. Each transaction consumes UTXOs and produces new ones. This is conceptually richer than account-based models (Ethereum).
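A minimal sketch of the idea, under simplifying assumptions: outputs are identified by a (txid, output index) pair, owners are plain strings, and signatures are not checked at all. The class `UTXOSet` and its method names are hypothetical.

```python
class UTXOSet:
    """Toy unspent-output set keyed by (txid, output_index). No signatures."""

    def __init__(self):
        self.unspent = {}  # (txid, index) -> (owner, amount)

    def add_coinbase(self, txid, owner, amount):
        """Create value from nothing, as a block reward would."""
        self.unspent[(txid, 0)] = (owner, amount)

    def apply(self, txid, inputs, outputs):
        """Spend inputs and create outputs; raise ValueError if invalid.

        inputs: list of (txid, index) references to unspent outputs
        outputs: list of (owner, amount) pairs
        """
        if len(set(inputs)) != len(inputs):
            raise ValueError("input referenced twice in one transaction")
        for ref in inputs:
            if ref not in self.unspent:
                raise ValueError(f"unknown or already spent output: {ref}")
        if any(amount <= 0 for _, amount in outputs):
            raise ValueError("output amounts must be positive")
        total_in = sum(self.unspent[ref][1] for ref in inputs)
        total_out = sum(amount for _, amount in outputs)
        if total_out > total_in:
            raise ValueError("outputs exceed inputs")
        # All checks passed: consume the inputs, then create the new outputs.
        for ref in inputs:
            del self.unspent[ref]
        for index, output in enumerate(outputs):
            self.unspent[(txid, index)] = output

    def balance(self, owner):
        return sum(amt for who, amt in self.unspent.values() if who == owner)


utxos = UTXOSet()
utxos.add_coinbase("cb0", "alice", 50)
utxos.apply("tx1", [("cb0", 0)], [("bob", 30), ("alice", 20)])  # 20 is change

try:
    utxos.apply("tx2", [("cb0", 0)], [("carol", 50)])  # cb0 is already spent
    double_spend_rejected = False
except ValueError:
    double_spend_rejected = True

assert double_spend_rejected
assert utxos.balance("alice") == 20 and utxos.balance("bob") == 30
```

In this model a balance is just the sum of the unspent outputs an owner controls, and a double spend fails simply because the referenced output no longer exists in the set.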
8. Tests & evidence
| Test | How |
|---|---|
| Hash chain validation | Tamper test (above) |
| Mining correctness | Random data must produce a hash with N leading zeros |
| Merkle proof verification | Generate proof, alter one byte, verify it now fails |
| Signature verification | Sign, verify, then alter and re-verify (must fail) |
| P2P gossip | 3-node fork scenario (above) |
| Difficulty regulation | Block time stays near target across hash-rate change |
| Double-spend prevention | Submit two transactions spending the same UTXO; second must be rejected |
9. Common pitfalls
- Serialization inconsistency. Hash inputs must be reproducible. json.dumps(..., sort_keys=True) is the simplest fix. Without this, two nodes compute different hashes for the same block.
- Including the hash field in the data being hashed. Classic bug. The hash is computed over everything except itself.
- No genesis block agreement. All nodes must agree on the genesis block. Hardcode it.
- Naive requests calls in P2P broadcast. Without timeouts, one slow peer blocks the network. Use timeouts and async.
- Trusting incoming blocks. Always validate: hash matches, PoW satisfies difficulty, all transactions are signed and don't double-spend.
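- Difficulty too high for development. With difficulty counted in leading zero hex digits, each extra digit multiplies the expected work by 16. Start at 4 (about 65,000 hashes on average, typically seconds or less), not 8 (about 4.3 billion hashes, likely hours in pure Python).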
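The serialization pitfall in the list above is easy to reproduce. The sketch below builds the same logical block on two hypothetical nodes with different key orders; `naive_hash` and `canonical_hash` are illustrative names, and the fixed `separators` argument is an extra precaution assumed here, not something the tutorial requires.

```python
import hashlib
import json


def naive_hash(block: dict) -> str:
    # Key order follows dict insertion order, so it depends on how the dict was built.
    return hashlib.sha256(json.dumps(block).encode()).hexdigest()


def canonical_hash(block: dict) -> str:
    # Sorted keys and fixed separators give one byte string per logical block.
    encoded = json.dumps(block, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode()).hexdigest()


node_a = {"index": 1, "data": "alice->bob:5", "prev_hash": "00ab"}
node_b = {"prev_hash": "00ab", "data": "alice->bob:5", "index": 1}  # same block

assert node_a == node_b                                   # logically equal
assert naive_hash(node_a) != naive_hash(node_b)           # but hashes disagree
assert canonical_hash(node_a) == canonical_hash(node_b)   # canonical form agrees
```

Floating-point timestamps are another likely source of disagreement between implementations in different languages; integer timestamps sidestep that question.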
10. Extensions
- Proof of stake -- NaivecoinStake tutorial covers it. Replace mine() with stake-weighted selection.
- Persistent storage -- LevelDB, BoltDB, or SQLite. The Jeiwan tutorial uses BoltDB.
- CLI wallet -- send, balance, mine commands.
- Smart contracts -- Ethereum-style stack-based VM. This is essentially the Interpreter project applied to a different domain.
- Sharding -- partition the blockchain by transaction key.
11. Module integration
| Module | What the blockchain deepens |
|---|---|
| Sem 2 Module 5 -- Advanced structures | Merkle tree: a specific advanced structure with strong real-world use. |
| Sem 6 Module 2 -- Storage & indexing | Append-only hash-chained log is a textbook storage pattern. |
| Sem 6 Module 3 -- Replication | Blockchain is a replicated log with conflict resolution by longest chain. |
| Sem 6 Module 5 -- Distributed fundamentals | Nakamoto consensus is a probabilistic alternative to Paxos/Raft. |
| Sem 7 Module 4 -- Contract design | Transaction formats and signature schemes are protocol contracts. |
| Cross-link: Consensus / Raft | Direct contrast -- different consensus families. |
| Cross-link: Git tutorial | Git is content-addressable; blockchain is content-addressable. Same fundamental idea applied to two different problems. |
12. Portfolio framing
What to publish:
- The chain implementation, clearly split into block.py, chain.py, mining.py, wallet.py, and network.py.
- A README that honestly explains what the project does and what it does not (no production claims).
- A blog post comparing your toy to Bitcoin's actual implementation, naming what's simplified.
What to keep private:
- Private keys (obviously).
- Speculative or financial framing (this is a CS project, not an investment thesis).
Reviewer entry points:
- chain.py -- chain validation.
- mining.py -- PoW + difficulty adjustment.
- network.py -- P2P broadcast.
- README must include the 3-node fork demonstration.
What this should not become:
- A pitch deck.
- An ICO.
- An NFT marketplace.
Keep the project as a CS artifact. The value is in understanding the mechanism, not in playing financial markets.
13. Local source backbone
Use these local chunks to make the blockchain path more rigorous and less tutorial-shaped:
- Build Your Own Blockchain (build-your-own/blockchain-hellwig)
- Learn Blockchain Programming with JavaScript (build-your-own/blockchain-javascript-traub)
| Local chunks | Use them for | Add to this project |
|---|---|---|
| Hellwig 003-010 | DLT terminology, blocks, ledger propagation, Merkle trees, first blockchain build | Add a concept map before implementation: block, transaction, ledger, hash, peer, validator. |
| Hellwig 011-023 | Double spending, mining, PoW/PoS/PoA, CAP, Byzantine generals, consensus mechanisms | Add a consensus comparison note: PoW chain selection versus Raft-style leadership. |
| Hellwig 024-029 | Ethereum, gas, smart contracts, ABI, sample deployment | Keep as optional extension; do not mix smart contracts into the base chain. |
| Hellwig 030-050 | Anonymity, privacy, cryptographic primitives, signatures, elliptic curves | Add a cryptography caveat section: what this toy implements and what it deliberately does not. |
| Traub 005-018 | JavaScript blockchain object, blocks, transactions, SHA-256, PoW, genesis block | Add constructor/data-model tests for every chain primitive. |
| Traub 019-035 | Express API, mining endpoint, multi-node registration, transaction sync | Add a three-node API demo with peer registration and transaction broadcast. |
| Traub 036-048 | Chain validation, consensus endpoint, block explorer, address lookup, improvement queue | Add /consensus, /block/:hash, /transaction/:id, and /address/:address as optional API milestones. |
Extra checkpoints from the book chunks
- Validation checkpoint: corrupt one transaction, one Merkle proof, one block hash, and one previous-hash pointer; all must fail differently.
- Consensus checkpoint: run two nodes that mine competing tips, then a third node that resolves by the chosen rule.
- API checkpoint: expose read-only explorer endpoints without letting clients mutate chain state directly.
- Security checkpoint: document why this toy is not production crypto and list every shortcut.
14. Deep project spec
Project contract
Build a toy blockchain as a distributed-systems mechanism, not a finance product. The base contract is blocks, transactions, hashes, previous-hash links, Merkle roots or transaction commitments, chain validation, peer synchronization, and one consensus rule. Wallets, smart contracts, and explorers are extensions.
Source-backed reading map
| Source ID | Use for | Required output |
|---|---|---|
| build-your-own/blockchain-hellwig | DLT concepts, blocks, propagation, Merkle trees, consensus comparison, cryptography caveats | concept map, validation tests, consensus note |
| build-your-own/blockchain-javascript-traub | JavaScript blockchain object, Express API, mining, peer registration, explorer endpoints | API demo and multi-node transcript |
Milestone map
| Milestone | Deliverable | Tests | Failure case |
|---|---|---|---|
| Data model | block and transaction schema | serialization/hash fixtures | non-canonical serialization |
| Validation | block, transaction, chain checks | corrupted-chain fixtures | invalid previous hash |
| Mining/consensus | PoW or explicit alternative | difficulty and chain-selection tests | competing tips |
| Merkle/signature layer | commitment or signature proof | proof/signature fixtures | tampered transaction |
| Network API | peer registration and broadcast | three-node demo | duplicate peer/transaction |
| Explorer | read-only lookup endpoints | endpoint fixtures | missing block/transaction |
| Security caveats | threat model and shortcuts | review checklist | production-safety disclaimer missing |
Test matrix
| Test type | Required examples |
|---|---|
| Golden | block hash, transaction ID, Merkle root |
| Negative | corrupt transaction, block hash, previous pointer, consensus state |
| Integration | three nodes broadcast transaction and resolve chain |
| Property | chain validator rejects any single-field mutation |
| API | read-only explorer and mutation endpoints separated |
Design notes required
- chain-model.md: block schema, transaction schema, hash inputs.
- consensus.md: fork-choice rule, difficulty, and failure modes.
- network.md: peer protocol, idempotency, broadcast behavior.
- security.md: cryptographic shortcuts and why this is not production crypto.
Portfolio evidence
Publish the chain validator fixtures, three-node transcript, explorer screenshots or curl log, consensus comparison note, and security caveat list.
Source
This tutorial draws from the BYO-X catalog "Blockchain / Cryptocurrency" section. The Nakamoto whitepaper remains the canonical primary source.