Binary Search Trees and the BST Property
What This Concept Is
A binary search tree (BST) is a binary tree where every node stores a key and the structure satisfies the BST property:
- every key in the left subtree of a node
xis<= x.key - every key in the right subtree of a node
xis>= x.key
That property is the only structural promise a plain BST makes. It is enough to support search, insert, delete, minimum, maximum, successor, and predecessor in O(h) time where h is the height of the tree.
The catch is that h is not controlled. On n sorted insertions, a plain BST degenerates into a path and h = n. The BST is fundamentally an order-aware dictionary: unlike a hash table, it supports successor, predecessor, range scans, and ordered traversal, but it pays for those with its dependence on tree shape.
A second subtlety is that the BST property is local but transitive. It is stated at each node, yet it implies a global sort order: the in-order traversal (left subtree, node, right subtree) visits keys in non-decreasing order precisely because the property holds at every node simultaneously. This is why in-order traversal is used everywhere from std::map iteration to compiler symbol dumps.
Finally, BSTs admit pointers between siblings, predecessors, and successors without changing the core property. Threaded trees, augmented BSTs (for order statistics or interval queries), and rank-balanced BSTs all inherit this same API and the same O(h) cost model. Fixing h is what the rest of this cluster is about.
Why It Matters Here
Every balanced tree in this cluster exists to fix one problem: the plain BST shape depends on insertion order, not on the keys themselves. Before you can appreciate why red-black and AVL trees are worth their complexity, you have to see the failure mode they prevent.
It also sets up vocabulary you need for the rest of the module:
- rotation is a local restructuring that preserves the BST property
- in-order traversal always produces sorted output
- successor/predecessor are
O(h)using parent pointers or ancestor tracking - augmentation (subtree size, min/max, sums) is possible as long as it is recomputable in
O(1)at each node
Concrete Examples
Example 1 -- sorted insertion degenerates. Insert the keys 1, 2, 3, 4, 5 into an empty BST in that order:
1
\
2
\
3
\
4
\
5
Height is 4, and search(5) takes five comparisons. The same keys inserted in order 3, 1, 4, 2, 5 give:
3
/ \
1 4
\ \
2 5
Height is 2. Same keys, same BST property, different shape, different cost.
Example 2 -- deletion by successor replacement. Starting from the balanced tree above, delete 3. Since 3 has two children, find the in-order successor: 4 (the minimum of the right subtree). Copy 4's key into the node that held 3, then delete the successor node in its original position (a one-child or leaf case). Result:
4
/ \
1 5
\
2
Height is still 2 and in-order traversal still reads 1, 2, 4, 5. The two-children delete reduces to a simpler delete by the successor-copy trick.
Common Confusion / Misconceptions
"The BST property is just about the immediate children." Wrong. It is pointwise on every node, which by induction implies every descendant on the left is <= and every descendant on the right is >=. A tree that passes only the parent-child check can still violate the BST property globally.
"Two equal keys cannot coexist." Depends on the variant. Multiset BSTs (e.g., std::multimap) allow duplicates using a consistent tiebreak (always go right for equal keys). Set BSTs (std::set) reject duplicates on insert. State your variant before debating correctness.
"In-order traversal is a special algorithm." It is just the three-line recursion in_order(left); visit(node); in_order(right) and it runs in O(n) time and O(h) extra stack space. Any discussion of Morris traversal, threaded BSTs, or iterator state is an optimization of the stack, not of the logical order.
"Deletion is symmetric to insertion." It is not. Insert always adds a leaf; delete has three cases (leaf, one child, two children) and only the two-children case needs the successor trick.
How To Use It
Before using a BST in code:
- ask whether insertion order is adversarial or even merely sorted
- if yes, use a balanced variant or a randomized replacement (see concepts 02-04)
- in-order traversal gives sorted output "for free" in
O(n)time - successor and predecessor are the cheap operations that ordered maps rely on
- deletion has three cases: leaf, one child, two children (replace with successor)
- if you need rank or order statistics, augment every node with subtree size and update on insert/delete
- for range scans, combine
successorwith an early-exit predicate -- do not re-search from the root
Transfer / Where This Shows Up Later
- S5 (OS / filesystems): the directory trie in Unix filesystems and the symbol tables of a linker are BST-shaped.
readdiris an in-order traversal. Process scheduling in the Linux CFS uses a red-black tree (a BST) keyed by virtual runtime. - S6 (databases): every ordered index in an OLTP database is a BST generalization -- B+trees for disk, skip lists for memtables (RocksDB, LevelDB), LSM trees whose sorted runs are materialized BSTs on disk.
- S7 (architecture): "use an ordered index or a hash" is a recurring ADR. Knowing why a BST exposes
successorbut a hash does not lets you defend the choice. - S8 (system design at scale): range shards in distributed systems (Spanner, CockroachDB) use interval trees and BSTs to route keys to partitions. Range queries rely on in-order traversal of a sharded BST spanning the cluster.
Check Yourself
- What is the BST property, stated precisely?
- Which insertion orders produce the worst shape? Which produce the best?
- Why does in-order traversal produce sorted output?
- How does deletion of a node with two children reduce to deletion of a node with at most one child?
- What does
successor(x)do whenxhas no right subtree, and what extra pointers make itO(h)? - If you augment every node with
size = 1 + size(left) + size(right), how do you answerselect(k)(thek-th smallest key) inO(h)?
Mini Drill or Application
Using pencil and paper:
- Insert
5, 3, 8, 1, 4, 7, 9, 2, 6into an empty BST. Draw it. - Perform
delete(5)using successor replacement. Draw the result. - Insert the same keys in sorted order into a new BST. Compare heights.
- For each of the two trees above, write the in-order traversal and confirm it equals the sorted key list.
- Implement
BSTin Python or C++ withinsert,search,delete, andin_order. Test on 10,000 random keys and 10,000 sorted keys; print the measured height and compare tolog2(n)andn.
Read This Only If Stuck
- CLRS: What is a binary search tree
- CLRS: Querying a binary search tree
- CLRS: Insertion and deletion (Part 1)
- CLRS: Insertion and deletion (Part 3)
- Sedgewick: Elementary searching methods (Part 1)
- Sedgewick: Elementary searching methods (Part 2)
- Skiena: 3.4 Binary search trees
- Pat Morin, Open Data Structures -- BinarySearchTree chapter
- MIT 6.851 Advanced Data Structures lecture index