Skip to main content

Binary Search Trees and the BST Property

What This Concept Is

A binary search tree (BST) is a binary tree where every node stores a key and the structure satisfies the BST property:

  • every key in the left subtree of a node x is <= x.key
  • every key in the right subtree of a node x is >= x.key

That property is the only structural promise a plain BST makes. It is enough to support search, insert, delete, minimum, maximum, successor, and predecessor in O(h) time where h is the height of the tree.

The catch is that h is not controlled. On n sorted insertions, a plain BST degenerates into a path and h = n. The BST is fundamentally an order-aware dictionary: unlike a hash table, it supports successor, predecessor, range scans, and ordered traversal, but it pays for those with its dependence on tree shape.

A second subtlety is that the BST property is local but transitive. It is stated at each node, yet it implies a global sort order: the in-order traversal (left subtree, node, right subtree) visits keys in non-decreasing order precisely because the property holds at every node simultaneously. This is why in-order traversal is used everywhere from std::map iteration to compiler symbol dumps.

Finally, BSTs admit pointers between siblings, predecessors, and successors without changing the core property. Threaded trees, augmented BSTs (for order statistics or interval queries), and rank-balanced BSTs all inherit this same API and the same O(h) cost model. Fixing h is what the rest of this cluster is about.

Why It Matters Here

Every balanced tree in this cluster exists to fix one problem: the plain BST shape depends on insertion order, not on the keys themselves. Before you can appreciate why red-black and AVL trees are worth their complexity, you have to see the failure mode they prevent.

It also sets up vocabulary you need for the rest of the module:

  • rotation is a local restructuring that preserves the BST property
  • in-order traversal always produces sorted output
  • successor/predecessor are O(h) using parent pointers or ancestor tracking
  • augmentation (subtree size, min/max, sums) is possible as long as it is recomputable in O(1) at each node

Concrete Examples

Example 1 -- sorted insertion degenerates. Insert the keys 1, 2, 3, 4, 5 into an empty BST in that order:

1
\
2
\
3
\
4
\
5

Height is 4, and search(5) takes five comparisons. The same keys inserted in order 3, 1, 4, 2, 5 give:

    3
/ \
1 4
\ \
2 5

Height is 2. Same keys, same BST property, different shape, different cost.

Example 2 -- deletion by successor replacement. Starting from the balanced tree above, delete 3. Since 3 has two children, find the in-order successor: 4 (the minimum of the right subtree). Copy 4's key into the node that held 3, then delete the successor node in its original position (a one-child or leaf case). Result:

    4
/ \
1 5
\
2

Height is still 2 and in-order traversal still reads 1, 2, 4, 5. The two-children delete reduces to a simpler delete by the successor-copy trick.

Common Confusion / Misconceptions

"The BST property is just about the immediate children." Wrong. It is pointwise on every node, which by induction implies every descendant on the left is <= and every descendant on the right is >=. A tree that passes only the parent-child check can still violate the BST property globally.

"Two equal keys cannot coexist." Depends on the variant. Multiset BSTs (e.g., std::multimap) allow duplicates using a consistent tiebreak (always go right for equal keys). Set BSTs (std::set) reject duplicates on insert. State your variant before debating correctness.

"In-order traversal is a special algorithm." It is just the three-line recursion in_order(left); visit(node); in_order(right) and it runs in O(n) time and O(h) extra stack space. Any discussion of Morris traversal, threaded BSTs, or iterator state is an optimization of the stack, not of the logical order.

"Deletion is symmetric to insertion." It is not. Insert always adds a leaf; delete has three cases (leaf, one child, two children) and only the two-children case needs the successor trick.

How To Use It

Before using a BST in code:

  1. ask whether insertion order is adversarial or even merely sorted
  2. if yes, use a balanced variant or a randomized replacement (see concepts 02-04)
  3. in-order traversal gives sorted output "for free" in O(n) time
  4. successor and predecessor are the cheap operations that ordered maps rely on
  5. deletion has three cases: leaf, one child, two children (replace with successor)
  6. if you need rank or order statistics, augment every node with subtree size and update on insert/delete
  7. for range scans, combine successor with an early-exit predicate -- do not re-search from the root

Transfer / Where This Shows Up Later

  • S5 (OS / filesystems): the directory trie in Unix filesystems and the symbol tables of a linker are BST-shaped. readdir is an in-order traversal. Process scheduling in the Linux CFS uses a red-black tree (a BST) keyed by virtual runtime.
  • S6 (databases): every ordered index in an OLTP database is a BST generalization -- B+trees for disk, skip lists for memtables (RocksDB, LevelDB), LSM trees whose sorted runs are materialized BSTs on disk.
  • S7 (architecture): "use an ordered index or a hash" is a recurring ADR. Knowing why a BST exposes successor but a hash does not lets you defend the choice.
  • S8 (system design at scale): range shards in distributed systems (Spanner, CockroachDB) use interval trees and BSTs to route keys to partitions. Range queries rely on in-order traversal of a sharded BST spanning the cluster.

Check Yourself

  1. What is the BST property, stated precisely?
  2. Which insertion orders produce the worst shape? Which produce the best?
  3. Why does in-order traversal produce sorted output?
  4. How does deletion of a node with two children reduce to deletion of a node with at most one child?
  5. What does successor(x) do when x has no right subtree, and what extra pointers make it O(h)?
  6. If you augment every node with size = 1 + size(left) + size(right), how do you answer select(k) (the k-th smallest key) in O(h)?

Mini Drill or Application

Using pencil and paper:

  1. Insert 5, 3, 8, 1, 4, 7, 9, 2, 6 into an empty BST. Draw it.
  2. Perform delete(5) using successor replacement. Draw the result.
  3. Insert the same keys in sorted order into a new BST. Compare heights.
  4. For each of the two trees above, write the in-order traversal and confirm it equals the sorted key list.
  5. Implement BST in Python or C++ with insert, search, delete, and in_order. Test on 10,000 random keys and 10,000 sorted keys; print the measured height and compare to log2(n) and n.

Read This Only If Stuck