Randomized Data Structures: Skip Lists, Treaps, Randomized BSTs
What This Concept Is
A randomized data structure makes structural decisions using random choices rather than deterministic invariants. The guarantee shifts from "every sequence runs in O(log n)" to "expected per-operation cost is O(log n), regardless of the input sequence."
Three representative examples:
- Skip lists (Pugh 1990): level of each new node is chosen by coin flips; expected search cost is
O(log n)with high probability - Treaps (Seidel-Aragon 1996): every key also has a random priority; the structure is a BST on keys and a max-heap on priorities, which uniquely determines a shape equivalent (in expectation) to a random BST
- Randomized BSTs (Martinez-Roura 1998): insertion randomly decides whether the new node becomes the root via a randomized rotation, giving expected balanced shape without tracking priorities
In all three, the input can be chosen adversarially, but the structure's internal random choices make bad shapes unlikely. The central theorem is that for any fixed input sequence, the expected depth of any node is O(log n), and the probability that the depth exceeds c log n decays polynomially in n.
Treaps deserve a closer look because they admit two powerful operations that deterministic balanced BSTs do not expose directly:
- split(T, k): partition treap
Tinto two treaps, one containing keys<= kand one containing> k, in expectedO(log n) - merge(T1, T2): combine two treaps where every key in
T1is less than every key inT2, in expectedO(log n)
split and merge make treaps the natural data structure for competitive-programming problems requiring range reversal, range insertion, or segment gluing (implicit treaps / cartesian trees).
Why It Matters Here
Randomization is the third way to maintain balance. The first two are:
- deterministic invariant + complex rebalancing (red-black, AVL)
- worst-case analysis on input distribution (not a data-structure property, and fragile)
Randomized structures often have dramatically simpler code than their deterministic cousins. They trade worst-case guarantees for expected guarantees, which is acceptable in almost all practical contexts (databases, caches, in-memory indexes). Concurrent implementations of randomized structures are also simpler because local random decisions do not propagate.
Production systems that use randomized balanced structures include Redis sorted sets (skip list), LevelDB / RocksDB memtables (skip list), Java's ConcurrentSkipListMap, and competitive-programming templates worldwide (treap / implicit treap).
Concrete Examples
Example 1 -- treap insertion. Insert key 5 with random priority 0.72 into an empty treap. The tree is just 5 @ 0.72.
Insert key 3 with random priority 0.41:
- BST-insert:
3goes left of5 - heap property: parent priority
0.72 >= 0.41holds - no rotation needed
5 @ 0.72
/
3 @ 0.41
Insert key 7 with random priority 0.91:
- BST-insert:
7goes right of5 - heap property: parent priority
0.72 < 0.91, violated - rotate so that
7becomes the root
7 @ 0.91
/
5 @ 0.72
/
3 @ 0.41
The shape depends on the random priorities, which the adversarial input cannot control. Over random priorities, the expected depth of any node is O(log n).
Example 2 -- treap split. Starting from the treap above (keys {3, 5, 7}), call split(T, 5): produce two treaps, one containing keys <= 5 (i.e., {3, 5}) and one containing keys > 5 (i.e., {7}).
Recursive algorithm:
split(T, k):
if T is empty: return (empty, empty)
if T.key <= k:
L, R = split(T.right, k)
return (new_treap(T.left, T.key, L), R)
else:
L, R = split(T.left, k)
return (L, new_treap(R, T.key, T.right))
On our tree: root is 7 > 5, so recurse into left subtree (5 @ 0.72). Root is 5 <= 5, so recurse into right (empty). Result left from this call is (5 @ 0.72) with children {3, empty}; result right is empty. Bubbling back up, the final left treap is {3, 5} and the final right treap is {7}. Expected work: O(log n) recursive calls.
This is how "insert range, delete range, reverse range" operations are implemented on sequences: wrap them as implicit treaps (priorities random, keys = positions) and compose with split + merge.
Common Confusion / Misconceptions
"Randomized means randomized input." No. The assumption is that the data structure's internal randomness is truly random (or cryptographically unpredictable). The input can be arbitrary.
"O(log n) expected is a worst-case bound." It is not. For skip lists and treaps, the bound is on the expected cost per operation, not a worst case. The variance is low (the bound holds with high probability, e.g., at most c log n depth with probability 1 - 1/n^c), but no finite bound holds deterministically.
"Randomized BSTs and treaps are the same thing." They are distinct. Treaps store explicit priorities; randomized BSTs (Martinez-Roura) make randomized decisions during insert/delete without storing priorities. Both give the same expected bounds, but treaps expose split/merge cleanly while randomized BSTs do not.
"A predictable PRNG is safe for randomized structures." Only if the adversary cannot observe your randomness. In networked services, a predictable RNG lets an attacker construct an input that forces worst-case shape (hash flooding analog). Use a cryptographically secure RNG for any structure exposed to external input.
How To Use It
When to reach for a randomized balanced structure:
- implementation simplicity matters more than worst-case bounds (most production code)
- concurrent access is required and lock-free design is easier with local randomness
- your standard library already provides a randomized variant (Redis sorted set, some Rust / Go btree alternatives, Java's
ConcurrentSkipListMap) - you need
split/merge(treap is the natural choice)
When not to:
- hard real-time systems with strict worst-case SLAs
- contexts where a side channel on internal randomness could leak information (security-sensitive data structures)
- extremely small
nwhere constant factors dominate (use a deterministic array or balanced BST)
Comparison table:
| Structure | Balancing | Worst-case | Expected |
|---|---|---|---|
| Red-Black tree | coloring invariant | O(log n) | O(log n) |
| AVL tree | balance-factor invariant | O(log n) | O(log n) |
| Skip list | random level | O(n) | O(log n) |
| Treap | random priority | O(n) | O(log n) |
| Randomized BST | random rotation | O(n) | O(log n) |
Transfer / Where This Shows Up Later
- S5 (OS): the Linux kernel uses randomization in slab allocators and page-coloring schemes; skip-list patterns appear in some filesystem journaling layers.
- S6 (databases): skip lists power RocksDB / LevelDB memtables; treap-style split/merge is the backbone of segment-tree-of-treaps implementations used in database-internal range indexes.
- S7 (architecture): when an ADR says "use a randomized balanced structure" it is asserting that the concurrency and simplicity wins outweigh the worst-case guarantee. Calibrate to workload and adversary model.
- S8 (scale): distributed random sampling, consistent-hash rings with random virtual-node placement, and probabilistic load balancers all rely on the same expected-bound arguments.
Check Yourself
- What do skip lists, treaps, and randomized BSTs have in common structurally?
- What distinguishes "expected
O(log n)" from "worst-caseO(log n)"? - Why is the adversarial-input case not a concern for randomized structures?
- Name one production system that uses a randomized balanced structure.
- What are
splitandmergeon a treap, and why are they expectedO(log n)? - Why can a predictable PRNG break the expected-bound guarantee when the structure is exposed to external input?
Mini Drill or Application
- Insert
1, 2, 3, 4, 5into a treap using priorities0.3, 0.8, 0.1, 0.6, 0.9. Draw the tree after each insert, including any rotations. Report the final height. - Repeat with priorities
0.9, 0.8, 0.7, 0.6, 0.5. Compare the shape. - Explain in one paragraph why an adversarial insertion order cannot force a bad treap shape.
- Sketch pseudocode for a random-priority insert into a treap and a deletion that rotates the node to a leaf before unlinking.
- Implement a treap with
splitandmergein Python or C++ (~150 lines). Use it to solve "range reverse on an array ofn = 10^5" inO(log n)per reverse.
Read This Only If Stuck
- Skiena: 3.4 Binary search trees
- Skiena: 3.4.3 Balanced search trees
- Sedgewick: Balanced trees (Part 2) -- contrast with deterministic
- Sedgewick: Random numbers -- PRNG quality for randomized structures
- Competitive Programming: 2.3 Non-linear DS with built-in libraries (Part 1)
- Competitive Programming: 2.3 Non-linear DS with built-in libraries (Part 2)
- CP-Algorithms: Treap (Cartesian tree)
- Pugh, W. (1990). Skip Lists: A Probabilistic Alternative to Balanced Trees
- MIT 6.851 -- Randomized data structures lecture