Skip to main content

Skip Lists as a Randomized Balanced-BST Replacement

What This Concept Is

A skip list is a layered linked-list structure that supports ordered dictionary operations in expected O(log n) time without using any rotations or invariants that have to be restored.

  • level 0 is a sorted linked list of all keys
  • level i+1 is a sorted sublist of level i, where each element is promoted independently with probability p (commonly 1/2)
  • search starts at the top level, walks right while the next key is smaller than the target, drops down a level when it overshoots, and continues

No node stores rank or balance information. The only decision per insert is "how many levels does this new node occupy?", chosen by repeated coin flips.

The geometric-distribution promotion produces, in expectation, n/2 nodes on level 1, n/4 on level 2, and so on. The expected number of levels is log_{1/p} n, and the expected search cost is log_{1/p} n / (1 - p). For p = 1/2 this is about 2 log_2 n, close to the best balanced-BST height bounds.

The crucial property is that the adversary selecting the input sequence cannot force bad shape. Once the keys are fixed, the internal coin flips decide the structure; the expected cost bound holds for every input. High-probability tail bounds (depth exceeds c log n with probability at most 1/n^{c-1}) make worst-case deviations negligible on large n.

Concurrent skip lists are the practical win that rotations cannot match. Because each level is an independent linked list, lock-free implementations (Java's ConcurrentSkipListMap, Doug Lea's design) can do fine-grained CAS updates per level instead of whole-tree rebalancing. This is one reason skip lists appear in production where balanced BSTs struggle.

Why It Matters Here

Skip lists are the cleanest example of a theme that recurs throughout this module: randomization can replace complicated invariant maintenance and still give the same expected bounds. They are used in production systems:

  • Redis sorted sets (ZADD, ZRANGEBYSCORE) are a skip list paired with a hash map
  • LevelDB / RocksDB memtables use skip lists for in-memory writes before flushing to SSTables
  • Apache Cassandra uses skip list-like structures for its in-memory tables
  • Java provides ConcurrentSkipListMap as the default lock-free sorted map

Their code is dramatically shorter than red-black or AVL and their concurrent variants avoid whole-tree locks.

Concrete Examples

Example 1 -- search trace. Five keys inserted with the shown promotion levels:

level 2:  HEAD ------------- 7 ------------ NIL
level 1: HEAD --- 3 ------- 7 ----- 12 --- NIL
level 0: HEAD -1- 3 -- 5 -- 7 -- 9- 12 -15 NIL

Search for 9:

  1. start at HEAD, level 2; next is 7, move right to 7
  2. next at level 2 is NIL, drop to level 1
  3. next at level 1 is 12 > 9, drop to level 0
  4. next at level 0 is 9, match

The search visited four nodes, which is close to log(6).

Example 2 -- insert with coin flips. Starting from an empty skip list with p = 1/2, insert keys 4, 1, 7, 3 using the coin-flip sequence HH, T, HHT, T where an H promotes to the next level and a T stops:

  • insert 4: flips HH -> level 2 -> node at levels 0, 1, 2
  • insert 1: flip T -> level 0 only
  • insert 7: flips HHT -> level 2 -> node at levels 0, 1, 2
  • insert 3: flip T -> level 0 only
level 2:  HEAD --- 4 --- 7 --- NIL
level 1: HEAD --- 4 --- 7 --- NIL
level 0: HEAD -1-3-4-----7--- NIL

Each insert does one search (to find predecessors at every level) and one splice (one write per promoted level). Cost is O(log n) expected because the search is O(log n) and the splice is O(height of new node), which is O(log n) expected.

Common Confusion / Misconceptions

"Skip list is a multi-level cache or B-tree." It is not. Every key lives at level 0; the higher levels are "express lanes" that let search skip past long stretches of the bottom list. Unlike a B-tree, each level holds the same keys at different densities, not disjoint partitions.

"Expected O(log n) means guaranteed O(log n)." With probability 1/n^c, a skip list can be deeper than c * log n. The bound is on the expectation, not a deterministic ceiling. For hard real-time systems this matters; for almost everything else it does not.

"p = 1/2 is optimal." The expected search cost is minimized near p = 1/e ≈ 0.37, but the difference versus p = 1/2 is under 10%. Implementers pick p = 1/2 because a single random bit decides the coin flip, which is cheaper than a real-valued comparison.

"You need a cryptographic RNG." You do not, unless the input is adversarial and can observe your randomness (e.g., in a hash-flooding attack on a networked service). For in-process use, xorshift or a well-seeded PRNG is fine.

How To Use It

Operational rules:

  1. Search: walk right at the highest level until the next key exceeds the target, drop down, repeat
  2. Insert: search-and-record update pointers at each level, generate the new node's level by repeated coin flips, splice it in at all levels up to its chosen height
  3. Delete: search-and-record update pointers, then unlink the node from every level it appears on

Expected search cost: O(log n) for p = 1/2. Expected space: O(n) (geometric distribution gives ~2n total pointers).

Implementation knobs to fix upfront:

  • cap the maximum level at log_{1/p}(n_max) to avoid pathological spikes
  • prefer a doubly linked level-0 list to support predecessor in O(1) after a search
  • for concurrent use, use per-level CAS with version counters (Java's design)

Tradeoff versus red-black:

  • simpler code: no rotations, no recoloring, no symmetric cases
  • tail behavior is probabilistic, not deterministic
  • concurrency is easier because each level has its own linked list

Transfer / Where This Shows Up Later

  • S5 (OS): lock-free ordered maps in kernel-adjacent libraries and some filesystem journals use skip-list-style stacking; the "layered list" idea also underlies write-ahead logs with level-based compaction.
  • S6 (databases): RocksDB and LevelDB memtables are skip lists; MongoDB's WiredTiger in-memory indices optionally use skip lists; Redis's sorted-set implementation is the canonical example.
  • S7 (architecture): "write-heavy in-memory index" ADRs frequently land on skip lists for exactly the concurrency reason given above -- it is easier to explain correctness under contention than for a lock-free red-black tree.
  • S8 (scale): distributed priority queues and time-sharded leaderboards layer skip lists over sharded storage (Redis Cluster's ZADD across shards), achieving expected-O(log n) semantics per shard with concurrent writers.

Check Yourself

  1. What determines the height of a newly inserted node?
  2. Why is the expected search cost O(log n) for p = 1/2?
  3. What does the skip list trade away compared to a red-black tree?
  4. Why does Redis use a skip list for sorted sets instead of a balanced BST?
  5. Give the high-probability bound for skip-list depth and explain where the exponent comes from.
  6. Why are concurrent skip lists easier to make lock-free than concurrent red-black trees?

Mini Drill or Application

  1. Build a skip list for the keys 2, 4, 6, 8, 10, 12, 14, 16, using explicit coin flips (or a provided sequence such as HHTTTHTH) to assign levels. Draw the result.
  2. Trace search(10) on your skip list, listing each node visited.
  3. Insert 13 using coin flips HT and redraw.
  4. Compute the total number of pointers in your structure and compare to 2n.
  5. Implement a skip-list-based ordered map in Python or Rust (~80 lines). Benchmark insert and lookup on 1M random integers and compare to SortedDict / BTreeMap.

Read This Only If Stuck