Radix Sort and Linear in Key Width
What This Concept Is
Radix sort sorts fixed-width keys one digit at a time using a stable sort (usually counting sort) as the inner pass.
Least-significant-digit (LSD) radix sort:
def radix_sort_lsd(A, d, base=10):
for digit in range(d):
A = counting_sort_by_digit(A, digit, base)
return A
If each key has (d) digits in base (b), and counting sort costs (\Theta(n + b)), the total is (\Theta(d,(n + b))). When (b = \Theta(n)) and (d) is a constant -- for example, 32-bit integers sorted as four 8-bit digits -- this is (\Theta(n)).
Radix sort requires a stable inner sort. If you sort from least to most significant digit, stability of the previous pass is what makes the most recent digit dominate correctly. There are two directions:
- LSD radix (least-significant-digit first) -- straightforward loop, processes fixed-width keys, needs stable inner sort. The right choice for integers and fixed-length strings.
- MSD radix (most-significant-digit first) -- recursive; partitions by top digit then recurses. Handles variable-length strings, can short-circuit on early mismatches, and is the basis for three-way radix quicksort (Sedgewick) and the burstsort family.
Distinguish radix from its neighbors: counting sort assumes the entire key is a small integer; trie-based sorts use the same digit decomposition but build a tree instead of reshuffling arrays; bucket sort assumes a distribution on keys rather than fixed width. Radix sort is the one to reach for on short fixed-length byte strings.
Why It Matters Here
Radix sort is how real systems sort fixed-width things fast: integers, IP addresses, short strings, floating-point numbers reinterpreted as integers (with one sign-bit flip trick). It is also how you see that "linear time" always has a hidden parameter -- the width of a key.
Reading (\Theta(d n)) as "linear in (n)" is only honest if (d) is treated as a constant. If keys are logarithmic in (n) (which they must be to be distinct), (d = \Theta(\log n / \log b)), and the total work is (\Theta(n \log n / \log b)) -- still competitive with comparison sort, and often faster because each step is cache-friendly arithmetic rather than branch-heavy comparison.
Radix sort also connects directly to later systems work: byte-at-a-time radix sort is the basis for many database ORDER BY implementations on fixed-width columns, and MSD radix is how production key-value stores sort prefixes in SSTables.
Concrete Examples
Example 1 -- LSD base-10 radix on 3-digit integers. Sort [170, 045, 075, 090, 802, 024, 002, 066] digit by digit.
Pass on ones digit (stable counting sort): [170, 090, 802, 002, 024, 045, 075, 066]
Pass on tens digit: [802, 002, 024, 045, 066, 170, 075, 090]
Pass on hundreds digit: [002, 024, 045, 066, 075, 090, 170, 802]
After each pass the array is sorted by the digits seen so far, and stability of the inner sort preserves earlier orderings among equal current-digit keys.
Example 2 -- 32-bit integers via four 8-bit passes. Given (n = 10^8) random 32-bit integers, the choice is (d = 4), (b = 256). Total work (\sim 4 \cdot (n + 256) \approx 4 \times 10^8) operations, each a cache-friendly array increment or indexed write. A well-tuned comparison sort (e.g., std::sort) on the same input does (\sim n \log_2 n \approx 2.7 \times 10^9) comparisons with branch mispredictions and pointer chasing. Measured on modern CPUs, LSD radix beats comparison sort by 2-4x in wall clock; on GPUs the ratio is closer to 10x because the histogram + prefix-scan pattern maps perfectly to SIMD lanes.
To sort signed integers by radix, flip the sign bit (bitwise XOR with 0x80000000) before sorting and flip it back afterward -- this converts the two's-complement total order into an unsigned lex order over bytes. The same trick, plus XOR of the exponent, sorts IEEE-754 floats.
Complexity derivation. (d) digit passes (\times) one counting-sort pass each: (T(n) = d \cdot (n + b)). With (b = 2^w) (so each pass is a (w)-bit digit) and (d = \lceil k / w \rceil) for (k)-bit keys, (T(n) = \lceil k/w \rceil (n + 2^w)). Minimizing over (w) for a given (n) and (k): set (\partial / \partial w = 0) to find (w \approx \log_2 n), giving (T = \Theta(k n / \log n)). For fixed (k) (32-bit keys), this is (\Theta(n / \log n)) digit-work per element, which is why well-tuned radix sorts cross over below (n \log n) comparison sorts in practice.
Common Confusion / Misconceptions
"LSD and MSD are interchangeable." LSD radix processes least-significant digits first and requires a stable inner sort. MSD radix processes most-significant first, recurses on partitions, and is useful for variable-length strings where early mismatches stop the comparison. Mixing them up either breaks correctness (LSD without stability) or gives up the whole advantage (MSD without early termination).
"Radix is linear." Radix is linear only when (d) is bounded. For unbounded-precision integers where (d = \Theta(\log n)), radix sort becomes (\Theta(n \log n)) -- the same asymptotics as comparison sort, though usually with better constants.
"Radix works on any comparable type." It only works when a key has a digit or byte decomposition with a well-defined comparison order on those digits -- either naturally (integers, ASCII strings) or after encoding (locale-aware string collation, IEEE-754 float canonicalization). For arbitrary objects with a custom comparator, radix does not apply.
"Base choice is cosmetic." Base (b) is a major performance knob. Too small, and (d) grows; too large, and (b) dominates the (\Theta(n + b)) per pass and the count array blows out cache. Production code typically uses (b = 256) (one byte per pass) to match word addressing, SIMD shuffle instructions, and L1 cache line boundaries.
How To Use It
Use radix sort when:
- keys are fixed-width integers, short strings, or fixed-length byte sequences
- (n) is large enough that the linear behavior pays for the constant factors
- you need deterministic worst case and stability
- comparisons are expensive relative to integer arithmetic (e.g., sorting 128-bit integers, UUIDs)
- the workload is cache-sensitive and comparison sort's branch mispredictions hurt
Do not use radix sort for:
- general comparable objects without digit decomposition
- small (n), where constants dominate (cutover is typically (n \geq 100) for in-memory radix)
- wildly variable-length keys unless you switch to MSD radix with early termination
- tiny key ranges where a single counting-sort pass is already (\Theta(n))
Tuning playbook:
- Choose the base (b) as a power of 2 matching a native word size (256 for byte-level passes is the standard).
- Tune so (d \times \text{passes}) is small and each pass fits in L1/L2 cache.
- For GPU and wide-SIMD implementations, use the standard "histogram / exclusive-scan / scatter" triple.
- On mixed workloads (some passes over random data, some over nearly-sorted), consider MSD-first with a quicksort cutover inside buckets -- the pattern used by Sedgewick's 3-way radix quicksort.
Transfer / Where This Shows Up Later
- S2 M2 (this module). LSD radix with counting sort as its inner pass is the direct ancestor of every high-throughput sort you will see in concept 07 (external and practical sorting).
- S2 M5 (advanced structures). Suffix arrays are built with DC3/SA-IS algorithms that repeatedly radix-sort triples of characters; tries and compressed radix trees use the same digit-decomposition idea.
- S3 (software design). "Process keys digit-by-digit" is the ordering pattern you use whenever you implement multi-key sorting as repeated stable single-key passes.
- S4 (systems programming). GPU/SIMD sorting libraries (CUB, Thrust, Arrow) ship radix sort as their primary kernel because the data-parallel histogram/prefix-scan pattern maps to hardware.
- S5 (OS). File systems that sort inodes by hashed prefixes for directory lookup use byte-at-a-time radix order to exploit cache lines.
- S6 (databases). ORDER BY on fixed-width numeric or fixed-length string columns frequently dispatches to a radix sort; LSM-trees store keys in radix-sorted order within each SSTable; partition-pruning on range-partitioned tables uses the radix prefix.
- S7 / S8 (architecture, system design). Distributed shuffle phases (MapReduce, Spark) use radix-style partitioning on hash prefixes to bucket records to reducers. Range-partitioning by a fixed prefix is the architectural generalization of MSD radix.
Check Yourself
- Why does radix sort require a stable inner sort?
- When is (\Theta(d,(n + b))) actually linear in (n)?
- Why does MSD radix work for variable-length strings but LSD radix does not directly?
- What is the trick for radix-sorting signed integers? IEEE-754 floats?
- Why is base (b = 256) such a popular choice in production?
- Derive the optimal (b) for given (n) and key width (k) by minimizing (\lceil k/w \rceil (n + 2^w)). Why does the answer depend on (n) as well as (k)?
- Why are parallel LSD radix implementations memory-bound rather than compute-bound on modern hardware?
Mini Drill or Application
- Run LSD radix on
[329, 457, 657, 839, 436, 720, 355]base 10 and write the array after each digit pass. - Argue why sorting 10 million 32-bit integers by four 8-bit radix passes beats quicksort in practice, using concrete instruction counts.
- Describe how you would radix-sort 20-character ASCII strings of variable length using MSD.
- Implement a byte-at-a-time LSD radix sort and benchmark it against
std::sorton random 64-bit integers for (n = 10^7). - Extend your implementation to sort IEEE-754
float64values by the exponent-XOR trick; verify on NaN-free input. - Production scenario. A telemetry service ingests 5 billion 128-bit UUIDs per day and must emit them sorted for ID-range-based shard assignment. Compare the wall-clock cost of LSD radix (16 byte-passes) versus
std::sorton a 32-core machine; account for memory bandwidth (you cannot avoid scanning 80 GB sixteen times) and explain why a two-level (MSD-then-LSD) variant usually wins for this scale.
Read This Only If Stuck
- CLRS: 8.3 Radix Sort
- CLRS: 8.3 Radix Sort (continued)
- CLRS: 8.2 Counting Sort -- the inner pass in full detail.
- Sedgewick: Radix Sorting
- Sedgewick: Radix Sorting (MSD + 3-way)
- Sedgewick: Radix Searching -- the radix idea applied to searching.
- Competitive Programming: 9.32 Sorting in Linear Time
- Princeton Sedgewick: 5.1 String Sorts -- LSD, MSD, and 3-way radix quicksort with benchmarks.
- MIT 6.006 Lecture 5: Linear-Time Sorting -- counting and radix linkage.
- VisuAlgo: Sorting (radix LSD trace) -- toggle base and watch the four-pass byte radix on 32-bit integers.
- ska_sort blog series (Malte Skarupke) -- walkthrough of a branch-free in-place LSD radix that beats
std::sorton integers. - CUB radix sort (NVIDIA) -- the GPU radix that ships the histogram + exclusive-scan pattern in production.
- Kärkkäinen & Sanders: Simple linear-work suffix array (DC3) -- suffix-array construction via three rounds of radix sort on triples.
- Knuth TAOCP Vol 3 §5.2.5: Sorting by Distribution -- the original exposition of LSD, MSD, and their variants.
- RadulescuSkarupke talk: Sorting Faster Than C++ (2017) -- conference talk walking through ska_sort and branchless byte-radix tricks.
- MSD radix sort with 3-way quicksort inside buckets (Sedgewick) -- reference Java code pairing digit decomposition with inside-bucket quicksort.
- Boost.Sort spreadsort -- production hybrid of MSD radix and introsort with documented crossover rules.
- Learning to sort with SIMD (vectorized quicksort vs. radix) -- AMD research -- measurements from the radix vs. SIMD quicksort debate on modern GPUs.
- Polychroniou & Ross: A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort -- in-depth engineering study of radix on modern multicore.
- Apache Arrow radix kernel -- columnar radix kernels that ship inside Arrow's
sort_indicesfor large datasets.