Continuous Random Variables Use Densities, Not Point Masses
What This Concept Is
For a continuous random variable, probability is distributed across intervals rather than concentrated at isolated points. The object that describes this distribution is the probability density function ( f(x) ), and probabilities come from area under the curve: [ P(a \le X \le b) = \int_a^b f(x)\, dx. ] In stark contrast to the discrete case, ( P(X = c) = 0 ) for any single point ( c ). Probability "lives" on intervals, not on points.
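The sketch below makes "probability is area" concrete. It assumes a hand-written midpoint-rule integrator (`integrate` is a hypothetical helper, not a library function) and an example density ( f(x) = 2x ) on ([0, 1]) chosen only for illustration. An interval of positive width gets positive area. A zero-width interval gets exactly zero.

```python
def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n  # a zero-width interval gives h = 0, hence zero area
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


def f(x):
    """Hypothetical example density: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


p_interval = integrate(f, 0.25, 0.5)  # exact value: 0.5**2 - 0.25**2 = 0.1875
p_point = integrate(f, 0.4, 0.4)      # P(X = 0.4): no width, no area

print(round(p_interval, 6))  # 0.1875
print(p_point)               # 0.0
```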
A valid density satisfies two requirements:
- ( f(x) \ge 0 ) everywhere, and
- ( \int_{-\infty}^{\infty} f(x)\, dx = 1 ).
Crucially, ( f(x) ) is not a probability. It is a probability per unit length. Its units are reciprocal to the units of ( X ). A density value can exceed 1 -- for example, the uniform distribution on ([0, 0.1]) has ( f(x) = 10 ) on that interval. What has to stay bounded is the integral, not the height.
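Here is a rough numerical version of the two validity conditions, applied to the uniform density on ([0, 0.1]). Its height is 10, yet it still integrates to 1. The helper names are hypothetical. The check samples a grid and assumes the density is zero outside the given range, so it is evidence rather than proof.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


def uniform_0_01(x):
    """Density of Uniform[0, 0.1]: height 10 (> 1) on the support."""
    return 10.0 if 0.0 <= x <= 0.1 else 0.0


def looks_like_valid_density(f, lo, hi, n=10_000, tol=1e-6):
    """Check f >= 0 on a grid over [lo, hi] and that the total area is ~1.

    Assumes f is zero outside [lo, hi]; a grid check is evidence, not proof.
    """
    h = (hi - lo) / n
    nonnegative = all(f(lo + i * h) >= 0 for i in range(n + 1))
    total = integrate(f, lo, hi, n)
    return nonnegative and abs(total - 1.0) < tol


print(uniform_0_01(0.05))                                # 10.0
print(looks_like_valid_density(uniform_0_01, 0.0, 0.1))  # True
```

A constant height of 2 on ([0, 1]) fails the same check, because its area is 2.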
The cumulative distribution function (CDF) ( F(x) = P(X \le x) ) unifies the discrete and continuous cases. It always exists, is non-decreasing and right-continuous, and rises from 0 (as ( x \to -\infty )) to 1 (as ( x \to \infty )). In the continuous case, ( F(x) = \int_{-\infty}^{x} f(t)\, dt ), so ( F'(x) = f(x) ) wherever ( f ) is continuous. Read the other way, ( f ) is the rate at which probability accumulates along the axis.
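The relation ( F' = f ) can be checked numerically. This sketch uses the standard Normal from Python's `statistics.NormalDist`, which provides both `cdf` and `pdf`, and a central-difference derivative. `numerical_derivative` is a hypothetical helper written for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def numerical_derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


for x in (-1.0, 0.0, 0.5, 2.0):
    rate = numerical_derivative(std_normal.cdf, x)
    # The rate at which the CDF accumulates probability matches the density.
    print(f"x={x:+.1f}  F'(x)~{rate:.6f}  f(x)={std_normal.pdf(x):.6f}")
```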
Intuitively: a PMF is a bar graph of probability mass. A PDF is a smooth curve, and only the area under segments of the curve has probabilistic meaning. If you could "zoom in" on a continuous variable to an exact point, you would find nothing -- all the probability has slipped through to adjacent infinitesimal intervals.
Why It Matters Here
This is the conceptual shift that prevents major mistakes later. Four specific errors:
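- Reading density height as probability. A density value of 2.0 is not a probability (no probability exceeds 1). It means probability accumulates at a rate of 2.0 per unit length near that point. Height ratios do have a meaning, but only as relative likelihood within one distribution: a short interval where ( f = 2.0 ) holds about twice the probability of an equally short interval where ( f = 1.0 ).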
- Thinking interval endpoints matter. ( P(a \le X \le b) = P(a < X < b) ) for continuous ( X ); inclusion of endpoints changes nothing because each endpoint has measure zero.
- Mixing discrete and continuous intuition. You cannot ask "what is the probability that latency is exactly 47.3 ms" -- that probability is zero. You can only ask about ranges: ( P(47.0 \le X \le 48.0) ), or equivalently ( F(48.0) - F(47.0) ).
- Thinking a density must be bounded by 1. No -- only the integral is constrained, and it must equal 1.
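The latency example above can be made concrete under an assumed model. Purely for illustration, suppose latency is Normal with mean 47.5 ms and standard deviation 2 ms; this model is hypothetical and not taken from any real service. A range gets probability ( F(48.0) - F(47.0) ). Shrinking a window around 47.3 ms drives the probability toward 0. Because ( F ) is continuous, it makes no difference whether the window's endpoints are included.

```python
from statistics import NormalDist

# Hypothetical latency model for illustration only: Normal, mean 47.5 ms, sd 2 ms.
latency = NormalDist(mu=47.5, sigma=2.0)

# A range question has a positive answer: F(48.0) - F(47.0).
p_range = latency.cdf(48.0) - latency.cdf(47.0)
print(f"P(47.0 <= X <= 48.0) = {p_range:.4f}")

# A point question does not: shrink a window around 47.3 ms and watch the area vanish.
for half_width in (1.0, 0.1, 0.01, 0.001):
    p = latency.cdf(47.3 + half_width) - latency.cdf(47.3 - half_width)
    print(f"half-width {half_width:>6} ms -> probability {p:.6f}")
```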
Continuous models are essential for timing, measurement noise, load, latency, and Normal-approximation arguments. Whenever a quantity is "real-valued on a smooth scale" -- request latency, CPU utilization percentage, disk temperature, network throughput -- the distribution is continuous and you reason via density and area, not via point probabilities.
Concrete Examples
Example 1 -- Uniform on ([0, 10]). The density is ( f(x) = 1/10 ) for ( 0 \le x \le 10 ) and 0 elsewhere. Then: [ P(2 \le X \le 5) = \int_2^5 \tfrac{1}{10}\, dx = \tfrac{3}{10}. ] But ( P(X = 2) = \int_2^2 \tfrac{1}{10}\, dx = 0 ), even though ( x = 2 ) is in the support. A point has no width, so it contributes no area. Note that the density ( f(x) = 1/10 ) is less than 1, but that is a coincidence of the interval length. If the uniform were on ([0, 0.1]), we would have ( f(x) = 10 ), well above 1, yet still valid.
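A Monte Carlo view of Example 1 follows, assuming Python's `random` module with an arbitrary fixed seed. About 30% of the samples land in ([2, 5]), and essentially none equals 2.0 exactly. Strictly, floating-point samples live on a finite grid, but hitting any one particular value is astronomically unlikely.

```python
import random

rng = random.Random(12345)  # arbitrary fixed seed so runs are repeatable
n = 200_000
samples = [rng.uniform(0.0, 10.0) for _ in range(n)]

frac_interval = sum(2.0 <= x <= 5.0 for x in samples) / n  # estimates P(2 <= X <= 5) = 0.3
frac_point = sum(x == 2.0 for x in samples) / n            # estimates P(X = 2) = 0

print(f"fraction in [2, 5]: {frac_interval:.4f}")
print(f"fraction exactly 2.0: {frac_point}")
```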
Example 2 -- Service latency as a triangular density. Suppose service latency (ms) has density [ f(x) = \begin{cases} x/10000 & 0 \le x \le 100, \\ (200 - x)/10000 & 100 \le x \le 200, \\ 0 & \text{otherwise}. \end{cases} ] Verify that it integrates to 1. The triangle has base 200 and peak ( 100/10000 = 0.01 ), so its area is ( \frac{1}{2} \cdot 200 \cdot 0.01 = 1 ). With denominator 5000 instead, the peak would be 0.02 and the area 2, so that version would not be a valid density. Then ( P(X \le 100) = \int_0^{100} x/10000\, dx = \frac{x^2}{20000}\Big|_0^{100} = 0.5 ), as expected by symmetry. The 99th percentile is the ( x^* ) with ( F(x^*) = 0.99 ). The top 1% is the area 0.01 under the falling edge from ( 200 - d ) to ( 200 ). That region is a small triangle with base ( d ) and height ( d/10000 ), so ( \frac{1}{2} d^2 / 10000 = 0.01 ), giving ( d^2 = 200 ), ( d \approx 14.14 ), and ( x^* \approx 185.86 ). This is the kind of computation that drives p99 SLO reasoning in later semesters.
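Example 2 can be sketched in code using the normalized density (denominator 10000). The CDF below is derived by integrating each linear piece. The quantile function inverts it piece by piece, and a midpoint-rule sum double-checks the normalization. All function names are hypothetical.

```python
import math


def tri_pdf(x):
    """Triangular latency density on [0, 200] ms, peak 0.01 at x = 100."""
    if 0.0 <= x <= 100.0:
        return x / 10_000
    if 100.0 < x <= 200.0:
        return (200.0 - x) / 10_000
    return 0.0


def tri_cdf(x):
    """Closed-form CDF, obtained by integrating tri_pdf from 0 to x."""
    if x <= 0.0:
        return 0.0
    if x <= 100.0:
        return x * x / 20_000
    if x <= 200.0:
        return 1.0 - (200.0 - x) ** 2 / 20_000
    return 1.0


def tri_quantile(p):
    """Inverse CDF: the x with F(x) = p, for 0 <= p <= 1."""
    if p <= 0.5:
        return math.sqrt(20_000 * p)
    return 200.0 - math.sqrt(20_000 * (1.0 - p))


# Midpoint-rule check that the density integrates to 1.
n = 20_000
h = 200.0 / n
total = sum(tri_pdf((i + 0.5) * h) for i in range(n)) * h

print(f"total area  = {total:.6f}")
print(f"P(X <= 100) = {tri_cdf(100.0)}")                 # 0.5
print(f"p99 latency = {tri_quantile(0.99):.2f} ms")      # 185.86
```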
Common Confusion / Misconceptions
"The density ( f(x) ) is the probability at ( x )." No -- ( f(x) ) is probability per unit length; only areas under ( f ) are probabilities.
"A density cannot exceed 1." A density can take arbitrarily large values. What is constrained is the integral over the whole real line, which must equal 1.
"Endpoint inclusion matters." For continuous variables, ( P(a \le X \le b) = P(a < X < b) = P(a < X \le b) = P(a \le X < b) ). Endpoints are measure-zero events.
"Probability of exactly ( c ) is nonzero but very small." For a truly continuous model, it is exactly zero, not small. If you need a positive probability at a point, you are implicitly assuming a discrete or mixed distribution.
"All real-world quantities are continuous." Real-world quantities like latency are quantized by clocks and floating-point representation. The continuous model is an approximation that is almost always excellent but is not literally true.
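Quantization creates point masses, and this sketch shows how. It assumes, for illustration only, a continuous latency ( X ) that is Normal(47.5 ms, 2 ms), read by a clock that rounds to the nearest millisecond. The reported value 47 collects all the probability in ([46.5, 47.5]), so it has positive probability. The underlying continuous ( X ) still puts zero probability on the point 47.0.

```python
from statistics import NormalDist

# Hypothetical continuous latency model (illustration only): Normal(47.5 ms, 2 ms).
latency = NormalDist(mu=47.5, sigma=2.0)

# A clock that rounds X to the nearest millisecond reports 47 whenever X falls
# in [46.5, 47.5], so the *reported* latency is discrete with a point mass at 47.
p_reported_47 = latency.cdf(47.5) - latency.cdf(46.5)

# The continuous X itself assigns zero probability to the single point 47.0.
p_exact_47 = latency.cdf(47.0) - latency.cdf(47.0)

print(f"P(reported == 47 ms) = {p_reported_47:.4f}")
print(f"P(X == 47.0 exactly) = {p_exact_47}")
```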
How To Use It
For continuous variables:
- Identify the support interval -- where is ( f(x) > 0 )?
- Write the density on that support and verify it is non-negative and integrates to 1.
- Integrate over the interval of interest to compute a probability.
- Interpret probability through area, not point values. Think "probability in that band," not "probability at that spot."
- Use the CDF as a universal tool. If integration is messy, express the probability as ( F(b) - F(a) ) and look up or numerically compute ( F ).
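- Convert between discrete and continuous carefully. Suppose you bin latency into 1 ms bins and each bin holds probability of order ( 10^{-3} ). The density is then roughly bin probability divided by bin width: ( \sim 10^{-3} ) per ms, which is the same as ( \sim 1 ) per second because 1 s = 1000 ms. The number attached to a density depends on the units of ( X ); the probabilities do not.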
- When comparing endpoint conventions like ( (a, b] ) vs ( [a, b] ), remember that endpoints do not matter for continuous variables. With the convention ( F(x) = P(X \le x) ), always compute interval probabilities as ( F(b) - F(a) ).
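The CDF and unit-conversion steps can be illustrated with a hypothetical latency model, uniform on ([0, 1000]) ms (that is, ([0, 1]) s), chosen so every number is easy to check by hand. The density's numerical value changes with the unit, from 0.001 per ms to 1 per second. The probability of a band stays the same whether it is computed as ( F(b) - F(a) ) or as height times width in either unit.

```python
# Hypothetical model for illustration: latency uniform on [0, 1000] ms, i.e. [0, 1] s.
WIDTH_MS = 1000.0
density_per_ms = 1.0 / WIDTH_MS          # height of f when X is measured in ms
density_per_s = density_per_ms * 1000.0  # same density per second (1 s = 1000 ms)


def cdf_ms(x_ms):
    """CDF F(x) = P(X <= x) for x in milliseconds, clipped to [0, 1]."""
    return min(max(x_ms / WIDTH_MS, 0.0), 1.0)


# Probability of the 200-300 ms band, three ways; the answer is unit-free.
p_cdf = cdf_ms(300.0) - cdf_ms(200.0)         # F(b) - F(a)
p_area_ms = density_per_ms * (300.0 - 200.0)  # height (per ms) x width (ms)
p_area_s = density_per_s * (0.300 - 0.200)    # height (per s) x width (s)
# A single 1 ms bin holds about density_per_ms * 1 ms of probability.
p_one_bin = density_per_ms * 1.0

print(f"density: {density_per_ms} per ms = {density_per_s} per s")
print(f"P(200 <= X <= 300 ms): {p_cdf:.4f}, {p_area_ms:.4f}, {p_area_s:.4f}")
print(f"probability in one 1 ms bin: {p_one_bin}")
```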
Transfer / Where This Shows Up Later
- Semester 2 (Randomized algorithms). Continuous random variables model random real numbers returned by uniform samplers; in particular, the analysis of randomized geometry, Monte Carlo integration, and random-priority data structures (treaps) rests on continuous probability.
- Semester 5 (Queueing). Interarrival times and service times are continuous -- usually Exponential -- so queueing theory is inherently a continuous-probability topic (see Cluster 5, concept 14). Little's Law, ( L = \lambda W ), relates the mean number of jobs in the system to the arrival rate and the mean time each job spends in the system.
- Semester 6 (Tail latency). Latency distributions are continuous (up to quantization); the p99, p99.9, and p99.99 percentiles are values of the inverse CDF (the quantile function). Understanding density shape -- in particular, heavy right tails -- is the core of reasoning about tail latency at scale in distributed systems.
- Semester 8 (SLOs and percentiles). SLO definitions like "p99 < 300 ms" are conditions on the CDF of the latency random variable, not on any single request. Expressing them rigorously requires the continuous-density mindset.
- Semester 9 (Experimentation and monitoring). Test statistics such as t-statistics and CUSUM statistics are modeled as continuous random variables; confidence intervals and hypothesis-test rejection regions are defined by areas under their densities.
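An SLO such as "p99 < 300 ms" is a condition on the CDF, and the following sketch shows this directly. It assumes a hypothetical Exponential latency model with mean 50 ms, chosen because its CDF and quantile have closed forms, not because any real service behaves this way. For a continuous, strictly increasing ( F ), "p99 < 300 ms" is equivalent to ( F(300) > 0.99 ).

```python
import math

# Hypothetical latency model (illustration only): Exponential with mean 50 ms,
# so F(x) = 1 - exp(-x / 50) for x >= 0.
MEAN_MS = 50.0


def cdf(x_ms):
    """CDF of the assumed Exponential latency model."""
    return 1.0 - math.exp(-x_ms / MEAN_MS) if x_ms >= 0 else 0.0


def quantile(p):
    """Inverse CDF: the latency below which a fraction p of requests fall."""
    return -MEAN_MS * math.log(1.0 - p)


p99 = quantile(0.99)
p999 = quantile(0.999)
# For this continuous, strictly increasing F, "p99 < 300 ms" <=> F(300) > 0.99.
print(f"p99   = {p99:.1f} ms, meets 300 ms target: {p99 < 300.0}")   # 230.3 ms, True
print(f"p99.9 = {p999:.1f} ms, meets 300 ms target: {p999 < 300.0}")  # 345.4 ms, False
print(f"F(300 ms) = {cdf(300.0):.5f}")                                # 0.99752
```

The same model can pass a p99 target and fail a p99.9 target. This is why the shape of the right tail matters.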
Check Yourself
- Why is ( P(X = c) = 0 ) for a continuous variable?
- Why can a density value exceed 1? Give an example.
- What geometric object corresponds to probability in the continuous case?
- If ( X ) is continuous, is ( P(a \le X \le b) ) the same as ( P(a < X < b) )? Why?
- A density ( f(x) = kx ) on ([0, 2]) and 0 elsewhere. What must ( k ) equal? Then compute ( P(X \le 1) ). (Answers: ( k = 1/2 ); ( P(X \le 1) = 1/4 ).)
- If ( X ) has CDF ( F(x) = 1 - e^{-x} ) for ( x \ge 0 ), what is the density? What is ( P(1 \le X \le 2) )?
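To check the ( f(x) = kx ) answers numerically, the sketch below normalizes by the computed area and then integrates up to 1. It uses the same hypothetical midpoint-rule helper as a stand-in for exact integration.

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


# Normalization: the integral of k*x over [0, 2] must be 1, so k = 1 / (area of x over [0, 2]).
k = 1.0 / integrate(lambda x: x, 0.0, 2.0)


def f(x):
    """The normalized density k*x on [0, 2], 0 elsewhere."""
    return k * x if 0.0 <= x <= 2.0 else 0.0


print(round(k, 6))                       # 0.5
print(round(integrate(f, 0.0, 1.0), 6))  # 0.25
```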
Mini Drill or Application
Suppose ( X ) is uniform on ([0, 4]).
- Write the density.
- Compute ( P(1 \le X \le 3) ).
- Compute ( P(X = 2) ).
- Explain why ( P(1 < X < 3) ) is the same as ( P(1 \le X \le 3) ).
- Write the CDF of ( X ). At what ( x ) does ( F(x) = 0.75 )?
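After working the drill by hand, a sketch like this can confirm the numbers. It assumes Python's `random` module with an arbitrary seed. The helper names are hypothetical. The simulation compares closed and open intervals to show that endpoints make no difference.

```python
import random

A, B = 0.0, 4.0  # support of the drill's Uniform[0, 4]


def pdf(x):
    return 1.0 / (B - A) if A <= x <= B else 0.0


def cdf(x):
    return min(max((x - A) / (B - A), 0.0), 1.0)


def quantile(p):
    return A + p * (B - A)


rng = random.Random(2024)  # arbitrary fixed seed for repeatability
samples = [rng.uniform(A, B) for _ in range(100_000)]
closed = sum(1.0 <= x <= 3.0 for x in samples) / len(samples)
open_ = sum(1.0 < x < 3.0 for x in samples) / len(samples)

print("exact P(1 <= X <= 3):", cdf(3.0) - cdf(1.0))
print("simulated, closed vs open:", closed, open_)
print("x with F(x) = 0.75:", quantile(0.75))
```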
Simulation drill -- density histograms vs. PDFs. Use numpy to draw 100,000 samples from a standard Normal. Plot a histogram with 50 bins and overlay the Normal PDF. Verify visually that the histogram, scaled to unit area, matches the density -- this is how the abstract "density" becomes concrete. Then compute the fraction of samples in ([-1, 1]); it should be near ( \Phi(1) - \Phi(-1) \approx 0.683 ). Now double the bin count. The raw count in each bin roughly halves because each bin covers half the width, but the density-scaled heights stay close to the PDF and the total area stays 1. This is exactly the point of "density, not mass."
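The numeric part of the drill can be done without plotting. This stdlib-only sketch uses `random.gauss` and `statistics.NormalDist` in place of numpy; with numpy, `np.histogram(samples, bins=50, density=True)` yields the density-scaled heights directly. `density_histogram` is a hypothetical helper. It scales each count by ( 1/(N \cdot \text{width}) ), so the heights estimate the density.

```python
import random
from statistics import NormalDist

rng = random.Random(7)  # arbitrary fixed seed for repeatability
N = 100_000
samples = [rng.gauss(0.0, 1.0) for _ in range(N)]
std_normal = NormalDist()

# Fraction of samples in [-1, 1] vs. the exact area Phi(1) - Phi(-1).
frac = sum(-1.0 <= x <= 1.0 for x in samples) / N
exact = std_normal.cdf(1.0) - std_normal.cdf(-1.0)
print(f"sample fraction in [-1, 1]: {frac:.4f}   exact: {exact:.4f}")


def density_histogram(data, bins, lo=-4.0, hi=4.0):
    """Return (bin centers, raw counts, density heights) over [lo, hi).

    Density height = count / (N * bin width), so the bar areas sum to the
    fraction of data inside [lo, hi) (very close to 1 here).
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x < hi:
            counts[min(int((x - lo) // width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    heights = [c / (len(data) * width) for c in counts]
    return centers, counts, heights


for bins in (50, 100):
    centers, counts, heights = density_histogram(samples, bins)
    mid = bins // 2  # the bin starting at 0
    area = sum(h * (8.0 / bins) for h in heights)
    print(f"{bins} bins: count near 0 = {counts[mid]}, "
          f"height near 0 = {heights[mid]:.3f}, pdf = {std_normal.pdf(centers[mid]):.3f}, "
          f"total area = {area:.3f}")
```

Going from 50 to 100 bins roughly halves the count in the bin next to 0. Its density-scaled height stays near the PDF value of about 0.40.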
Read This Only If Stuck
- Introduction to Probability: Probability density functions (Part 1)
- Introduction to Probability: Probability density functions (Part 2)
- Introduction to Probability: Uniform
- Introduction to Probability: Universality of the Uniform (Part 1)
- Introduction to Probability: Universality of the Uniform (Part 2)
- Introduction to Probability: Summaries of a distribution (Part 1)
- Wikipedia: Probability density function -- clarifies the non-probability nature of density values and the measure-theoretic definition.
- Wikipedia: Cumulative distribution function -- the universal object that unifies discrete and continuous cases.