Standard Deviation: The Most Misunderstood Number in Statistics
Standard deviation is not just a formula — it is a measure of how wrong your average is. Understanding it changes how you read every statistic you encounter.
The mean is the most natural summary of a dataset. But the mean alone is almost always misleading. Two datasets can have identical means and wildly different behavior. This is where standard deviation earns its place as the second most important number in any statistical analysis.
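To make the "identical means, different behavior" point concrete, here is a minimal sketch using Python's standard `statistics` module. The two datasets and their names (`steady`, `volatile`) are invented for illustration.

```python
from statistics import mean, pstdev

# Two made-up datasets with the same mean but very different spread.
steady = [49, 50, 50, 50, 51]
volatile = [10, 30, 50, 70, 90]

mean_steady, mean_volatile = mean(steady), mean(volatile)
sd_steady, sd_volatile = pstdev(steady), pstdev(volatile)  # population SD

print(f"steady:   mean={mean_steady}, sd={sd_steady:.2f}")
print(f"volatile: mean={mean_volatile}, sd={sd_volatile:.2f}")
```

Both means come out the same, so only the standard deviation reveals that the second dataset swings far more widely.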
What variance measures — and why we take the square root
Variance is the average squared distance from the mean. We square the distances for two reasons: to make all deviations positive, and to penalize large deviations more heavily than small ones. A deviation of 10 contributes 100 to the variance; a deviation of 1 contributes only 1. Squaring amplifies outliers.
Population variance: σ² = Σ(xᵢ − μ)² / N
Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1)
Standard deviation: σ = √σ² for a population, s = √s² for a sample
Why n − 1 for samples? This is Bessel's correction. The sample mean x̄ is itself estimated from the data, which uses up one degree of freedom: the deviations from x̄ must sum to zero, so only n − 1 of them are free to vary. Dividing by n would systematically underestimate the population variance.
Taking the square root returns variance to the original unit of measurement. If you measured heights in centimetres, variance is in cm², a unit with no intuitive meaning for height. Standard deviation is back in centimetres, so it can be read directly against the data.
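The effect of dividing by n − 1 rather than n can be checked exactly on a small example. The sketch below assumes a made-up eight-value population and sampling with replacement. It enumerates every possible sample of size 3 and averages both variance estimates. The helper name `sum_sq_dev` is hypothetical.

```python
from itertools import product
from statistics import mean, pvariance

# Illustrative population: mean 5, population variance 4.
population = [2, 4, 4, 4, 5, 5, 7, 9]
sigma2 = pvariance(population)

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

n = 3
# Every ordered sample of size n drawn with replacement (8**3 = 512 samples).
samples = list(product(population, repeat=n))

avg_div_n = mean(sum_sq_dev(s) / n for s in samples)
avg_div_n_minus_1 = mean(sum_sq_dev(s) / (n - 1) for s in samples)

print(f"population variance:     {sigma2}")
print(f"average when dividing by n:     {avg_div_n:.4f}")
print(f"average when dividing by n - 1: {avg_div_n_minus_1:.4f}")
```

Averaged over all samples, the n − 1 version recovers the population variance, while the n version falls short by a factor of (n − 1)/n.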
The 68-95-99.7 rule
For any normally distributed variable, the standard deviation determines the probability of observations falling within certain ranges:
| Range | % of observations | Interpretation |
|---|---|---|
| μ ± 1σ | 68.27% | About two-thirds of all data |
| μ ± 2σ | 95.45% | Roughly the 95% band (the exact 95% cutoff is ±1.96σ) |
| μ ± 3σ | 99.73% | Events outside this are rare (≈1 in 370) |
| μ ± 6σ | 99.9999998% | The nominal band behind Six Sigma manufacturing (its quoted 3.4 defects per million assumes a 1.5σ shift) |
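The percentages in the table follow from the normal distribution: the probability of landing within k standard deviations of the mean is erf(k / √2). A short sketch that reproduces them with the standard library (the function name `coverage` is an illustrative choice):

```python
from math import erf, sqrt

def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3, 6):
    inside = coverage(k)
    print(f"within ±{k}σ: {inside * 100:.7f}%")

# Odds of falling outside ±3σ, expressed as "1 in N".
one_in = 1 / (1 - coverage(3))
print(f"outside ±3σ: about 1 in {one_in:.0f}")
```

This holds only for normally distributed data. For other distributions the same σ ranges can cover very different shares of the observations.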
When standard deviation misleads you
Standard deviation is most informative when the data are roughly symmetric and free of extreme outliers. When those conditions break down, σ becomes a poor summary. For a distribution with fat tails (financial returns, earthquake magnitudes, internet traffic), a risk estimate built on σ and a normal model can dramatically understate how often extreme events occur.
The lesson is not that standard deviation is wrong. The problem arises when it is applied to a distribution where its assumptions do not hold. Knowing when σ is the right tool requires understanding both what it measures and what it ignores.
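A rough illustration of how a σ-based normal model can understate tail risk: estimate σ from a calm stretch of returns, then ask how likely that model considers a single large move. All numbers here are invented for the example and are not real market data.

```python
from math import erfc, sqrt
from statistics import mean, stdev

# Made-up daily returns in percent from a quiet period.
calm_returns = [-1.2, 0.4, 0.1, -0.3, 0.8, -0.5, 0.2, 0.6, -0.7, 0.3]
crash = -20.0  # hypothetical one-day move

mu = mean(calm_returns)
sigma = stdev(calm_returns)  # sample standard deviation

# How many standard deviations away the crash sits.
z = abs(crash - mu) / sigma

# Two-sided probability of a move at least this large under a normal model.
p_normal = erfc(z / sqrt(2))

print(f"sigma = {sigma:.3f}, crash is {z:.1f} sigma away")
print(f"normal-model probability: {p_normal:.3e}")
```

The normal model rates such a move as effectively impossible. In fat-tailed data, moves of that size occur far more often than this calculation suggests, which is the sense in which σ understates the risk.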
Coefficient of variation: comparing across scales
CV = (σ / μ) × 100%
Example: Two manufacturing processes, both targeting a diameter of 10 mm.
Process A: σ = 0.1 mm → CV = 1%
Process B: σ = 0.5 mm → CV = 5%
Process B has five times the relative variability of Process A. Because the CV is dimensionless, it is useful for comparing measurements on very different scales, provided the quantity has a true zero and a mean well away from it.
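The two-process example can be written as a small helper. This is a minimal sketch: the function name `coefficient_of_variation` and the choice to reject a zero mean are assumptions made for illustration.

```python
def coefficient_of_variation(sigma, mu):
    """Return the coefficient of variation as a percentage."""
    if mu == 0:
        # The ratio is undefined when the mean is zero.
        raise ValueError("CV is undefined for a mean of zero")
    return sigma / abs(mu) * 100

# Both processes target a 10 mm diameter.
cv_a = coefficient_of_variation(sigma=0.1, mu=10)
cv_b = coefficient_of_variation(sigma=0.5, mu=10)

print(f"Process A: CV = {cv_a:.1f}%")
print(f"Process B: CV = {cv_b:.1f}%")
print(f"B relative to A: {cv_b / cv_a:.1f}x")
```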
Our Roulette Simulator lets you observe standard deviation live. Track the count of a single number over many spins. You will see the count oscillate around the expected value of spins/37 (European wheel). The standard deviation of the count grows only like the square root of the number of spins, so it shrinks relative to the mean as the spin count grows. The spread of your observed frequency is doing exactly what the formula predicts.
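The same behavior can be reproduced offline. The sketch below is not the simulator's actual code. It assumes a fair 37-pocket European wheel, an arbitrary target number, and a fixed random seed, and compares the simulated count with the binomial formula for the standard deviation, √(n·p·(1 − p)).

```python
import random
from math import sqrt

P = 1 / 37  # chance of any single number on a European wheel

def count_hits(spins, target=17, rng=None):
    """Count how often `target` comes up in `spins` simulated spins."""
    if rng is None:
        rng = random.Random()
    return sum(1 for _ in range(spins) if rng.randrange(37) == target)

def binomial_sd(spins, p=P):
    """Standard deviation of the hit count predicted by the binomial model."""
    return sqrt(spins * p * (1 - p))

rng = random.Random(42)  # fixed seed so the run is repeatable
for spins in (370, 3_700, 37_000):
    hits = count_hits(spins, rng=rng)
    expected = spins * P
    sd = binomial_sd(spins)
    print(f"{spins:>6} spins: hits={hits}, expected={expected:.0f}, "
          f"sd={sd:.1f}, sd/expected={sd / expected:.1%}")
```

The absolute standard deviation rises with the number of spins, while the ratio of standard deviation to expected count falls, which is why the observed frequency settles toward 1/37.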