Confidence Intervals: What 95% Confidence Actually Means
A 95% confidence interval does not mean a 95% chance the true value lies inside it. This subtle distinction matters enormously — and almost everyone gets it wrong.
Most published polls report a confidence interval: "The candidate leads 52% to 48%, margin of error ±3%, 95% confidence." Most readers interpret this the same way: there is a 95% probability that the candidate's true support is between 49% and 55%. This interpretation is wrong.
The correct interpretation is more subtle, and understanding it changes how you evaluate almost every statistical claim you encounter.
The frequentist definition
A 95% confidence interval is produced by a procedure that, if repeated across many independent samples, would yield intervals containing the true parameter in 95% of cases. Once you have computed a specific interval, the true value either is or is not in it; in the frequentist framework, no probability attaches to a specific realized interval.
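The repeated-sampling definition can be checked with a small simulation. The sketch below is illustrative only: the true mean, standard deviation, sample size, and trial count are made-up values, and it assumes normally distributed data with a known σ so that the simple z interval applies.

```python
import random
from statistics import NormalDist

def coverage(true_mean=50.0, sigma=10.0, n=30, trials=10_000, level=0.95, seed=0):
    """Fraction of simulated z intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build the interval around its mean.
        x_bar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # Each realized interval either contains true_mean or it does not.
        if x_bar - half_width <= true_mean <= x_bar + half_width:
            hits += 1
    return hits / trials

print(coverage())  # a value close to 0.95
```

The 95% describes the long-run hit rate of the procedure across all the simulated samples, not any single interval inside the loop.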
Constructing a confidence interval
CI = x̄ ± z_(α/2) × (σ / √n)

Where:
x̄ = sample mean
z_(α/2) = 1.96 for 95% confidence (standard normal critical value)
σ = population standard deviation
n = sample size

For unknown σ (the typical case), use the t-distribution:

CI = x̄ ± t_(α/2, n−1) × (s / √n)

where s is the sample standard deviation and the t critical value depends on the degrees of freedom (n−1).
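As a minimal sketch, both formulas can be computed with Python's standard library. The data are invented for illustration, the "known" σ of 0.2 is an assumption, and the t critical value 2.262 (for 9 degrees of freedom at 95% confidence) is hard-coded from a standard t table because the standard library has no t-distribution; the function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def z_interval(sample, sigma, level=0.95):
    """Interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / len(sample) ** 0.5
    m = mean(sample)
    return m - half, m + half

def t_interval(sample, t_crit):
    """Interval for the mean when sigma is estimated by the sample s."""
    half = t_crit * stdev(sample) / len(sample) ** 0.5  # stdev divides by n-1
    m = mean(sample)
    return m - half, m + half

# Made-up measurements, n = 10, so the t interval uses 9 degrees of freedom.
data = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.9]
T_CRIT_DF9 = 2.262  # t_(0.025, 9) taken from a standard t table

print(z_interval(data, sigma=0.2))
print(t_interval(data, T_CRIT_DF9))
```

For a different sample size the hard-coded critical value must be replaced with the one for n−1 degrees of freedom.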
What affects interval width
| Factor | Effect on CI width | Intuition |
|---|---|---|
| Larger sample size n | Narrower (∝ 1/√n) | More data = more precision |
| Higher confidence level (99% vs 95%) | Wider | More certainty requires broader net |
| Higher variability σ | Wider | Noisier data = more uncertainty |
| Larger effect size | No change | Effect magnitude doesn't change precision |
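The rows of the table can be reproduced numerically. This is a sketch for the known-σ z interval only, with arbitrary example values; `ci_width` is a hypothetical helper name.

```python
from statistics import NormalDist

def ci_width(sigma, n, level=0.95):
    """Full width of the known-sigma z interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / n ** 0.5

base = ci_width(sigma=10, n=100)
print(base)                            # baseline width
print(ci_width(sigma=10, n=400))       # 4x the sample size: half the width
print(ci_width(sigma=10, n=100, level=0.99))  # higher confidence: wider
print(ci_width(sigma=20, n=100))       # double the variability: double the width
```

The function takes no argument for the mean or effect size, which mirrors the last row of the table: the width depends only on σ, n, and the confidence level.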
The common misinterpretations
The most frequent errors:
Wrong: "There is a 95% probability the true mean is in this interval." In the frequentist framework the true mean is a fixed constant, not a random variable, so no probability statement about its location applies. What has a 95% probability is the procedure: its chance of producing an interval that covers the true mean.
Wrong: "95% of the data lies in this interval." A confidence interval concerns the population mean, not individual observations. An interval meant to capture individual observations is a prediction interval, which is much wider.
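Wrong: "If the intervals of two studies overlap, the effects are not significantly different." Two 95% CIs can overlap while the difference between the means is still significant at the 5% level. The reverse direction is safer but conservative: for independent estimates, non-overlapping 95% CIs do imply a significant difference, yet many significant differences come with overlapping intervals. Checking for overlap is not a reliable significance test.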
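A small numeric sketch shows how overlapping intervals and a significant difference can coexist. The two estimates and their standard errors are invented, and the calculation assumes the estimates are independent and approximately normal; the function names are hypothetical.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def ci(estimate, se):
    """95% interval for one estimate with standard error se."""
    return estimate - Z95 * se, estimate + Z95 * se

def two_sided_p(est_a, se_a, est_b, se_b):
    """p-value for the difference of two independent estimates (z test)."""
    se_diff = (se_a ** 2 + se_b ** 2) ** 0.5  # standard errors add in quadrature
    z = abs(est_a - est_b) / se_diff
    return 2 * (1 - NormalDist().cdf(z))

ci_a = ci(0.0, 1.0)
ci_b = ci(3.2, 1.0)
overlap = ci_a[1] > ci_b[0]  # upper end of A exceeds lower end of B
p = two_sided_p(0.0, 1.0, 3.2, 1.0)

print(ci_a, ci_b, overlap)
print(p)  # below 0.05 even though the intervals overlap
```

The intervals overlap because each one uses its own standard error, while the test of the difference uses the combined standard error, which is smaller than the sum of the two.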
The Bayesian alternative
If you want to make probability statements about the parameter — "there is a 95% chance the true value is in this range" — you need a Bayesian credible interval. A 95% Bayesian credible interval genuinely means P(θ ∈ [a,b] | data) = 0.95. It requires specifying a prior distribution for the parameter.
For large datasets with weakly informative priors, Bayesian credible intervals and frequentist confidence intervals often numerically coincide — but their interpretations remain philosophically distinct. The difference matters most in small-sample or high-stakes settings where the choice of prior is consequential.
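The numerical agreement, and where it breaks down, can be sketched with the simplest conjugate case. The assumptions here are not from the text above: normally distributed data with known σ, a normal prior on the mean, and invented numbers throughout. The function names are hypothetical.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)

def confidence_interval(x_bar, sigma, n):
    """Frequentist 95% z interval for the mean (sigma known)."""
    half = Z95 * sigma / n ** 0.5
    return x_bar - half, x_bar + half

def credible_interval(x_bar, sigma, n, prior_mean, prior_sd):
    """95% credible interval under a Normal(prior_mean, prior_sd^2) prior."""
    prior_prec = 1 / prior_sd ** 2   # precision = 1 / variance
    data_prec = n / sigma ** 2
    post_prec = prior_prec + data_prec
    # Posterior mean is a precision-weighted average of prior and data.
    post_mean = (prior_prec * prior_mean + data_prec * x_bar) / post_prec
    post_sd = post_prec ** -0.5
    return post_mean - Z95 * post_sd, post_mean + Z95 * post_sd

# Large sample, weak prior: the two intervals nearly coincide.
print(confidence_interval(5.0, 2.0, 400))
print(credible_interval(5.0, 2.0, 400, prior_mean=0.0, prior_sd=100.0))

# Small sample, informative prior: the prior pulls the interval toward 0.
print(confidence_interval(5.0, 2.0, 4))
print(credible_interval(5.0, 2.0, 4, prior_mean=0.0, prior_sd=1.0))
```

Even when the printed numbers match, the two intervals answer different questions: one describes the long-run behaviour of a procedure, the other a posterior probability given the data and the prior.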