Applied Probability · 7 min read

What Is a P-Value? A Plain English Explanation

P-values appear in scientific papers, medical trials, and news headlines. Most people misread them. Here is what they actually mean.

The Probability Lab Team

June 13, 2026

Few numbers in science are more misunderstood than the p-value. It appears in medical studies, psychology papers, and news headlines. It is cited as evidence of discovery. It is also, frequently, misinterpreted in ways that lead to false conclusions.

What a p-value actually measures

A p-value is the probability of observing results at least as extreme as your data, assuming the null hypothesis is true. That definition is precise but dense. Let us unpack it with a concrete example.

Suppose you flip a coin 100 times and get 60 heads. You want to know: is this coin fair? The null hypothesis is that the coin is fair, meaning the probability of heads is 0.5. The p-value answers: if the coin were fair, how often would we see 60 or more heads in 100 flips just by chance? Because it counts only outcomes in one direction (too many heads), this is a one-sided p-value.

P-value for 60 heads in 100 flips

Null hypothesis: coin is fair (probability of heads = 0.50)
Observed: 60 heads out of 100 flips
P-value (one-sided, exact binomial) ≈ 0.028

Interpretation: if the coin were fair, 60 or more heads
would occur about 2.8% of the time by chance.
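The figure above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `binomial_tail` is an illustrative choice, and the calculation is the one-sided exact binomial tail, matching the "60 or more heads" framing.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Probability of 60 or more heads in 100 flips of a fair coin
p_value = binomial_tail(100, 60)
print(f"one-sided p-value: {p_value:.3f}")  # about 0.028
```

A two-sided test, which also counts 40 or fewer heads as "at least as extreme," would roughly double this value, so it matters which version a study reports.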

What a p-value does NOT mean

The most common mistake is reading a p-value of 0.028 as "there is a 2.8% chance the coin is fair." That is wrong. The p-value is not the probability that the null hypothesis is true. It is the probability of seeing data at least this extreme given that the null hypothesis is true, which is a completely different quantity from the probability of the null hypothesis given the data.

A small p-value means your data would be unlikely if the null hypothesis were true. It does not tell you how likely the null hypothesis is.
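To see how far apart the two quantities can be, here is a rough sketch that computes the probability the coin is fair given 60 heads. It needs two ingredients the p-value never uses, and both are assumptions made up for illustration: a 50% prior belief that the coin is fair, and a single alternative in which the coin lands heads 60% of the time.

```python
from math import comb

def binom_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k heads in n flips with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

prior_fair = 0.5   # assumption: fair and biased are equally likely beforehand
p_biased = 0.6     # assumption: the only alternative is a 60% heads coin

like_fair = binom_pmf(100, 60, 0.5)         # P(data | fair coin)
like_biased = binom_pmf(100, 60, p_biased)  # P(data | biased coin)

# Bayes' theorem: P(fair | data)
posterior_fair = (like_fair * prior_fair) / (
    like_fair * prior_fair + like_biased * (1 - prior_fair)
)
print(f"P(coin is fair | 60 heads): {posterior_fair:.2f}")  # about 0.12
```

Under these assumptions the coin is still fair with probability of roughly 12%, not 2.8%. Different priors or alternatives give different answers, which is exactly the point: the p-value alone cannot supply this number.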

The 0.05 threshold

By convention, results with p < 0.05 are called "statistically significant." This threshold was proposed by statistician Ronald Fisher in 1925 as a rule of thumb, not a law of nature. A p-value of 0.049 and a p-value of 0.051 are essentially identical in evidential strength, yet one crosses the threshold and the other does not.

P-hacking and the replication crisis

Because p < 0.05 is the gate to publication in many journals, researchers face pressure, conscious or not, to keep analyzing data until they find a significant result. Run enough subgroup analyses, try enough outcome measures, and you are very likely to find p < 0.05 somewhere by pure chance. This practice, called p-hacking, is a major contributor to the replication crisis in psychology and medicine.

False positive rate with multiple tests

If you run 20 independent tests at α = 0.05:
P(at least one false positive) = 1 - 0.95^20 ≈ 64%

Running many tests dramatically inflates your
chance of finding a "significant" result by chance.
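The arithmetic above is easy to check, both with the formula and with a quick simulation. The sketch below is illustrative only: the function names are invented here, and the simulation assumes every null hypothesis is true and that each test's p-value is uniformly distributed between 0 and 1, which holds for well-behaved continuous tests.

```python
import random

def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def simulate_false_positive_rate(
    n_tests: int = 20, alpha: float = 0.05, n_studies: int = 10_000, seed: int = 0
) -> float:
    """Fraction of simulated studies with at least one p-value below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Assumption: under a true null, each p-value is uniform on [0, 1)
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies

print(f"formula:    {familywise_error_rate(20):.3f}")  # about 0.642
print(f"simulation: {simulate_false_positive_rate():.3f}")
```

Changing `n_tests` shows how quickly the problem grows: with 5 tests the chance of at least one false positive is already about 23%.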

How to read p-values correctly

A p-value is one piece of evidence, not a verdict. It should be read alongside effect size (how large is the difference?), confidence intervals (what is the plausible range?), and study design (was the test pre-registered?). A p-value of 0.001 in a poorly designed study is weaker evidence than a p-value of 0.04 in a rigorous randomised controlled trial.
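For the coin example, effect size and a confidence interval can be sketched as follows. This uses the simple normal-approximation (Wald) interval, which is one common choice among several and is an assumption here, not a method prescribed by the article; the function name `wald_interval` is likewise illustrative.

```python
from math import sqrt

def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(60, 100)
effect_size = 60 / 100 - 0.5  # observed heads rate minus the fair-coin rate

print(f"effect size: {effect_size:.2f}")               # 0.10
print(f"95% CI for heads rate: {low:.3f} to {high:.3f}")  # about 0.504 to 0.696
```

Read together, these say more than the p-value alone: the observed bias is 10 percentage points, but the data are compatible with anything from a nearly fair coin to one that lands heads about 70% of the time.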

The p-value is not broken; it is a useful tool when understood correctly. The problem is that it is routinely used as a binary pass/fail stamp on ideas that deserve more nuanced evaluation.
