Applied Probability · 9 min read

The p-Value Crisis: Why Most Published Research May Be Wrong

The 0.05 significance threshold has governed scientific publishing for 80 years. It was never meant for that purpose, and the resulting crisis is reshaping statistics.

The Probability Lab Team

July 22, 2025

In 2005, epidemiologist John Ioannidis (now at Stanford) published a paper titled "Why Most Published Research Findings Are False." It became one of the most downloaded papers in the history of PLOS Medicine. Its argument was statistical, not anecdotal: under conditions common in scientific publishing, the majority of published findings with p < 0.05 can be expected to be false positives.

Understanding why requires understanding what a p-value actually measures, and what it does not.

The definition

The p-value is the probability of obtaining data as extreme as or more extreme than what was observed, assuming the null hypothesis is true. It is written formally as:

p-Value Definition

p = P(data as extreme as observed | H₀ is true)

What p-value IS:
  - Probability of data at least as extreme as what was observed, given no effect exists

What p-value IS NOT:
  - Probability the null hypothesis is true
  - Probability the result is due to chance
  - Probability that your conclusion is correct
  - Measure of the effect's importance or size
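To make the definition concrete, here is a minimal sketch that computes a two-sided p-value by direct enumeration. The coin-flip scenario, the function name two_sided_p_value, and the choice to measure "extreme" as distance from the expected count are illustrative assumptions, not part of the definition above.

```python
from math import comb

def two_sided_p_value(heads: int, flips: int) -> float:
    """P(result at least as far from flips/2 as observed | coin is fair)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every sequence of flips whose head count is at least as extreme
    # as the observed one, then divide by the total number of sequences.
    extreme_sequences = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme_sequences / 2**flips

# Hypothetical experiment: 60 heads in 100 flips, H0 = the coin is fair.
p = two_sided_p_value(heads=60, flips=100)
print(f"p = {p:.4f}")
```

Note what the function conditions on: it assumes the coin is fair and asks how often data this lopsided would appear. Nothing in it estimates the probability that the coin actually is fair.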

Where the 0.05 threshold came from

Ronald Fisher, in his 1925 book Statistical Methods for Research Workers, suggested that a probability of 1 in 20 (0.05) was a "convenient" level for declaring an effect worthy of investigation. He explicitly intended it as a rough heuristic for individual experiments, not as a universal publication threshold.

Fisher later wrote that a scientific fact should be demonstrated by repetition under varied conditions, not validated by a single p < 0.05 result. The binary publication criterion he inspired was not what he advocated.

The base rate problem

Suppose 10% of tested hypotheses are true (a reasonable estimate in early-stage research). Across 100 studies, each using a test with 80% power and α = 0.05, the expected outcomes are:

                                 H₀ true (90 studies)    H₀ false (10 studies)
Significant result (p < 0.05)    4.5 false positives     8 true positives
Non-significant result           85.5 true negatives     2 missed effects

Among the 12.5 significant results, 4.5 are false positives: a false discovery rate of 36%. If the prior probability of a true effect is lower, say 5%, the false discovery rate rises above 50% (4.75 expected false positives against 4 true positives).
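The arithmetic behind these figures fits in a few lines. The sketch below is one way to reproduce it; the function name false_discovery_rate and its parameter names are illustrative, and it assumes every study uses the same power and α.

```python
def false_discovery_rate(prior: float, power: float = 0.80, alpha: float = 0.05) -> float:
    """Expected share of significant results that are false positives.

    prior: fraction of tested hypotheses where a real effect exists.
    power: probability of a significant result when the effect is real.
    alpha: probability of a significant result when there is no effect.
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return false_positives / (true_positives + false_positives)

for prior in (0.10, 0.05):
    print(f"prior = {prior:.0%}: FDR = {false_discovery_rate(prior):.1%}")
```

Varying the prior while holding power and α fixed shows how strongly the false discovery rate depends on a quantity the p-value never sees.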

The multiple comparisons problem

Family-Wise Error Rate

If you run k independent tests at α = 0.05:
  P(at least one false positive) = 1 − (1 − 0.05)ᵏ

k = 1:   5.0% false positive probability
k = 10:  40.1%
k = 20:  64.2%
k = 50:  92.3%

The Bonferroni correction: test each hypothesis at α/k to keep the family-wise error rate at or below α.
With k = 20 tests: significance threshold = 0.05/20 = 0.0025
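The figures above can be checked with a short sketch. The function names family_wise_error_rate and bonferroni_threshold are illustrative, and the calculation assumes the k tests are independent, as stated above.

```python
def family_wise_error_rate(k: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / k

for k in (1, 10, 20, 50):
    uncorrected = family_wise_error_rate(k)
    corrected = family_wise_error_rate(k, bonferroni_threshold(k))
    print(f"k = {k:2d}: uncorrected {uncorrected:.1%}, with Bonferroni {corrected:.1%}")
```

With the corrected threshold, the family-wise rate stays at or just under 5% for every k in the loop.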

What is changing

The American Statistical Association issued a formal statement in 2016 warning against basing scientific conclusions on whether a p-value crosses a fixed threshold. At least one journal, Basic and Applied Social Psychology, banned p-values outright in 2015. A growing number of journals and fields now encourage or require pre-registration of hypotheses, larger samples, effect size reporting (Cohen's d, η², r), and replication.

The p-value is not useless; it is one valid signal among several. The crisis arose from treating it as the sole arbiter of truth. A p-value tells you how surprising the data would be under the null hypothesis. It does not tell you that the effect is real, large, reproducible, or important.
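To see why a small p-value does not imply a large effect, consider a rough sketch of a two-sample z-test. It assumes two equal-sized groups with a known standard deviation of 1, so that the difference in means equals Cohen's d; the function name and the example numbers are hypothetical.

```python
from math import erfc, sqrt

def two_sample_z_p_value(cohens_d: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means of cohens_d standard
    deviations, assuming equal group sizes and a known standard deviation."""
    z = cohens_d * sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

# A negligible effect measured on a very large sample.
print(f"d = 0.02, n = 100000 per group: p = {two_sample_z_p_value(0.02, 100_000):.2g}")

# A medium-sized effect measured on a small sample.
print(f"d = 0.50, n = 20 per group:     p = {two_sample_z_p_value(0.50, 20):.2g}")
```

Under these assumptions the negligible effect comes out highly significant while the medium-sized one does not, which is why effect sizes are reported alongside p-values.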
