On small numbers

Most of the surprising claims you read in a given week come from samples too small to support them. This is not a scandal. It’s structural. The bigger a study is, the closer its estimate sits to the true effect, which is usually modest; the smaller a study is, the more its estimate bounces around, and the more room there is for a striking effect to appear, real or not. The press, predictably, prefers the latter.

The instinct, once you notice this, is to dismiss small studies entirely. I’d push back on that. Small studies are often where new ideas get their first real look — they’re cheap, fast, and well-suited to checking whether something is even worth measuring. The problem isn’t the size. The problem is how their findings get reported.

A few rules of thumb I try to apply when I read one:

Look at the confidence interval, not the headline number. A point estimate of “+18%” from a sample of 40 is a polite way of saying “somewhere between -3% and +39%.”
Ask what would have to be true for this to be wrong. Tiny samples don’t fail in subtle ways. If a result is spurious, it’s usually because of one weird subgroup, or one bad measurement, or one decision made halfway through.
Treat the effect size with the seriousness it deserves. “Statistically significant” and “large enough to matter” are different questions. A small study can only clear the first bar by observing a large effect, which is exactly why the effect sizes it reports tend to run high.
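
To put rough numbers on the first rule of thumb, here is a minimal sketch of how the interval around the same “+18%” narrows as the sample grows. It treats the estimate as a sample mean and uses a normal approximation; the standard deviation of 0.68 is my assumption, picked only because it reproduces the “-3% to +39%” interval at n = 40, and the function name is made up for the example.

```python
import math

def mean_ci(estimate, sd, n, z=1.96):
    """95% normal-approximation confidence interval for a sample mean."""
    half_width = z * sd / math.sqrt(n)
    return estimate - half_width, estimate + half_width

# Assumed for illustration: a per-observation standard deviation of 0.68,
# chosen so that n = 40 roughly reproduces the "-3% to +39%" interval.
ESTIMATE = 0.18
ASSUMED_SD = 0.68

for n in (40, 400, 4000):
    low, high = mean_ci(ESTIMATE, ASSUMED_SD, n)
    print(f"n={n:>4}: {low:+.0%} to {high:+.0%}")
```

Under those assumptions, the interval at n = 4000 is about a tenth as wide as at n = 40, roughly +16% to +20%. The headline number is the same in both cases; only the second one tells you much.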

A useful sanity check: simulate the result under the null. If the analysis, run exactly as the study ran it, produces a “p < 0.05” effect more than 5% of the time in pure noise, you don’t have evidence; you have a sample-size problem.
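
That check can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: a two-group comparison, noise drawn from a standard normal distribution, and an analysis that uses a normal-approximation (z) test with the usual 1.96 cutoff. The function names, group sizes, and trial count are hypothetical, not taken from any particular study.

```python
import random
import statistics

def z_statistic(a, b):
    """z-statistic for a difference in means, using the normal approximation."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return diff / se

def false_positive_rate(n_per_group, trials=4000, seed=0):
    """Share of pure-noise studies that this analysis would call significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups come from the same distribution: the true effect is zero.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if abs(z_statistic(a, b)) > 1.96:  # nominal two-sided p < 0.05
            hits += 1
    return hits / trials

for n in (5, 200):
    print(f"n per group = {n:>3}: flagged in {false_positive_rate(n):.1%} of null studies")
```

With five observations per group, the normal approximation is too generous and the flagged share should land noticeably above 5%; with two hundred per group it should sit close to 5%. A t-test would repair this particular gap, but the point of the exercise is to test the analysis as it was actually run, not the one that should have been run.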

None of this is novel. It’s the kind of thing you learn in the first week of any stats class and then forget the first time you read a newspaper. The discipline isn’t in the math; it’s in remembering to apply it when the result is the one you were hoping for.