Supplement companies love to cite studies. The label says “clinically studied,” the website references “published research,” and the claims sound scientific. But “studied” and “proven” are different words for a reason. You don’t need a science degree to tell them apart - you just need to know what to look for.

A worked example: let’s walk through a real study

Instead of talking about study design in the abstract, let’s walk through an actual paper. In 2021, a group of researchers published a randomized, double-blind, placebo-controlled trial of a curcumin extract for knee osteoarthritis in the journal Nutrients. Here’s what you’d see if you looked at it yourself.

The study enrolled 101 participants with knee osteoarthritis and randomly assigned them to receive either 500 mg of a bioavailability-enhanced curcumin extract or a placebo twice daily for 12 weeks. Both groups were similar at the start - same average age, same baseline pain scores, same BMI. That matters. If the treatment group happened to have milder arthritis to begin with, any improvement could just reflect that starting difference rather than the supplement.

At the end of 12 weeks, the curcumin group’s knee pain scores on the KOOS scale (Knee Injury and Osteoarthritis Outcome Score) dropped from about 55 to about 37 - an 18-point improvement. The placebo group dropped from about 54 to about 47 - a 7-point improvement. The difference between groups was 11 points on a 100-point scale.
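
The arithmetic here is worth checking yourself, because the headline number (an 18-point drop) is not the number that counts. A minimal sketch in Python, using the rounded scores quoted above and treating a drop in score as improvement, as the text does; the function name is only for illustration:

```python
def between_group_difference(treat_start, treat_end, control_start, control_end):
    """Return (treatment change, control change, difference between the two changes)."""
    treat_change = treat_start - treat_end
    control_change = control_start - control_end
    return treat_change, control_change, treat_change - control_change


# Rounded pain scores quoted in the text above.
treat_change, control_change, difference = between_group_difference(55, 37, 54, 47)
print(treat_change, control_change, difference)  # 18 7 11
```

The placebo group's 7-point improvement is the part you subtract. What is left, 11 points, is the portion you can credit to the supplement.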

The p-value for this difference was p = 0.009. What does that actually mean?

P-values: what they tell you and what they don’t

A p-value of 0.009 means that if curcumin had zero real effect and the results were purely due to chance, you’d see a difference this large (or larger) about 9 times out of every 1,000 trials. In other words: unlikely to be noise.
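
One way to make that definition concrete is to simulate it: run thousands of imaginary trials in which the supplement does nothing, and count how often chance alone produces an 11-point gap. The sketch below is an illustration, not the test the researchers used. The standard deviation of 21 points and the 50 people per group are assumptions chosen for the example, not figures from the paper.

```python
import random


def simulated_p_value(observed_diff, sd, n_per_group, n_sims=2000, seed=1):
    """Estimate a two-sided p-value by simulating trials where the supplement does nothing."""
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_sims):
        # Both groups come from the same distribution: no real effect exists.
        group_a = [rng.gauss(0, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, sd) for _ in range(n_per_group)]
        gap = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        if abs(gap) >= observed_diff:
            as_extreme += 1
    return as_extreme / n_sims


# Assumed inputs: SD of 21 points and 50 people per group are illustrative guesses.
p = simulated_p_value(observed_diff=11, sd=21, n_per_group=50)
print(f"Share of no-effect trials with a gap of 11+ points: {p:.3f}")
```

Under these assumptions the share should land somewhere around 1 in 100. That share is the p-value: how often pure chance would hand you a result this big.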

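The conventional cutoff for “statistically significant” is p < 0.05: if the supplement did nothing, a result at least this extreme would turn up less than 5 percent of the time. (That is not quite the same as “a 5 percent chance the result is random,” though the two are often confused.) Our curcumin study clears that bar easily. But statistical significance is not the same as clinical significance. A p-value tells you “this would be surprising if nothing were going on.” It doesn’t tell you “this effect matters to patients.”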

A supplement that lowers blood pressure by 1 mmHg could be statistically significant if the study is large enough. But nobody’s doctor is going to change their treatment plan over 1 mmHg. That 1 mmHg reduction is statistically significant but clinically meaningless.
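
To see how sample size alone can turn a trivial effect into a “significant” one, here is a rough sketch using a simple z-test approximation. The 15 mmHg standard deviation is an assumption for the example, and a real trial would likely use a more careful test.

```python
import math


def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, using a normal (z-test) approximation."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = abs(mean_diff) / standard_error
    return math.erfc(z / math.sqrt(2))


# Assumed: blood pressure readings vary with an SD of 15 mmHg in both groups.
for n in (100, 1_000, 10_000):
    p = two_sided_p_value(mean_diff=1.0, sd=15.0, n_per_group=n)
    print(f"{n:>6} per group: p = {p:.6f}")
```

The effect is the same 1 mmHg in every row. Only the number of participants changes, and with enough of them the p-value drops far below 0.05.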

In our curcumin study, the 11-point difference on the KOOS pain subscale does reach the threshold researchers consider clinically meaningful - generally defined as 8-10 points on this particular scale. So this study clears both bars: the result probably isn’t random, and the effect is large enough that patients might actually notice.

This distinction - statistically significant versus clinically meaningful - is where most supplement marketing gets sloppy. Companies will cite a “statistically significant improvement” without mentioning that the improvement was 0.3 points on a 50-point scale. The math works. The meaning doesn’t.

Effect sizes: how big is big enough?

Statistical tests answer “real or random?” Effect sizes answer “how big?” Common measures include Cohen’s d, which expresses the difference between groups in standard deviations (0.2 is conventionally small, 0.5 medium, and 0.8 large), and raw differences like the 11-point KOOS improvement above.
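
Cohen’s d is just the gap between the two group averages divided by the pooled standard deviation. A minimal sketch follows; the standard deviation of 21 and the group sizes of 50 are hypothetical, and the “negligible” label for anything under 0.2 is a convenience added here rather than part of Cohen’s original benchmarks.

```python
import math


def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    pooled_variance = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_variance)


def describe(d):
    """Label an effect size using the conventional 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical inputs: 18- and 7-point improvements, SD of 21 in each group.
d = cohens_d(mean_a=18, mean_b=7, sd_a=21, sd_b=21, n_a=50, n_b=50)
print(round(d, 2), describe(d))  # 0.52 medium
```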

Here’s a rule of thumb: if a study reports statistical significance but doesn’t give you a clear effect size - the actual improvement in the treatment group compared to the control group, in real units - be suspicious. “Statistically significant” without numbers is a marketing claim wearing a lab coat.

In supplement research, effect sizes tend to be small to medium. A 2022 meta-analysis of ashwagandha for anxiety found a standardized mean difference of about -0.5 - a medium effect. A 2021 meta-analysis of curcumin for osteoarthritis found pain reductions in the range of 5-12 points on 100-point scales, depending on the formulation and study. These are real effects, but they’re not miracle-level effects. If a supplement claim sounds too dramatic to fit in that range, then most likely either the study is being misrepresented or the study itself is unreliable.

Absolute risk versus relative risk: the most abused distinction in supplement marketing

This is where supplement companies do their best statistical sleight of hand. Relative risk sounds impressive. Absolute risk tells you what actually matters.

Imagine a study following two groups of 1,000 people each for five years. In the placebo group, 20 people develop condition X. In the supplement group, 10 people develop it.

The relative risk reduction is 50 percent: the supplement cut the number of cases in half. That’s the number the ad will use. “Reduces risk by 50 percent!” Technically true. Misleading in practice.

The absolute risk reduction is 1 percentage point: from 2 percent (20/1000) to 1 percent (10/1000). That means 100 people need to take the supplement for five years to prevent one case (1 ÷ 0.01 = 100). That’s the number needed to treat (NNT) - and 100 is a lot of pills for one prevented case.
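
All three numbers come from the same four counts, so you can work them out for any study that reports its raw cases. A short sketch, assuming the treated group has the lower risk (otherwise there is no reduction to compute); the function name is illustrative:

```python
import math
from fractions import Fraction


def risk_summary(events_control, n_control, events_treated, n_treated):
    """Return (absolute risk reduction, relative risk reduction, number needed to treat)."""
    risk_control = Fraction(events_control, n_control)
    risk_treated = Fraction(events_treated, n_treated)
    arr = risk_control - risk_treated  # assumes the treated group has lower risk
    rrr = arr / risk_control
    nnt = math.ceil(1 / arr)  # rounded up to a whole person
    return float(arr), float(rrr), nnt


arr, rrr, nnt = risk_summary(events_control=20, n_control=1000,
                             events_treated=10, n_treated=1000)
print(f"relative {rrr:.0%}, absolute {arr:.1%}, NNT {nnt}")
# relative 50%, absolute 1.0%, NNT 100
```

Exact fractions are used internally so that the rounding-up step for NNT is not thrown off by floating-point error.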

Both numbers are mathematically correct. One tells a 50-percent story. The other tells a 1-percent story. When you see a supplement claim that says “reduces risk by X percent,” ask: absolute or relative? If the ad doesn’t specify, assume relative - and if they won’t tell you the absolute numbers, the relative number is probably hiding a small absolute effect.

Confounders: what else was going on?

Observational studies - the kind that track thousands of people for years and look for associations - can show relationships but can’t prove causation. The classic confounder in supplement research is the “healthy user effect”: people who take supplements tend to exercise more, smoke less, eat better, and have higher incomes and better healthcare access than people who don’t. Any of those factors could explain lower disease rates.

A study finding that vitamin D users have lower heart disease rates hasn’t shown that vitamin D prevents heart disease. It’s shown that vitamin D users - who differ from non-users in dozens of ways - have lower heart disease rates. To separate the supplement from the lifestyle, you need a randomized controlled trial, where the only systematic difference between groups is whether they got the supplement or a placebo.
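
The healthy user effect is easy to reproduce in a toy simulation. In the sketch below the supplement does nothing at all; only lifestyle changes disease risk. Every probability in it is invented for illustration.

```python
import random


def disease_rate_gap(n_people, randomized, seed=7):
    """Disease rate among supplement takers minus the rate among everyone else.

    The supplement has zero effect in this simulation; only lifestyle changes risk.
    """
    rng = random.Random(seed)
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n_people):
        healthy_lifestyle = rng.random() < 0.5
        if randomized:
            takes_supplement = rng.random() < 0.5  # coin flip, as in an RCT
        else:
            # Assumed: health-conscious people are more likely to buy supplements.
            takes_supplement = rng.random() < (0.7 if healthy_lifestyle else 0.3)
        # Assumed risks: 5% with a healthy lifestyle, 15% without.
        gets_disease = rng.random() < (0.05 if healthy_lifestyle else 0.15)
        totals[takes_supplement] += 1
        cases[takes_supplement] += gets_disease
    return cases[True] / totals[True] - cases[False] / totals[False]


observational_gap = disease_rate_gap(100_000, randomized=False)
randomized_gap = disease_rate_gap(100_000, randomized=True)
print(f"Observational gap: {observational_gap:+.3f}")
print(f"Randomized gap:    {randomized_gap:+.3f}")
```

With these made-up numbers, the observational comparison should show supplement users with a disease rate several percentage points lower, while the randomized comparison should hover near zero. Same useless supplement, two very different headlines.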

When you hear “linked to,” “associated with,” or “correlated with” - that’s observational. It might be real. It might be healthy-user confounding. The study design itself can’t tell you which.

The red flags checklist: how to spot a weak study in 30 seconds

Here’s a practical filter. When you encounter a supplement claim that cites research, run through these questions:

1. How many people were in the study? If the answer is under 30, the result could easily be noise. For supplement claims, look for at least 50-100 participants per group - enough for basic statistical power to detect a medium effect.

2. Was there a control group, and was the study blinded? An “open-label” study where everyone knows they’re getting the treatment tends to show larger effects than a blinded study, especially for self-reported outcomes like pain. The placebo effect is real and powerful - people who believe they’re taking something effective tend to report improvement. Without blinding, you can’t separate the supplement from the expectation.

3. Was it pre-registered? Legitimate clinical trials are registered on ClinicalTrials.gov (or an equivalent international registry) before they begin. Pre-registration locks in the outcome measures and analysis plan, preventing researchers from cherry-picking the measures that happened to show an effect after seeing the data. If a study isn’t registered, you have no way of knowing whether the reported outcome was the one they planned to measure or the one that panned out.

4. Who funded it? Industry-funded studies are more likely to report positive results than independently funded ones. This isn’t conspiracy - it’s publication bias, selective outcome reporting, and the fact that companies rarely fund studies of supplements they suspect don’t work. Industry funding doesn’t invalidate a study. It does mean you should look for independent replication before taking the result at face value.

5. What’s the actual effect size in real units? A study can be statistically significant and still show an effect too small to matter. If the study reports a p-value but not the actual improvement in the treatment group versus control - in numbers you can understand - the omission is probably deliberate.

6. Is the study being cited accurately? Sometimes the problem isn’t the study - it’s the citation. A trial showing that curcumin reduces an inflammatory marker in blood samples gets cited as proof that it “reduces inflammation” or “treats arthritis.” Check whether the claim matches what the study actually measured. A mechanism (reduces CRP) is not the same thing as an outcome (improves joint pain and function).

7. Has it been replicated? One study - even a good one - is provisional. Replication by independent researchers, ideally in different populations, is what moves a finding from “interesting” to “probably real.” If every claim about a supplement traces back to the same 30-person trial from 2015, the evidence base is thin regardless of how many websites cite it.
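
Two items on this checklist, sample size (question 1) and pre-registration (question 3), have simple arithmetic behind them. The sketch below uses a standard normal-approximation formula for a two-group comparison with the usual defaults of a 0.05 significance level and 80 percent power. It also treats the extra outcomes as independent, which real outcomes rarely are, so read both results as ballpark figures.

```python
import math
from statistics import NormalDist


def participants_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-arm trial (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d**2)


def chance_of_a_false_positive(n_outcomes, alpha=0.05):
    """Chance that at least one of several independent null outcomes hits p < alpha."""
    return 1 - (1 - alpha) ** n_outcomes


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {participants_per_group(d)} per group")

print(f"10 outcomes measured: {chance_of_a_false_positive(10):.0%} chance of a fluke hit")
```

Under these assumptions, a medium effect (d = 0.5) needs roughly 63 people per group, which is where the 50-100 rule of thumb comes from. And a study that measures ten outcomes without committing to one in advance has about a 40 percent chance of finding at least one “significant” result even if the supplement does nothing.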

The study-type hierarchy: what different designs can and can’t tell you

Meta-analyses and systematic reviews sit at the top. They pool results from multiple studies, increasing statistical power and reducing the influence of any single biased trial. A Cochrane review - widely considered the gold standard - is systematic, pre-registered, and conducted by independent researchers. When a Cochrane review says a supplement works (or doesn’t), that’s the strongest evidence available.
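
Pooling sounds mysterious, but the simplest version is a weighted average in which more precise studies count for more. The sketch below shows fixed-effect, inverse-variance pooling. Real reviews often use more elaborate random-effects models, and the three trials here are hypothetical.

```python
import math


def pooled_effect(effects, standard_errors):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical trials: pain reduction versus placebo (points) and its standard error.
effects = [11.0, 4.0, 8.0]
standard_errors = [4.0, 2.0, 4.0]
estimate, se = pooled_effect(effects, standard_errors)
print(f"Pooled effect: {estimate:.1f} points (SE {se:.2f})")
```

The pooled standard error comes out smaller than that of any single trial, which is the gain in statistical power. The pooled estimate also sits closest to the most precise trial, so one small study with a dramatic result cannot dominate.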

Randomized controlled trials (RCTs) are the gold standard for individual studies. Random assignment controls for confounders (known and unknown). Blinding controls for placebo effects and expectation bias. A well-designed RCT can demonstrate causation, not just association. Most supplement trials that get cited are RCTs - but quality varies dramatically.

Observational studies (cohort studies, case-control studies) can show associations at population scale but can’t prove causation. They generate hypotheses. They don’t test them.

Animal studies and in vitro studies (petri dish, cell culture) generate hypotheses at even earlier stages. A compound that kills cancer cells in a dish or reduces inflammation in mice hasn’t been shown to do anything in humans. Most compounds that look promising in animal models fail in human trials. If the best evidence for a supplement comes from animal studies, the human evidence base doesn’t exist yet.

Case reports and anecdotes are the weakest form of evidence. An individual story of benefit (or harm) can generate hypotheses and detect safety signals. It can’t establish that a supplement works.

Putting it all together: the three questions that matter most

You don’t need to evaluate every statistical test in a paper. If you’re looking at a supplement claim, ask three things:

How big was the study? Small studies generate unreliable results. Look for sample sizes in the hundreds, not the dozens.

How big was the effect? The supplement industry loves statistical significance. You want clinical significance: an effect large enough that someone would notice. Ask for the actual numbers.

Who paid for it? Industry funding isn’t disqualifying, but it’s a reason to look for independent replication. If every positive study on a supplement was funded by the company that sells it, and the independent studies are all negative or null, the pattern tells you something.

The answers to those three questions will tell you more than the headline ever will. A supplement doesn’t need to be a miracle to be useful. But it does need evidence, and evidence is something you can learn to see for yourself.