In a recent meta-analysis of the effects of statins on mortality, Ray et al(1) wrote that “there is no evidence for the benefit of statin therapy on all-cause mortality in a high-risk primary prevention setup.” The authors’ conclusion (based on a risk ratio of 0.91; 95% confidence interval, 0.83-1.01; P≥.05) is emblematic of a growing trend: categorizing studies as positive or negative solely on the basis of statistical significance testing.
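The reported interval itself shows how fragile this dichotomy is. As a minimal illustration (assuming, as is standard for ratio measures, that the confidence interval was constructed on the log scale with a normal approximation), the two-sided P value can be back-calculated from the risk ratio and its 95% confidence interval:

```python
from math import log, sqrt, erf

# Back-calculate the two-sided P value from a risk ratio and its 95% CI,
# assuming the CI was built on the log scale with a normal approximation.
# Figures are those quoted from Ray et al above.
rr, lo, hi = 0.91, 0.83, 1.01

se = (log(hi) - log(lo)) / (2 * 1.96)   # standard error of log(RR)
z = log(rr) / se                        # z statistic against the null RR = 1
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal P value

print(f"z = {z:.2f}, P = {p:.3f}")      # approximately z = -1.88, P = 0.060
```

A P value near .06 alongside a point estimate of a 9% relative reduction in mortality is a far cry from “no evidence of benefit”; only the dichotomy at .05 renders the result null.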
The foundation of evidence-based medicine is the unbiased, objective, and systematic interpretation of clinical studies. In the pursuit of this impartiality, clinicians have come to rely on statistical significance and its epitome, the P value. Although significance testing has become the de facto standard for interpreting study outcomes, ironically, it was never meant to serve this purpose. Rather, statistical inference was devised to summarize stochastic variation. Many clinicians remain unaware of this fact and consequently fail to recognize that the evidence supporting a hypothesis cannot be summarized in a statistical value. How did this happen?
In the 1920s, Fisher(2) proposed the letter P as a statistical index summarizing the strength of evidence against a null hypothesis. He suggested that P≤.05 be given a special place because a 1 in 20 probability of a chance result was a “convenient” cutoff for scientific investigation. Although this cutoff is pragmatic and remains the principal form of reporting results in the medical literature, it turns a blind eye to the preexisting probability of a hypothesis.(3) This nuance was not lost on the Reverend Thomas Bayes, who (200 years before Fisher) demonstrated the considerable influence of prior probability when examining new evidence.(4) Although frequentist statistics are more practical and dominate biomedical reporting, clinicians are ultimately interested in Bayesian estimates (ie, the best scientific estimate of truth given the available evidence). Consider the following scenario: two positive human immunodeficiency virus (HIV) test results are called in. The first originates from a 45-year-old homeless intravenous drug user who presented with substantial weight loss; the second originates from a 50-year-old nun without symptoms. A narrow frequentist approach would conclude that both patients are HIV positive. Clinicians operating on Bayesian logic, however, would argue that the former result is likely true and the latter is likely a false positive. Because statistical convention dwells only on producing evidence against a single hypothesis, selecting the appropriate hypothesis to test through a lens of pre-study probabilities is vital when interpreting study results.
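The Bayesian reasoning behind these two verdicts can be made explicit with Bayes’ theorem. The sketch below is illustrative only: the sensitivity, specificity, and pre-test probabilities are assumptions chosen to resemble a screening immunoassay, not figures from the article. It computes the positive predictive value, the probability that a positive test reflects true infection, at a high and a low prior.

```python
# Positive predictive value via Bayes' theorem. The test characteristics and
# priors below are illustrative assumptions, not figures from the article.
SENSITIVITY = 0.997   # P(test positive | infected)
SPECIFICITY = 0.985   # P(test negative | not infected)

def ppv(prior: float) -> float:
    """Probability of true infection given a positive test (Bayes' theorem)."""
    true_pos = SENSITIVITY * prior
    false_pos = (1 - SPECIFICITY) * (1 - prior)
    return true_pos / (true_pos + false_pos)

# High pre-test probability: symptomatic intravenous drug user.
print(f"PPV at a 30% prior:  {ppv(0.30):.1%}")   # ~96.6%
# Low pre-test probability: asymptomatic, low-risk patient.
print(f"PPV at a 0.1% prior: {ppv(0.001):.1%}")  # ~6.2%
```

Under these assumptions, the identical positive result is almost certainly true in the first patient but most likely a false positive in the second; the evidence is the same, yet the conclusion hinges on the prior.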
Historical perspectives aside, many clinicians equate the P value with the probability that a study finding is true. The ramifications of this misunderstanding are important and diverse, as illustrated by the following examples:
To read this article in its entirety, please visit our website.
— Vineet Chopra, MD, MSc, Rodney A. Hayward, MD
This article originally appeared in the June 2012 issue of The American Journal of Medicine.