# The Significance of Significance

A few years ago, I was at a medical conference when a presenter pulled a bit of academic magic: He showed us that the study he just finished wound up being just below the statistically significant threshold, but by making some reasonable adjustments--throwing out a patient or two for not fitting the exact study criteria upon further evaluation--he had, abracadabra!, a *significant* finding. The quantitative changes in his results were almost negligible, but the labeling difference was huge: The study now had all this new weight to it.
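
To see how small the change behind a new label can be, here is a minimal sketch in Python. Everything in it is hypothetical: the trial (14 of 20 patients improving), the 50 percent "no effect" rate, and the helper name `one_sided_binomial_p` are assumptions chosen for illustration, not figures from the presenter's study.

```python
from math import comb


def one_sided_binomial_p(successes, n, p_null=0.5):
    """Probability of seeing at least `successes` improvements among n
    patients if each one improves with probability p_null (the null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical trial: 14 of 20 patients improve.
p_before = one_sided_binomial_p(14, 20)

# Same trial after excluding one patient who did not improve.
p_after = one_sided_binomial_p(14, 19)

print(f"before exclusion: p = {p_before:.4f}")  # about 0.058
print(f"after exclusion:  p = {p_after:.4f}")   # about 0.032
```

In this made-up case, dropping a single patient moves the improvement rate from 70 percent to about 74 percent, yet the result crosses from "not significant" to "significant" at the .05 line.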

But despite this new label of importance, it turns out there's no good theoretical reason for making the distinction. As Tom Siegfried highlights in an outstanding article in Science News:

> Statistical significance is a phrase that every science graduate student learns, but few comprehend. While its origins stretch back at least to the 19th century, the modern notion was pioneered by the mathematician Ronald A. Fisher in the 1920s. His original interest was agriculture. He sought a test of whether variation in crop yields was due to some specific intervention (say, fertilizer) or merely reflected random factors beyond experimental control.
>
> Fisher first assumed that fertilizer caused no difference — the “no effect” or “null” hypothesis. He then calculated a number called the P value, the probability that an observed yield in a fertilized field would occur if fertilizer had no real effect. If P is less than .05 — meaning the chance of a fluke is less than 5 percent — the result should be declared “statistically significant,” Fisher arbitrarily declared, and the no effect hypothesis should be rejected, supposedly confirming that fertilizer works.
>
> Fisher’s P value eventually became the ultimate arbiter of credibility for science results of all sorts — whether testing the health effects of pollutants, the curative powers of new drugs or the effect of genes on behavior. In various forms, testing for statistical significance pervades most of scientific and medical research to this day.
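
The calculation Siegfried describes can be sketched with an exact permutation test. This is one of several ways to compute a P value, and not necessarily the procedure Fisher applied to his crop data. The yields below are made up, and `permutation_p_value` is a hypothetical helper. The sketch counts outcomes at least as extreme as the observed one, which is the usual formal definition of a P value.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """One-sided exact permutation test on the difference in means.

    Under the null hypothesis the group labels are arbitrary, so every way
    of splitting the pooled values into two groups is equally likely. The
    P value is the share of splits whose difference in means is at least
    as large as the one actually observed.
    """
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, len(treated)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        if mean(group_a) - mean(group_b) >= observed:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical crop yields (arbitrary units), five plots per group.
fertilized = [29, 31, 33, 35, 37]
unfertilized = [20, 22, 24, 26, 28]

p = permutation_p_value(fertilized, unfertilized)
print(f"P = {p:.4f}")
```

With these fully separated numbers, only one of the 252 possible splits is as extreme as the observed one, so P is 1/252, or about 0.004. Note what that number is: the chance of data this extreme if fertilizer does nothing, not the chance that fertilizer works.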

Siegfried's point, in highlighting this history, is that as knowledge has progressed, most scientists have precious little time to interrogate something like the soundness of statistical theories and methods. They need tools to use to create other knowledge, which will then be categorized as significant or insignificant and often disseminated as such.

And this problem isn't just theoretical: As he notes, decisions about antidepressants, diabetes medication, and all sorts of literally life-or-death type choices have rested on a foundation of shaky statistics.

Siegfried focuses largely on the challenge of bringing statistical rigor into academia--but it got me thinking of a separate but related issue: As we continue to manage increasing amounts of information in our personal and professional lives, most of which remains outside of our areas of expertise, how do we decide when to act on it?

For example, I've suggested that one major strategy for dealing with information and task overload in our personal lives is to simply offload the challenge--let someone more knowledgeable prepare your diet plan, your financial plan, your workout plan--follow the steps, and trust the expert. In other words, one strategy for dealing with information overload is to consciously limit what we think about.

Another example: my colleague Mike Liebhold has developed an incredibly powerful forecast about the therapeutic potential of information. In the not-too-distant future, he forecasts, individual patients will receive relevant, contextually appropriate health recommendations as part of their daily lives. Those evidence-based recommendations would come from combining and mining data from environmental sensors, medical studies, and individual consumer devices, and they would arrive right as people are about to make a decision related to their health.

Until recently, I hadn't thought much about how valid we might want those recommendations to be, but it's an important line of thinking. If current research suggests that the best way to exercise is in short bursts of interval training, how strong should the new research be before I switch over to some other form of training? And how much should the standard change, if at all, depending on the decision I'm making--for example, should the standard for prescription shifts be different from the standard for diet or exercise recommendations? And perhaps most importantly, whose analysis should we trust: The statistician who can interrogate the math or the scientist who understands the biology?

Now, in some sense, none of these questions are new, per se, and that is part of the problem. Scientists and doctors have been using a social convention about statistics--that a p-value below .05 is significant--for many decades without too much examination because, for all that it may be imperfect, it seems to work out okay. And we take drugs or make health choices based on those recommendations because we assume they've been vetted rigorously. We inhabit, in a sense, a world where we place stunning amounts of trust into things because they seem to be working out okay.

And this, I think, is the challenge: To develop tools to question and interrogate all the stuff that seems to be working out--everywhere from the lab to the doctor's office to an individual's life as a patient--even as knowledge outpaces what we can possibly keep up with.

- Bradley Kreit's blog

## Comment from Mike

Sean Ness

I'm curious whether the topics discussed in last summer's Wired magazine article, "Placebos Are Getting More Effective. Drugmakers Are Desperate to Know Why.", might be, to some degree, the present-day culmination of the history of statistical significance. Let's suppose the understanding of statistical significance is flawed within the pharmacology industry but is deeply ingrained. Is it possible that advances in biotech and drug development have crossed a line such that the "placebo response" is, in fact, a measure of a misunderstanding of statistical significance in human biology?

From the Wired article:

> Two comprehensive analyses of antidepressant trials have uncovered a dramatic increase in placebo response since the 1980s. One estimated that the so-called effect size (a measure of statistical significance) in placebo groups had nearly doubled over that time.... It's not that the old meds are getting weaker, drug developers say. It's as if the placebo effect is somehow getting stronger.
>
> The fact that an increasing number of medications are unable to beat sugar pills has thrown the industry into crisis. The stakes could hardly be higher. In today's economy, the fate of a long-established company can hang on the outcome of a handful of tests.
>
> Why are inert pills suddenly overwhelming promising new drugs and established medicines alike? The reasons are only just beginning to be understood. A network of independent researchers is doggedly uncovering the inner workings—and potential therapeutic applications—of the placebo effect.

--Mike Karlesky