
Why lack of statistical significance shouldn’t depress us

January 9, 2013

Andrew Gelman writes:

With a small sample size, not every comparison is going to be statistically significant.

I know this is kind of obvious, but I feel like ecologists (especially as reviewers!) need to keep this little piece of wisdom in mind more often. In ecology it’s often very hard work to build up even a modest sample size. During a PhD, two grueling field seasons often generate the kind of sample size that the Higgs boson people can get in a small fraction of a picosecond. And since causality in ecological systems is a complex web of interactions, we’re usually only looking for subtle effects. Small effects and small sample sizes mean that we should expect a whole lot of false negatives, regardless of the talent of the student.
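
To put rough numbers on this, here is a quick back-of-the-envelope power calculation. It is only a sketch: the effect size (d = 0.3) and sample size (20 per group) are made-up but plausible values for a field study, and the two-sample z-test is an approximation to the t-test one would actually run.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF built from the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sample_power(d, n_per_group, crit_z=1.96):
    """Approximate power of a two-sided two-sample z-test for a
    standardized effect size d with n_per_group samples per group."""
    # Standard error of the difference in means (unit variance assumed)
    se = sqrt(2 / n_per_group)
    return normal_cdf(d / se - crit_z)

# Made-up illustration: a subtle effect, a hard-won field-season sample
power = two_sample_power(d=0.3, n_per_group=20)
print(f"power = {power:.2f}, false-negative rate = {1 - power:.2f}")
```

Under these assumed numbers the false-negative rate is roughly 85%: even a flawlessly executed study will usually come up insignificant.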

I think we should all try to remember these basic facts of the lives of ecologists more often when evaluating manuscripts and CVs. A well-written manuscript is a well-written manuscript, regardless of how interesting or significant the results are.

6 Comments
  1. January 9, 2013 9:40 pm

    Perhaps rather than just focusing on null hypotheses of no effect, we should be providing more information on what hypotheses *have* been ruled out, and with what level of “severity”.

  2. January 9, 2013 10:31 pm

    Yes. This would definitely be an improvement. Anything that rewards studies that address questions with non-obvious answers using good-quality data and sound interpretation, but that end up with insignificant results, would be welcome.

    I still don’t feel certain about what severity is, so this next comment may be weird, but how is your suggestion really any different than just reporting confidence intervals instead of hypothesis tests?

    • January 11, 2013 2:23 pm

      Re: confidence intervals vs. severity, the Mayo and Spanos article I plugged in my post addresses that question. They’re related, but not the same.

      • January 11, 2013 3:26 pm

        I haven’t got around to looking at the article, but poked around on her blog a bit yesterday. She emphasizes that severity is a metastatistical principle, in the sense that it doesn’t correspond to any particular well-defined calculations but rather relates to a general philosophical principle for interpreting frequentist analyses.

        I really like the severity idea. I wonder if it would be worth writing something about it for an ecological audience? If this idea might help make ecologists more comfortable with publishing data that are consistent with a null hypothesis, then I would be very interested in it.

        I should make clear though, that I don’t believe that all insignificant findings are worth publishing — it’s just that they are unfairly biased against, in my opinion. At the risk of emphasizing what might be an often-made and obvious point: the result itself shouldn’t be a factor in whether to publish a paper; what should matter more is whether (1) the question is interesting, (2) we don’t know the answer (or lack a consensus, or can call into question the current consensus, or can strengthen a questionable consensus, etc.), (3) the data are appropriate for addressing the question, (4) the data analysis is sound, and (5) the paper is well-written. What we find just shouldn’t matter. Can severity help?

        But I do have some reservations about severity. You write that:

        “…If the mesh were larger (i.e. the test less powerful), then the inference that a fish at least 9 inches long was caught actually would be more reliable (i.e. the test of the hypothesis would be more severe)…”

        This worries me. We have more severity if the power is low!? This reminds me of one of those jokes that make fun of frequentist model selection: “Thank goodness I don’t have a larger sample size! ;)” (I think I first heard Gelman tell this joke) The problem with this is that with low power (relative to the effect size), frequentist hypothesis testing only allows us to reject a null if our data lead to an unreasonably large estimate of the effect (i.e. we get ‘lucky’). Detecting an effect with low power shouldn’t make us feel more confident about our inference, it should make us wonder if our data are too good to be true.

        So severity may not be the answer. Combining estimation procedures that lead to believable estimates (e.g. Bayes / penalized regression) with the severity principle may be the appropriate way forward. Bayes helps us be more conservative with respect to effect sizes, and severity helps us be more conservative with respect to error rates. I can’t help but think that we need both.
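
        The “lucky detection” worry above can be checked with a quick simulation. This is only a sketch with made-up numbers: a true effect of 0.3, a sample size that leaves a one-sided z-test with low power, and unit standard deviation.

```python
import random
from math import sqrt

random.seed(1)

true_effect = 0.3
n = 20                       # made-up, deliberately underpowered
se = 1 / sqrt(n)             # standard error of the sample mean (sd = 1)
reps = 20_000

significant_estimates = []
for _ in range(reps):
    # Each replicate draws an estimated effect from its sampling distribution
    estimate = random.gauss(true_effect, se)
    if estimate / se > 1.96:   # one-sided z-test rejects the null
        significant_estimates.append(estimate)

mean_sig = sum(significant_estimates) / len(significant_estimates)
print(f"true effect: {true_effect}")
print(f"mean estimate among significant results: {mean_sig:.2f}")
```

        Under these assumed numbers, the significant results overestimate the true effect by nearly a factor of two — exactly the “too good to be true” problem (what Gelman calls a Type M error).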

      • January 11, 2013 3:36 pm

        Hi Steve,

        Severity is both a metastatistical principle and a quantity that can be calculated in the context of a specific statistical test.

        Re: severity vs. power, I think if you read the Mayo and Spanos piece you’ll be reassured.

        Yes, I’ve been toying with the idea of writing a paper on severity for ecologists. This post was kind of a dry run for that. So I could probably be talked into collaborating on something. So tell you what: read Mayo and Spanos (and maybe play around with that “severity calculator” spreadsheet on Mayo’s blog), and then if you’re still interested in writing something let’s talk further.

  3. January 11, 2013 2:25 pm

    Mayo also has a recent post on her blog with a downloadable Excel file that works through severity calculations for a simple case.
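
    For readers who would rather see the arithmetic than open a spreadsheet, here is a sketch of the post-data severity calculation as I understand it from Mayo and Spanos, for a one-sided test of H0: mu <= 0 on a normal mean with known sigma. All the numbers are made up for illustration.

```python
from math import erf, sqrt

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def severity_upper(xbar, mu1, sigma, n):
    """Severity for the inference 'mu < mu1' after an insignificant
    result xbar: the probability of observing a larger xbar than we
    did, were the true mean actually mu1."""
    se = sigma / sqrt(n)
    return normal_cdf((mu1 - xbar) / se)

# Made-up example: n = 25, sigma = 2, observed mean 0.6
# (z = 1.5, so the test of H0: mu <= 0 does not reject at alpha = 0.05)
for mu1 in (0.5, 1.0, 1.5):
    print(f"SEV(mu < {mu1}) = {severity_upper(0.6, mu1, 2, 25):.3f}")
```

    The insignificant result warrants “mu < 1.5” with high severity but says little about “mu < 0.5” — which is exactly the extra information a bare p-value hides.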
