
I recently read the following statistical 'sin' here:

Something I see a surprising amount in conference papers and even journals is making multiple comparisons (e.g. of bivariate correlations) and then reporting all the p<.05 results as "significant" (ignoring the rightness or wrongness of that for the moment).

Can someone explain this 'sin' to me? I have run about 40 correlation tests and all are significant!

(I presume this sin has something to do with the notion that a statistically significant finding may not necessarily be a meaningful finding.)

Adhesh Josh
  • Getting 40 significant results out of 40 tests suggests those tests aren't telling you a whole lot: they are either the wrong tests to use or you're using them to point out something very obvious. – whuber Sep 23 '11 at 18:14
  • Is there anything you would like to know that is not in the [large list of similar questions](http://stats.stackexchange.com/search?q=%2Bcorrelation+%2B[multiple-comparisons]) such as http://stats.stackexchange.com/questions/5750/look-and-you-shall-find-a-correlation, http://stats.stackexchange.com/questions/13810/threshold-for-correlation-coefficient-to-indicate-statistical-significance-of-a-c, and http://stats.stackexchange.com/questions/8300/correlation-analysis-and-correcting-p-values-for-multiple-testing? – whuber Sep 23 '11 at 18:17

4 Answers


If a result is statistically significant at the 95% level, then across 100 such tests you would expect to see about 5 that "pass" the test of statistical significance even when the null hypothesis is true and the effect is due to random chance. When you perform multiple hypothesis tests like this, an adjustment is normally made to compensate for this effect, such as the Bonferroni correction (the Wikipedia page, and the links therein, should give you the information you need).
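As a concrete illustration, here is a minimal sketch in Python (assuming numpy, scipy and statsmodels are available; the 40 correlations are computed on simulated pure-noise data, not on the questioner's variables) comparing the raw and Bonferroni/Sidak-adjusted counts of "significant" results:

```python
# Sketch: adjusting many correlation p-values for multiple testing.
# The data are simulated noise, so every "significant" raw correlation
# is a false positive by construction.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_obs, n_vars = 100, 41        # 41 noise variables -> 40 correlations with the first
X = rng.normal(size=(n_obs, n_vars))

pvals = np.array([stats.pearsonr(X[:, 0], X[:, j])[1] for j in range(1, n_vars)])

print("uncorrected p < .05:", (pvals < 0.05).sum())   # about 2 expected by chance
print("Bonferroni-adjusted:", multipletests(pvals, alpha=0.05, method="bonferroni")[0].sum())
print("Sidak-adjusted:     ", multipletests(pvals, alpha=0.05, method="sidak")[0].sum())
```

Running this a few times typically shows a couple of raw "hits" and essentially none after either correction; the corrections only guard against chance findings, so 40 genuinely strong relationships can of course still survive them.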

Dikran Marsupial
  • Thanks. I have used Sidak correction, so I think I am OK! – Adhesh Josh Sep 23 '11 at 17:46
  • No problem, these methods assume the trials are independent, which for many problems is not the case, and so the correction may be over-conservative. But in most cases it is better to be over-conservative than the opposite! – Dikran Marsupial Sep 23 '11 at 17:56
  • The logic is a little subtler than laid out here. When a result is significant at the 95% level, its p-value could be 0.05 or it could be practically 0. In the latter case one might expect *no* significant tests in 100 independent random samples. Another subtlety is that the p-value *observed* in one sample is not the same as the p-value that would be attained in the long run of repeated independent samples. Also, the implicit presumption of independence of the p-values is rarely satisfied; usually the p-values reported for a dataset are correlated. – whuber Sep 23 '11 at 18:03
  • +1 to what whuber wrote. It is somewhat ironic that a frequentist approach should run into difficulties as the result of repeated application! ;o) – Dikran Marsupial Sep 23 '11 at 19:04
  • @DikranMarsupial As I see it, it's the "we do that once" approach that runs into difficulties as the result of the lack of repetition. – xmjx Oct 06 '11 at 06:35

xkcd neatly illustrated the issue of only reporting positive results

[xkcd comic illustrating multiple comparisons and selective reporting of significant results]

Henry

On a side note: in my opinion, running those tests and reporting the results is okay, but it is important to disclose that the results came from just digging around in the data. The correlations found are then not supported by any theory, but they might be interesting relationships anyway, and subsequent experiments could explore them. Reporting the data would also enable fellow researchers to compare these observations with results from their own studies. In general: yes, report them, but make clear what they are.

xmjx
  • @Felix I must say that the results do not come from digging around in the data; I have theoretical justification for each correlation. However, I admit that what I am measuring is obvious: I know there is a relationship, but the size of the relationship is unknown, so I am finding this out using Pearson's r. I would be most grateful for your further feedback in view of this explanation. – Adhesh Josh Oct 06 '11 at 12:06
  • @Felix I think the Sidak correction assumes that each test is independent. Even if the variables and the relationships between them are dependent, wouldn't the tests be done independently? (The variables are not combined.) – Adhesh Josh Oct 06 '11 at 12:14

Another side note: Both the Bonferroni and Sidak corrections assume independence - although not knowing your data, I strongly suspect that your variables (and the relations between them) are not independent. Hence, your power will drop considerably as you overcorrect your $\alpha$-level, and so the likelihood of detecting "true" correlations is reduced as well.

(If all 40 of your correlations are still significant, even after a very conservative correction, I agree with whuber that you are probably testing something obvious or trivial...)
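A quick simulation can make that conservatism concrete (a sketch in Python with numpy and scipy; the common-factor correlation of 0.8 is an invented illustration, not an estimate from any real data): when 40 test statistics are strongly positively dependent, the Bonferroni threshold keeps the family-wise error rate well below the nominal 5% under the global null, which is exactly the lost power described above.

```python
# Sketch: family-wise error rate (FWER) of Bonferroni under positive dependence.
# 40 test statistics share a common factor; all null hypotheses are true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, alpha, n_sim, rho = 40, 0.05, 5000, 0.8   # rho: made-up common-factor correlation

false_alarms = 0
for _ in range(n_sim):
    common = rng.normal()
    z = np.sqrt(rho) * common + np.sqrt(1 - rho) * rng.normal(size=m)
    p = 2 * stats.norm.sf(np.abs(z))         # two-sided p-values under the null
    false_alarms += (p < alpha / m).any()    # any rejection at the Bonferroni threshold

print(f"realized FWER: {false_alarms / n_sim:.3f}  (nominal {alpha}, ~0.049 if independent)")
```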

Another approach for testing the spuriousness of correlations with a lot of dependent variables could be a randomization approach:

Sherman, R. A., & Funder, D. C. (2009). Evaluating correlations in studies of personality and behavior: Beyond the number of significant findings to be expected by chance. Journal of Research in Personality, 43(6), 1053–1063. doi:10.1016/j.jrp.2009.05.010
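In that spirit, a randomization check might look something like the following rough sketch (Python with numpy/scipy; the two blocks of variables, their sizes and the row-shuffling scheme are my own illustration, not the exact Sherman & Funder procedure): count how many cross-block correlations come out significant in the real data, then compare that count with its distribution when the rows of one block are repeatedly shuffled, which destroys the cross-block association while preserving the dependence within each block.

```python
# Rough sketch of a randomization check on the *number* of significant
# cross-block correlations (in the spirit of Sherman & Funder, 2009).
import numpy as np
from scipy import stats

def count_significant(A, B, alpha=0.05):
    """Count Pearson correlations with p < alpha between columns of A and B."""
    return sum(stats.pearsonr(a, b)[1] < alpha for a in A.T for b in B.T)

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 5))                     # e.g. trait scores (placeholder data)
Y = 0.3 * X[:, [0]] + rng.normal(size=(n, 8))   # e.g. behaviours, weakly tied to one trait

observed = count_significant(X, Y)
null_counts = [count_significant(X, Y[rng.permutation(n)]) for _ in range(1000)]

p_value = np.mean(np.array(null_counts) >= observed)
print(f"significant correlations observed: {observed}")
print(f"randomization p-value for that many (or more): {p_value:.3f}")
```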

[EDIT: I just found a similar answer by whuber on another thread]

Felix S
  • Bonferroni correction does not assume independence of the hypothesis tests. Have I misunderstood your remark? – cardinal Oct 06 '11 at 13:03
  • @cardinal: I wanted to make the point that in case of non-independent data, Bonferroni _overcorrects_ and is way too conservative. – Felix S Oct 06 '11 at 13:46
  • I guess my point was that Bonferroni correction does not have anything to do with independence of the hypothesis tests, as far as I can tell. In fact, if the tests *are* independent, then, if I am not mistaken, Bonferroni is *guaranteed* to overcorrect (why?). Bonferroni correction will work *exactly* if and only if the rejection regions of the individual hypotheses under test are all mutually disjoint. This is quite far from independence, and, indeed, *requires* them to be dependent in a very specific way. :) – cardinal Oct 06 '11 at 14:05
  • @cardinal: I'm not sure if we're talking about the same thing ;-) To state my point in other words: the more non-independent the tests are, the further the actual p level will fall _below_ the nominal (corrected) level of .05, AFAIK. That means the more dependencies there are in the data, the more (over)conservative the Bonferroni correction is, resulting in lower power. Would you agree with that? – Felix S Oct 06 '11 at 14:44
  • I think we may be talking about the same thing. However, consider the following simple scenario. Let $X \sim \mathcal U(0,1)$ and suppose Test 1 rejects when $X > 1 - \alpha$ and Test 2 rejects when $X < \alpha$. Then the probability that at least one of them rejects is $2 \alpha$. Hence, the Bonferroni correction is *tight* (due to the reason I stated in the other comment!). However, clearly these tests are not independent since $\mathbb P(\text{both reject}) = 0 \neq \alpha^2$. Likewise $\mathbb P(\text{neither reject}) = 1-2\alpha \neq (1-\alpha)^2$. – cardinal Oct 06 '11 at 15:12
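A quick numerical check of the example in the last comment (a minimal Python/numpy sketch; the uniform-variable setup is exactly the one cardinal describes): with the two disjoint rejection regions, at least one test rejects with probability $2\alpha$, while both never reject together.

```python
# Sketch verifying the disjoint-rejection-region example above:
# two level-alpha tests driven by one Uniform(0,1) variable.
import numpy as np

rng = np.random.default_rng(3)
alpha, n_sim = 0.05, 1_000_000
x = rng.uniform(size=n_sim)

reject1 = x > 1 - alpha                  # Test 1 rejects in the upper tail
reject2 = x < alpha                      # Test 2 rejects in the lower tail

print("P(at least one rejects) ~", (reject1 | reject2).mean())   # close to 2*alpha = 0.10
print("P(both reject)          ~", (reject1 & reject2).mean())   # exactly 0, not alpha**2
```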