Can someone explain this cartoon?

Question

I ran 160 regressions on various combinations of data sets, predictors, and dependent variables. I am now trying to sift through the results and separate the models that are "real" from those that are due to random chance. My professor sent me this cartoon. It seems to be implying that by using $\alpha = 0.05$, you are highly likely to get at least one significant result across 20 data sets.

Questions:

Does random data have a 0.05 probability of being significant at $\alpha = 0.05$?
In other words:

$$\Pr(\text{at least one } p < 0.05 \mid n = 20) = 1 - (1 - 0.05)^{20} = 0.6415$$

Not exactly a guarantee as suggested by the cartoon.

Is the above calculation true for regressions?

Is there a word for this "random correlation" so I can do further research?

1. Yes, or more generally "if the null hypothesis is true." Random data do not always mean zero difference or zero correlation (i.e., you can have randomly generated data that are correlated at r = 0.8); 2. Why are you computing them as if they are dependent events? In the cartoon all 20 tests are independent events. 3. Usually it's called "Type I error" or "false positive." — Penguin_Knight, Jan 25 '16 at 15:36
Your professor is using a humorous cartoon to warn you about the trap you have stepped into. Don't try to argue with the cartoon instead of heeding the warning. You're right to want to learn more about the problem: http://semanticommunity.info/@api/deki/files/30744/Elder_Target_Shuffling_Sept.2014.pdf — Wayne, Jan 25 '16 at 16:35
Argh, I thought it was a paper, but it's a slide deck. Should've looked closer. I hope you did get something out of it, since a lot of detail is left out of the slides to give the speaker room to talk. Bottom line is, even with the binomial math you did (which isn't quite what you think it is) the odds are almost 2:1 (0.6415 vs. 0.3585) that you have a problem. Would you listen to a lecture where someone started with, "the odds are 1 out of 3 that what I found might be real!" (Apply your math to your 160 regressions and the odds explode.) — Wayne, Jan 25 '16 at 20:52
@Wayne, you hinted that the binomial math I did is not right. Why? Is it because the trials are not independent? I can see how a regression of X1 vs Y1 would not be independent of a regression of X1 vs Y2. If this is the issue, is there a way to quantify the probability of getting a Type I error in n trials? — rconway91, Jan 26 '16 at 14:49
You are calculating the probability of exactly one occurrence, rather than the probability of one or more occurrences. And it's more subtle than that because we're dealing with a frequentist concept here and $\alpha=0.05$ does not mean what we might want it to mean. The cartoon's point is that, in layman's terms, if you allow that your significance test can be wrong as often as 5% of the time, you shouldn't be surprised when it is wrong 5% of the time (i.e. 1 of 20). Again, frequentist concepts are tricky so _technically_ the previous sentence may be a misstatement, but as a concept it's fine. — Wayne, Jan 26 '16 at 16:41

score 2 · Accepted Answer · answered Jan 25 '16 at 15:44

Your calculation is right, assuming the p-values come from independent data and from sufficiently large datasets (so that the p-values are approximately uniformly distributed under the null hypothesis). If the data are not independent (e.g. when every group randomly assigned to eat a particular color of jelly bean is compared to the same single group assigned to eat no jelly beans, or when people who ate red jelly beans are compared to people who ate none on several outcomes such as acne, cancer, and death), things are more complicated.
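To see the independence assumption at work, here is a minimal simulation sketch. It assumes each null p-value is an exact, independent Uniform(0, 1) draw, which is an idealization; the function names `simulate_fwer` and `exact_fwer` and the trial count are made up for illustration.

```python
import random


def exact_fwer(n_tests, alpha=0.05):
    """Chance of at least one p < alpha among n_tests independent null tests."""
    return 1 - (1 - alpha) ** n_tests


def simulate_fwer(n_tests, alpha=0.05, n_trials=20_000, seed=0):
    """Monte Carlo estimate of the same quantity (illustrative settings)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Assumption: under the null, each p-value is an independent
        # Uniform(0, 1) draw. Correlated tests would break this.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_trials


for n in (20, 160):
    print(n, round(simulate_fwer(n), 4), round(exact_fwer(n), 4))
```

The simulated and exact values should agree closely for independent tests. With shared control groups or shared outcomes, the draws would no longer be independent and the closed-form expression would only be an approximation.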

Thus, you are right that there is no guarantee that among 20 comparisons conducted under the null hypothesis there would be at least one type I error, and the familywise type I error rate you calculated is exactly right. However, with 160 comparisons the familywise type I error rate is very close to 100% ($1 - 0.95^{160} \approx 0.9997$ under the same assumptions). There are a number of possible ways to deal with this type of multiplicity. These include testing procedures that control the familywise type I error rate (e.g. the Bonferroni-Holm procedure) or the false discovery rate. I have also seen some Bayesians argue for (implicit) shrinkage approaches using some kind of hierarchical Bayesian model, and there are almost certainly further things one could do.
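As a rough sketch of the Bonferroni-Holm procedure mentioned above, the following step-down check could be applied to a list of p-values. The function name `holm_reject` and the example p-values are hypothetical and are not taken from the 160 regressions in the question.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's step-down procedure rejects."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values, for illustration only.
example = [0.001, 0.04, 0.03, 0.2]
print(holm_reject(example))
```

In this made-up example, three of the four p-values fall below 0.05 on their own, but only the smallest survives the correction. Holm's method controls the familywise error rate without requiring independence; procedures that control the false discovery rate are less strict and may suit an exploratory screen of many regressions better.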
