I ran 160 regressions on various combinations of data sets, predictors, and dependent variables. I am now trying to sift through the results and separate the models that are "real" from those that are significant only by random chance. My professor sent me this cartoon. It seems to imply that with $\alpha = 0.05$, you are highly likely to find at least one significant result among 20 data sets.
Questions:
- Does random data have a 0.05 probability of being significant at $\alpha = 0.05$?
In other words:
$$\Pr(\text{at least one } p < 0.05 \mid n = 20) = 1 - (1 - 0.05)^{20} \approx 0.6415$$
That is not exactly the guarantee the cartoon suggests.
- Does the above calculation also hold for regressions?
- Is there a word for this "random correlation" so I can do further research?
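
To sanity-check the calculation above, here is a minimal simulation sketch. It rests on assumptions that are mine, not the cartoon's: the tests are independent, each one is a simple regression of pure noise on an unrelated predictor, and the noise variance is known to be 1, so a z-test on the slope is exact and only the standard library is needed. The function names are made up for illustration.

```python
import random
from statistics import NormalDist

ALPHA = 0.05


def familywise_rate(n_tests, alpha=ALPHA):
    """Chance of at least one false positive among n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def null_slope_p_value(rng, n_obs=20):
    """Two-sided p-value for the slope when y is pure noise, unrelated to x.

    Assumption: the noise variance is known to be 1, so the statistic
    Sxy / sqrt(Sxx) is exactly standard normal under the null.
    """
    x = [rng.gauss(0, 1) for _ in range(n_obs)]
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    x_bar = sum(x) / n_obs
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    z = sxy / sxx ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_tests, n_batches=2000, seed=0):
    """Fraction of batches of n_tests null regressions with any p < ALPHA."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_batches):
        if any(null_slope_p_value(rng) < ALPHA for _ in range(n_tests)):
            hits += 1
    return hits / n_batches


if __name__ == "__main__":
    print("analytic, 20 tests: ", familywise_rate(20))
    print("simulated, 20 tests:", simulate(20))
    print("analytic, 160 tests:", familywise_rate(160))
```

If the independence assumption held for all 160 of my regressions, the same formula would put the chance of at least one false positive above 0.999. My regressions share data sets and predictors, so they are probably not independent, and that figure is only a rough guide.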