Cherry picking and low p-values

Asked Aug 07 '15 at 10:24

Active Aug 08 '15 at 22:18


Let's say I run a lot of univariate OLS regressions, say 200,000, each with 50 data points, and then cherry-pick the best one (highest $R^2$). If the $p$-value for this model is far below 1/200,000, do my results still have any explanatory power?

The probability of getting this result when there is no relationship between the variables is extremely low, even after running 200,000 regressions, correct?
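The intuition above can be checked under two assumptions the question does not state: the 200,000 tests are independent, and every null hypothesis is true, so each $p$-value is Uniform(0, 1). The chance that the smallest of $m$ such $p$-values is at most $p$ is then $1 - (1 - p)^m$, which is approximately $mp$ when $mp$ is small. A minimal Python sketch (the function name is illustrative, not from any library):

```python
import math


def prob_best_p_below(p: float, m: int) -> float:
    """Chance that the smallest of m independent null p-values is <= p.

    Assumes each p-value is Uniform(0, 1) (every null hypothesis true) and the
    m tests are independent. Evaluates 1 - (1 - p)**m as -expm1(m * log1p(-p))
    so the result stays accurate when p is tiny.
    """
    return -math.expm1(m * math.log1p(-p))


if __name__ == "__main__":
    m = 200_000
    for p in (1 / m, 1e-7, 1e-9):
        print(f"p = {p:.0e}: chance of a best p-value this small by luck = "
              f"{prob_best_p_below(p, m):.3g}")
```

At $p = 1/200{,}000$ the chance is roughly $1 - e^{-1} \approx 63\%$, so a best $p$-value merely at that level is unremarkable; it has to be well below $1/m$ before luck becomes an unlikely explanation. If the regressions share data (for example, the same dependent variable), the tests are not independent and the formula no longer holds exactly, although the union bound $mp$ still caps the probability.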

I'm generally interested in rules on how/when cherry-picking can still make sense.

edited Aug 08 '15 at 22:18 by amoeba

asked Aug 07 '15 at 10:24 by Alexis Eggermont

This is called the "multiple comparisons problem", and what you suggest is more or less what is known as the "Bonferroni correction". It is a valid procedure, yes. – amoeba Aug 07 '15 at 13:07
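The Bonferroni correction mentioned in this comment can be sketched in a few lines. It compares the cherry-picked $p$-value with $\alpha/m$, or equivalently multiplies it by the number of tests $m$ (capped at 1). By the union bound, the chance that any true-null test passes is at most $\alpha$ even when the tests are dependent. The helper names below are hypothetical, not from a library.

```python
def bonferroni_adjust(p: float, m: int) -> float:
    """Bonferroni-adjusted p-value for one of m tests: min(1, m * p)."""
    return min(1.0, m * p)


def survives_bonferroni(p: float, m: int, alpha: float = 0.05) -> bool:
    """True if p stays significant at family-wise error rate alpha after m tests."""
    return p <= alpha / m


if __name__ == "__main__":
    m = 200_000
    for p in (1e-6, 1e-8):
        print(p, bonferroni_adjust(p, m), survives_bonferroni(p, m))
```

With 200,000 regressions the threshold at $\alpha = 0.05$ is $2.5 \times 10^{-7}$, so $p = 10^{-6}$ does not survive while $p = 10^{-8}$ does. The correction is conservative, especially when the tests are strongly correlated.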

If you are running many univariate regressions, each choosing one from among multiple independent variables with the same dependent variable, then this approach can miss important combinations of independent variables and lose explanatory power. If that's what you're doing, edit the question or pose another that explains what you are trying to accomplish with your study. There are much better ways to proceed in that case. – EdM Aug 07 '15 at 13:35
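The point about combinations of independent variables can be illustrated with a contrived example; the data-generating process below is hypothetical, chosen only to make the effect obvious. Two predictors `x1` and `x2` are nearly identical, and the outcome depends only on their small difference. Each univariate regression then explains almost nothing, while the regression on both explains nearly all the variance. A minimal NumPy sketch:

```python
import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2))


rng = np.random.default_rng(0)
n = 50
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)  # x1 and x2 share the common factor z
x2 = z + 0.1 * rng.normal(size=n)
y = x1 - x2 + 0.01 * rng.normal(size=n)  # y depends only on the small difference

r2_x1 = r_squared(x1, y)
r2_x2 = r_squared(x2, y)
r2_both = r_squared(np.column_stack([x1, x2]), y)
print(f"x1 alone: {r2_x1:.3f}, x2 alone: {r2_x2:.3f}, both: {r2_both:.3f}")
```

A model that considers the predictors jointly recovers the relationship that a one-at-a-time screen misses.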
Do you work in a fixed-regressor setting? (I.e., is your regressor matrix $X$ stochastic or deterministic? The latter would be the case for lab experiments.) Depending on the answer, I would recommend one of two methods. – Jeremias K Feb 17 '16 at 19:33
