
I have forty candidate predictors, and they are not collinear. I want to know which ones are related to the dependent variable (DV); prediction isn't important to me. I want to do this in an exploratory, data-driven way.

What's my best option? I've looked at multiple regression, stepwise regression (AIC, BIC), best subsets regression, and adaptive LASSO.

Is one of those better than the others? And if not, what is a better option?

Dave
Check out articles by [Chernozhukov](https://arxiv.org/search/?query=chernozhukov&searchtype=all&source=header) and [various others](https://arxiv.org/search/?query=double+post+lasso&searchtype=all&abstracts=show&order=-announced_date_first&size=50) on high-dimensional inference/treatment-effect problems. Double post-LASSO may be a good start; check out the `hdm` package for R. – runr Sep 14 '20 at 22:09
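As a rough illustration of the double-selection idea behind that suggestion, for a single predictor of interest `d`: run a LASSO of the DV on the other predictors, run a LASSO of `d` on the same predictors, then fit ordinary least squares of the DV on `d` plus the union of both selected sets. The following is a minimal Python/NumPy sketch, not the `hdm` implementation. The helper names (`lasso_cd`, `double_selection`, `demo`), the fixed penalty `lam=0.1`, and the simulated data are all assumptions for illustration; `hdm` chooses its penalty level from theory rather than fixing it by hand.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=300):
    """Cyclic coordinate descent for (1/(2n))*||y - X @ b||^2 + lam*||b||_1.

    Hypothetical helper; assumes X and y are centred, so no intercept is fitted.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)  # residual when all coefficients are zero
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]  # partial residual without predictor j
            rho = X[:, j] @ resid / n
            # soft-thresholding sets small coefficients exactly to zero
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

def double_selection(y, d, X, lam=0.1):
    """Coefficient of d on y after LASSO selection of controls from X (fixed lam is an assumption)."""
    Xc = X - X.mean(axis=0)
    keep_y = lasso_cd(Xc, y - y.mean(), lam) != 0  # controls that predict y
    keep_d = lasso_cd(Xc, d - d.mean(), lam) != 0  # controls that predict d
    keep = keep_y | keep_d                         # union of both selections
    Z = np.column_stack([np.ones(len(y)), d, X[:, keep]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)   # post-selection OLS with intercept
    return float(coef[1]), keep

def demo(seed=1, n=500, p=40):
    """Simulated example: X[:, 0] confounds d and y; the true effect of d is 1."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    d = X[:, 0] + rng.standard_normal(n)
    y = 1.0 * d + 2.0 * X[:, 0] + rng.standard_normal(n)
    est, keep = double_selection(y, d, X)
    naive = float(np.polyfit(d, y, 1)[0])  # slope of y on d, ignoring X entirely
    return est, keep, naive

if __name__ == "__main__":
    est, keep, naive = demo()
    print(f"naive slope: {naive:.2f}, double-selection estimate: {est:.2f}")
    print("controls kept:", np.flatnonzero(keep))
```

In this simulation the naive slope is pulled toward 2 by the confounder, while the double-selection estimate should land near the true value of 1. To screen all forty predictors, one could call `double_selection` once per predictor with the remaining 39 as controls; that is an extrapolation of the sketch, not something the comment spells out.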



Stepwise regression has a whole battery of problems, and they apply to best subsets regression as well. Multiple regression is fine if $n \gg p$; otherwise, you may be better off using regularization.
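One of those problems is easy to demonstrate: the first step of forward selection picks the predictor with the largest $|t|$, and with many candidates that maximum looks "significant" even when nothing is related to the DV. The minimal Python/NumPy sketch below simulates pure-noise data and counts how often the best of $p$ univariate $t$-statistics exceeds 1.96. The sample size `n=100`, the 500 replications, and the use of 1.96 as a normal approximation to the $t$ critical value are assumptions for illustration; the question does not state $n$.

```python
import numpy as np

def max_abs_t_under_null(n, p, rng):
    """Largest |t| among p simple regressions of a pure-noise y on pure-noise predictors."""
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # y is unrelated to every column of X
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of y with each predictor
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    # t-statistic of the slope in a simple regression with an intercept
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return float(np.max(np.abs(t)))

def spurious_pick_rate(n=100, p=40, reps=500, crit=1.96, seed=0):
    """Share of noise-only datasets where forward selection's first pick has |t| > crit."""
    rng = np.random.default_rng(seed)
    hits = sum(max_abs_t_under_null(n, p, rng) > crit for _ in range(reps))
    return hits / reps

if __name__ == "__main__":
    print("p = 1: ", spurious_pick_rate(p=1))
    print("p = 40:", spurious_pick_rate(p=40))
```

With a single candidate the rate should stay near the nominal 5%, but with forty candidates it should come out far higher, because the selection step searches for the most extreme statistic.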

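A regularized model (like the LASSO you mentioned) restricts the total size of the parameter estimates; the LASSO, for example, penalizes the sum of their absolute values. Perhaps somewhat surprisingly, the bias this introduces can yield better estimates (lower mean squared error, because the reduction in variance outweighs the added bias), especially when the ratio $n/p$ is small.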
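A minimal sketch of that trade-off, assuming simulated data with $n = 50$ observations, $p = 40$ standardized predictors, and only three nonzero true coefficients. It fits the plain LASSO (not the adaptive LASSO from the question) by coordinate descent on the objective $\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$, which is the same scaling scikit-learn's `Lasso` uses with `alpha` in place of $\lambda$. The penalty $\lambda = 0.2$, the helper names, and the data-generating setup are illustrative assumptions; in practice $\lambda$ would usually be chosen by cross-validation.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Minimise (1/(2n))*||y - X @ b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Hypothetical helper; assumes the columns of X and y are centred (no intercept).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)  # residual when all coefficients are zero
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]  # partial residual without predictor j
            rho = X[:, j] @ resid / n
            # soft-thresholding: small coefficients are set exactly to zero
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

def simulate(n=50, p=40, lam=0.2, seed=0):
    """Return true, OLS, and LASSO coefficients for one simulated small-n/p dataset."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise predictors
    beta_true = np.zeros(p)
    beta_true[:3] = [3.0, -2.0, 1.5]          # only three predictors matter
    y = X @ beta_true + rng.standard_normal(n)
    y = y - y.mean()
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_lasso = lasso_cd(X, y, lam)
    return beta_true, beta_ols, beta_lasso

if __name__ == "__main__":
    b, ols, las = simulate()
    print("L1 norm     OLS:", np.abs(ols).sum(), " LASSO:", np.abs(las).sum())
    print("Sq. error   OLS:", ((ols - b) ** 2).sum(), " LASSO:", ((las - b) ** 2).sum())
    print("Predictors kept by LASSO:", np.flatnonzero(las))
```

With only 50 observations for 40 coefficients, the OLS estimates are very noisy; the LASSO estimates are smaller in total (L1) size and, in this setup, should be much closer to the true coefficients.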

runr
Frans Rodenburg