I am fitting a multiple regression to a dataset. There are 19 regressors in total, one of which is endogenous, and I have identified an instrumental variable for it. When I apply instrumental variable regression (`AER::ivreg` in R), the results are satisfactory. Now I want an automated procedure to select the best subset of the 19 regressors. I can use stepwise regression (the MASS package in R) or best subset regression (the leaps package), but both are based on OLS. My question is: how can I address endogeneity within these procedures, or otherwise?

Amit
  • Best in what sense and for which stage? – dimitriy Jul 29 '20 at 16:09
  • I would like to find the best model with n of the 19 variables, for each 1 <= n <= 19. The criterion can simply be adjusted R-squared. The stepwise regression package also offers options such as AIC... – Amit Jul 29 '20 at 17:21
  • Are you sure you want to use automatic model selection, given its [major problems](https://stats.stackexchange.com/a/20856/28500)? Do you need to reduce the number of predictors at all? How many cases do you have? If you have a model that works and isn't overfit, what is to be gained by reducing the number of predictors? Multicollinearity per se isn't a big problem unless it's complete. Otherwise you just have higher standard errors for coefficients. – EdM Jul 29 '20 at 17:21
  • There is multicollinearity in my model. I can remove some variables based on VIF but would be interested in an automated procedure. – Amit Jul 29 '20 at 17:24
  • I have thought about allowing multicollinearity, but it may make some more variables significant. Would that be acceptable? For this study, identifying the right significant variables is more important than the estimates themselves. – Amit Jul 29 '20 at 17:59
  • If you care about inference on the effect of a particular X (and doing IV suggests that you do), stepwise regression seems like a bad idea. It will lead to additional bias, on top of the bias from doing IV, which can be large in small samples. Multicollinearity is not in itself a problem for inference. Stepwise regression will not allow you to find the true DGP, with only the "right" variables. – dimitriy Jul 29 '20 at 18:02
  • Thanks @Dimitry V Masterov. I understand the problem and will continue with the present set of variables. – Amit Jul 29 '20 at 18:14
  • There is a [Stata package](https://statalasso.github.io/docs/ivlasso_help/) that does something like what you have in mind, using `ivlasso` for regularization to get to a sparse model. I don't know if there is an R equivalent. It does not seem too hard to hack something together if one does not exist. – dimitriy Jul 29 '20 at 19:22
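
For illustration only, here is a minimal sketch of what "hacking something together" might look like. It is written in dependency-free Python rather than with the R packages named above, and it is not `ivlasso`: it simply refits two-stage least squares for every subset of the exogenous controls, always keeping the endogenous regressor and its instrument, and ranks the fits with a BIC-like score on the structural residuals. The simulated data, the true coefficients, the scoring rule, and all function names (`solve`, `ols`, `tsls`, `score`) are assumptions made for this example, and the concerns about selection-induced bias raised in the comments above still apply.

```python
import itertools
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(X, beta):
    return [sum(b * v for b, v in zip(beta, row)) for row in X]

def tsls(y, endog, instr, exog):
    """2SLS with one endogenous regressor and one instrument.
    Returns [intercept, coefficient on endog, coefficients on exog...]."""
    Z = [[1.0, z] + list(w) for z, w in zip(instr, exog)]
    endog_hat = predict(Z, ols(Z, endog))            # first stage
    X_hat = [[1.0, xh] + list(w) for xh, w in zip(endog_hat, exog)]
    return ols(X_hat, y)                             # second stage

def score(y, endog, instr, W, subset):
    """BIC-like score (an assumed heuristic) from 2SLS structural residuals."""
    exog = [[w[j] for j in subset] for w in W]
    beta = tsls(y, endog, instr, exog)
    X = [[1.0, x] + w for x, w in zip(endog, exog)]  # actual x, not fitted x
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, predict(X, beta)))
    n = len(y)
    return n * math.log(rss / n) + len(beta) * math.log(n), beta

# Simulated data (hypothetical): w1 and w2 matter, w3 is irrelevant.
random.seed(0)
y, endog, instr, W = [], [], [], []
for _ in range(2000):
    z, w1, w2, w3 = (random.gauss(0, 1) for _ in range(4))
    u = random.gauss(0, 1)                  # shared shock makes x endogenous
    x = 0.8 * z + 0.5 * w1 + u
    y.append(1.0 + 2.0 * x + w1 - w2 + u + 0.5 * random.gauss(0, 1))
    endog.append(x); instr.append(z); W.append((w1, w2, w3))

results = {s: score(y, endog, instr, W, s)
           for k in range(4) for s in itertools.combinations(range(3), k)}
best_subset = min(results, key=lambda s: results[s][0])
beta_iv = results[(0, 1, 2)][1]             # 2SLS with all controls
beta_ols = ols([[1.0, x] + list(w) for x, w in zip(endog, W)], y)

print("selected controls:", best_subset)
print("coef on x, 2SLS:", round(beta_iv[1], 3), " OLS:", round(beta_ols[1], 3))
```

In this simulation the true coefficient on `x` is 2, so the printout makes it possible to see how far plain OLS drifts from it compared with 2SLS. Exhaustive enumeration is only feasible here because there are three candidate controls; with 18 exogenous candidates there would be 2^18 subsets, which is where a penalized approach such as the lasso used by `ivlasso` becomes attractive.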
