I am trying to do exhaustive (best-subset) selection in Python: I need to select the best model among all models having $n$ predictors. Is $R^2$ enough, or is adjusted $R^2$ better? It would be nice if you could also write about AIC and BIC.
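A minimal sketch of what this could look like, assuming scikit-learn and NumPy are available; `X`, `y`, and `n_predictors` are hypothetical placeholder names, not from the thread:

```python
# Minimal sketch: exhaustive subset selection with a fixed subset size.
# Assumes X is a 2-D NumPy feature array and y is the target vector.
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def best_subset(X, y, n_predictors):
    """Fit OLS on every subset of exactly n_predictors columns of X
    and return the column indices with the lowest in-sample MSE."""
    n_features = X.shape[1]
    best_cols, best_mse = None, np.inf
    for cols in combinations(range(n_features), n_predictors):
        cols = list(cols)
        model = LinearRegression().fit(X[:, cols], y)
        mse = mean_squared_error(y, model.predict(X[:, cols]))
        if mse < best_mse:
            best_cols, best_mse = cols, mse
    return best_cols, best_mse
```

Since every candidate model here has the same number of predictors, minimizing in-sample MSE and maximizing $R^2$ select the same subset; scoring on held-out data instead is preferable, as the comments below discuss.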
- Adjusted $R^2$ does not apply here: since all of your models will have the same number of parameters, you get no additional information from the adjusted $R^2$ beyond what you get from the usual $R^2$. The biggest question I have is what "best" means for you. Are you looking for a low MSE? (If you don't know, then you're probably looking for a low MSE.) – Dave Feb 02 '20 at 00:10
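(For reference, the standard formula makes this point concrete: with sample size $N$ and $p$ predictors,

$$R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{N - 1}{N - p - 1},$$

which, for fixed $N$ and $p$, is a strictly increasing function of $R^2$, so ranking models by either quantity gives the same order.)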
- Yes, I am looking for a low RMSE or MSE. Okay, I understand why adjusted $R^2$ will not make any difference: $p$, the number of parameters, is the same for all the models. – Jeetendra Gan Feb 02 '20 at 00:38
- Then compare the MSE or RMSE (no need to do both), preferably on some holdout data. AIC and BIC are (in some sense) equivalent to looking at MSE, since your models will all incur the same penalty for parameter count, so the only differences between the models will be in how well they fit the data (so MSE). (This changes if you are using different data sets, but I suspect that all of your models are on the same data.) – Dave Feb 02 '20 at 00:47
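(For reference, for a Gaussian linear model with $k$ estimated parameters fit to $N$ observations, up to an additive constant shared by all candidate models,

$$\mathrm{AIC} = N \ln(\widehat{\mathrm{MSE}}) + 2k, \qquad \mathrm{BIC} = N \ln(\widehat{\mathrm{MSE}}) + k \ln N,$$

so with $N$ and $k$ fixed across the candidates, both criteria vary only through $\ln(\widehat{\mathrm{MSE}})$ and order the models exactly as MSE does.)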
- Yes, they are all on the same data set. Thanks. Also, when selecting multiple models of the same size but different features, why is it said that a holdout set should be enough and that cross-validation is not required? – Jeetendra Gan Feb 02 '20 at 00:55
- Search on "all subsets regression" (there are hundreds of posts on CV.com, so I sorted by votes). The best answer I found was: https://stats.stackexchange.com/questions/18214/why-is-variable-selection-necessary – DWin Feb 02 '20 at 01:17