I have fitted a linear model with lm in R. The output contains a few significant variables and a few that are not significant. Is it standard practice to now simplify the model, or does the full model, with estimates for every parameter, have more integrity? Thank you!
1 Answer
You need a sound strategy for feature selection (forward, backward, or stepwise).
If you include all the predictors, the model may fit your training set better, but it is a poor candidate for prediction on new data because of overfitting.
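As a minimal sketch of what simplifying could look like in R, the snippet below drops one term by hand with update(), compares the nested models, and then runs backward elimination with step(). The mtcars data set and the chosen predictors are assumptions made purely for illustration; they are not from the question. Note that p-values reported after any automated selection tend to be optimistic, so treat the selected model as a candidate rather than a final answer.

```r
# Illustrative only: mtcars ships with R; the predictors are arbitrary choices.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Drop a single term by hand with update(), then compare the nested models.
reduced <- update(full, . ~ . - drat)
anova(reduced, full)   # partial F-test for the dropped term
AIC(full, reduced)     # lower AIC is preferred

# Automated backward elimination, using AIC as the criterion.
backward <- step(full, direction = "backward", trace = 0)
formula(backward)      # terms kept by the procedure
```

Checking the reduced model on held-out data, or with cross-validation, is a reasonable way to see whether the simplification actually helps prediction.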
-
Stepwise feature selection leads to overfitting, cf. http://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection/20856#20856 – Tim Jul 20 '15 at 09:14
-
Thanks for your reply. So I should go ahead and simplify? In the past I have used the update function in R. Thank you. – George Jul 20 '15 at 09:15
-
Yes, you should look at this: http://stats.stackexchange.com/questions/56092/feature-selection-packages-in-r-which-do-both-regression-and-classification – Metariat Jul 20 '15 at 09:54