Currently I’m focusing on model selection criteria, more specifically: sequential hypothesis testing, information criteria (like AIC and BIC), and the Lasso, all within a regression framework. These methods are useful as a remedy for overfitting and, to some extent, allow us to manage the trade-off between parsimony and completeness of the models in light of a prediction loss function; in other words, they let us manage the bias-variance trade-off. Now, in my main reference, these methods are used as “in-sample methods”, in the sense that each model is estimated on all the data and the best model is chosen without any out-of-sample measure.
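To make the “in-sample” setting concrete, here is a minimal sketch (not from my reference) of selecting a Lasso model by AIC/BIC on the full data set; `X` and `y` are hypothetical simulated data, and I use scikit-learn's `LassoLarsIC` only as one convenient implementation of criterion-based selection:

```python
# Sketch of the "in-sample" approach: selection and estimation use ALL the data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLarsIC

# Hypothetical data, only for illustration.
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

# Choose the Lasso penalty by AIC / BIC; the same observations drive
# both the coefficient estimates and the model choice.
for criterion in ("aic", "bic"):
    model = LassoLarsIC(criterion=criterion).fit(X, y)
    n_selected = np.sum(model.coef_ != 0)
    print(f"{criterion.upper()}: alpha = {model.alpha_:.3f}, "
          f"{n_selected} predictors kept")
```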
However, the problem at hand (overfitting) is expressed most naturally by splitting the sample into two parts (in-sample and out-of-sample). My doubt is that, even if the methods above allow a good selection among predictors and hence among models, the estimation involves all the data, so it seems to me that metrics like the in-sample MSE end up being too optimistic. My idea is simply to split the data first and then apply the methods above: use only the “in-sample” part for estimation, and compare the models’ performance, in terms of a loss function like MSE, on data never seen before (the “out-of-sample” part), as sketched below.
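A minimal sketch of this split-sample idea, reusing the same hypothetical `X` and `y` from above: each candidate model is estimated on the training (“in-sample”) part only, and every candidate is scored once on the held-out (“out-of-sample”) part:

```python
# Sketch of the proposed workflow: select/estimate in-sample, evaluate out-of-sample.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LassoLarsIC
from sklearn.metrics import mean_squared_error

# Hold out 30% of the data as the "out-of-sample" part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

candidates = {
    "OLS (all predictors)": LinearRegression(),
    "Lasso via AIC": LassoLarsIC(criterion="aic"),
    "Lasso via BIC": LassoLarsIC(criterion="bic"),
}

for name, est in candidates.items():
    est.fit(X_train, y_train)   # estimation uses the training data only
    mse = mean_squared_error(y_test, est.predict(X_test))
    print(f"{name}: test MSE = {mse:.1f}")
```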
Is this a good idea? If not, why? Isn’t it better than estimating on all the data?