
The standard way to avoid over-fitting is to use a "validation set": we split the data into two parts, use the first part to fit (train) the model, and use the second part to validate it.

Now suppose we have a huge number of models, we train and validate all of them, and we then choose the model that gives the best predictions on the validation set.

Can we end up over-fitting this way? After all, we did use the validation set to choose the model, so in a sense the validation set took part in the fitting.
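
To make the worry concrete, here is a minimal sketch (my own construction, not from the question): the labels are pure noise, so no model can genuinely beat 50% accuracy, yet picking the best of many candidate models on the validation set still "finds" one that scores well there, and a held-out test set exposes the optimism.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models = 1000
n_val, n_test = 100, 100

# Labels are coin flips, and each "model" is just a fixed random predictor,
# so every model's true accuracy is exactly 50%.
y_val = rng.integers(0, 2, n_val)
y_test = rng.integers(0, 2, n_test)
pred_val = rng.integers(0, 2, (n_models, n_val))
pred_test = rng.integers(0, 2, (n_models, n_test))

val_acc = (pred_val == y_val).mean(axis=1)
best = np.argmax(val_acc)  # model selection on the validation set

print(f"best validation accuracy: {val_acc[best]:.2f}")  # typically ~0.60+
print(f"its test accuracy:        {(pred_test[best] == y_test).mean():.2f}")  # ~0.50
```

The gap between the two printed numbers is exactly the over-fitting the question asks about: the selection step fit the noise in the validation set.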

Yes, you can over-fit by doing that: see [here](http://stats.stackexchange.com/questions/4551/what-are-common-statistical-sins/6885). Typically you'd split the data into three: train, validation & *test*, the last being used to evaluate the performance of your final model. – Scortchi - Reinstate Monica Apr 02 '15 at 15:36
  • 1
Overfitting can be avoided (or its effect reduced) by using a simpler model, fewer features, and more training data. The first two are driven by domain knowledge. – Vladislavs Dovgalecs Apr 02 '15 at 15:39
Also, it's helpful to use a "k-fold" validation method to be a tad more rigorous with your out-of-sample validation. – Shreyes Apr 02 '15 at 16:48
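
Putting the comments together, here is a sketch (using scikit-learn; the dataset and hyperparameter grid are placeholders) of selecting a model with k-fold cross-validation on the training data while keeping a separate test set for a final, one-time evaluation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Hold out a test set that plays no role in fitting or model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 5-fold cross-validation on the training data does the job of the
# validation set; the selection step never sees the test data.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5)
search.fit(X_train, y_train)

print("best CV score:", search.best_score_)            # used for selection
print("test score:   ", search.score(X_test, y_test))  # reported once
```

Because the test score is computed only once, after all selection decisions are made, it remains an honest estimate of the chosen model's performance.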

0 Answers