
The definition of overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably" (i.e., the model performs well on the training data but poorly on the test data).

But is there a way to define overfitting programmatically? For example: if a classification model's accuracy/F1 score is between 90% and 99% on the training data and its accuracy/F1 score is 80% or less on the test data, the model overfits. Or if a regression model's RMSE is 0.7 or less on the training data (where the target variable ranges from 0 to 1000) and its RMSE is 5.0 or more on the test data, the model overfits.
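One way to picture the kind of rule being asked about is a pair of threshold checks. The sketch below is purely illustrative: the functions, names, and cutoff values (`train_floor`, `max_gap`, `ratio`) are my own assumptions mirroring the numbers in the question, not any standard definition.

```python
# Hypothetical heuristic for flagging overfitting from train/test scores.
# All thresholds are illustrative defaults, not standard values.

def looks_overfit_classification(train_score, test_score,
                                 train_floor=0.90, max_gap=0.10):
    """Flag a classifier as overfit when the training accuracy/F1 is high
    but the test score trails it by more than `max_gap`."""
    return train_score >= train_floor and (train_score - test_score) > max_gap


def looks_overfit_regression(train_rmse, test_rmse, ratio=3.0):
    """Flag a regressor as overfit when the test RMSE is several times
    the training RMSE (the ratio is an arbitrary choice)."""
    return train_rmse > 0 and (test_rmse / train_rmse) >= ratio


print(looks_overfit_classification(0.99, 0.80))  # True: 19-point gap
print(looks_overfit_regression(0.7, 5.0))        # True: test RMSE ~7x train
```

A fixed rule like this is brittle, of course: what counts as a "large" gap depends on the dataset, the metric, and the noise level, which is exactly why absolute numbers are hard to defend.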

Ayberk Yavuz
  • One of the big problems is the real world. It is pathological. If you don't clearly understand the data then there are things where it tricks you. Overfitting is about the death of generalization, but in general the universe is general while in specific it isn't. – EngrStudent Oct 22 '20 at 00:07
  • I think it is impossible to give absolute numbers, but if you can augment your data, turns out you can detect over fitting quite well, i.e. when the same input with different augmentations, starts to diverge, that is a very good sign of overfitting. you can read more about it in this paper: https://drive.google.com/file/d/13I1qhczfUaLYlEZSfJ04nkRXyD1a5I8Q/view – Eri Rubin Mar 18 '20 at 13:56
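The augmentation-divergence idea from the comment above can be sketched roughly as follows. This is my own toy rendering, not the method from the linked paper: the `predict` callable and the Gaussian-noise "augmentation" are placeholder assumptions.

```python
import numpy as np

def prediction_divergence(predict, x, n_aug=10, noise=0.05, rng=None):
    """Spread (std. dev.) of a model's predictions over noisy
    augmentations of a single input; large spread suggests the model
    is unstable on near-identical inputs."""
    rng = rng or np.random.default_rng(0)
    preds = [predict(x + rng.normal(0.0, noise, size=x.shape))
             for _ in range(n_aug)]
    return float(np.std(preds))

# Toy "model": a fixed linear function is smooth, so its predictions
# barely move under small input perturbations.
w = np.array([0.5, -0.2])
stable_predict = lambda x: float(w @ x)
x = np.array([1.0, 2.0])
print(prediction_divergence(stable_predict, x) < 0.1)  # True: low divergence
```

In practice one would compare this divergence between a held-out set and the training set, or track it over training epochs, rather than read a single number in isolation.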

0 Answers