
Can anyone recommend any research papers where the undesirable effects of overfitting on statistical models were first observed?

In the context of regression, at what point did researchers begin to suspect that OLS parameter estimates, despite the desirable theoretical properties guaranteed by the Gauss-Markov theorem (BLUE: minimum variance among linear unbiased estimators), might generalize poorly to unseen data even while performing well on the available data?

This idea is called "shrinkage" in statistics and "regularization" in machine learning. Historically speaking:

  • What exactly was the thought process which made researchers realize that models can overfit the data if nothing is there to stop them from doing so (e.g. in the absence of a regularization term in the optimization equation)?

  • And what exactly was the thought process that led to the specific mathematical forms of these penalizing shrinkage/regularization terms? That is, why did they decide to "throw in" a lambda term of the following form?

[image: the $\lambda$ penalty term]
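(The original image is not available, but given the mention of a lambda term, the penalty in question is presumably of the ridge-regression form; this is an assumption on my part:)

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2 + \lambda \sum_{j=1}^{p}\beta_j^2$$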

Are there any research papers in which this realization of overfitting was first noted and recorded, and any in which the thought process of rectifying the problem through penalty terms was also noted?
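(As an aside, the phenomenon is easy to reproduce numerically. The sketch below is purely illustrative and not drawn from any historical source: it fits a high-degree polynomial to noisy data by plain OLS and by ridge regression and compares test error. The data, the polynomial degree, and the value of $\lambda$ are all arbitrary choices of mine.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = sin(2*pi*x) + noise; a degree-10 polynomial can overfit it.
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 200)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.size)
y_test = np.sin(2 * np.pi * x_test)

def design(x, degree=10):
    # Polynomial design matrix [1, x, x^2, ..., x^degree].
    return np.vander(x, degree + 1, increasing=True)

X_train, X_test = design(x_train), design(x_test)

# OLS: minimum-variance unbiased (Gauss-Markov), but free to overfit.
beta_ols = np.linalg.lstsq(X_train, y_train, rcond=None)[0]

# Ridge: beta = (X'X + lambda*I)^{-1} X'y -- biased, shrunk toward zero.
lam = 1e-3
p = X_train.shape[1]
beta_ridge = np.linalg.solve(
    X_train.T @ X_train + lam * np.eye(p), X_train.T @ y_train
)

def mse(beta):
    # Mean squared error on the held-out test grid.
    return np.mean((X_test @ beta - y_test) ** 2)

print(f"test MSE, OLS:   {mse(beta_ols):.4f}")
print(f"test MSE, ridge: {mse(beta_ridge):.4f}")
print(f"||beta||, OLS:   {np.linalg.norm(beta_ols):.2f}")
print(f"||beta||, ridge: {np.linalg.norm(beta_ridge):.2f}")
```

The shrinkage itself is guaranteed: for any $\lambda > 0$ the ridge coefficient vector has strictly smaller norm than the OLS one, which is visible in the printed norms.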

Thanks!

stats_noob
    Does this answer your question? [The origin of the term "regularization"](https://stats.stackexchange.com/questions/250722/the-origin-of-the-term-regularization) (1963 "for sure" and 1943 "most likely".) – usεr11852 Jan 17 '22 at 01:00
  • Hello! Yes, I already came across this post and consulted some of the references provided there. I am more interested in knowing at what point researchers realized "oh cr*p, there might be a problem here" with regard to models with low MSE (uncannily) performing badly on new data, and also realized that models with higher MSE on seen data somehow don't perform as badly on unseen data. – stats_noob Jan 17 '22 at 03:13
  • I mean, they were solving least squares back then too. The notion of solving linear systems of equations has changed (until the gradient descent ideas came in). – usεr11852 Jan 17 '22 at 09:12

0 Answers