Can anyone recommend research papers in which the undesirable effects of overfitting on statistical models were first observed and documented?
In the context of regression: at what point did researchers begin to suspect that the desirable theoretical properties of OLS parameter estimates guaranteed by the Gauss-Markov theorem (BLUE: best linear unbiased, i.e. minimum variance among unbiased linear estimators) might nonetheless produce poor generalization to unseen data, even though those estimates fit the available data well?
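To make the tension explicit (this is just my own summary of the standard argument, not a quotation from any source): Gauss-Markov only says that OLS has the smallest variance among unbiased linear estimators, while performance on new data depends on the total mean squared error, which decomposes as

$$\operatorname{MSE}(\hat\beta) \;=\; \mathbb{E}\bigl\|\hat\beta - \beta\bigr\|^2 \;=\; \underbrace{\bigl\|\mathbb{E}\hat\beta - \beta\bigr\|^2}_{\text{squared bias}} \;+\; \underbrace{\operatorname{tr}\operatorname{Var}\bigl(\hat\beta\bigr)}_{\text{variance}},$$

so an estimator that accepts a little bias in exchange for a much smaller variance can beat OLS on unseen data even though OLS is BLUE.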
These techniques are known as "shrinkage" in the statistics literature and as "regularization" in machine learning. Historically speaking:
What exactly was the thought process that made researchers realize that models can overfit the data when nothing stops them from doing so (e.g. in the absence of a regularization term in the objective function being minimized)?
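To make concrete what I mean by "nothing to stop them", here is a minimal sketch (using scikit-learn; the polynomial degree, penalty strength, and seed are arbitrary choices on my part): an unpenalized least-squares fit on many polynomial features is free to chase the noise in a small training sample, whereas the same features with a ridge penalty are shrunk toward zero.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Small, noisy training sample from a smooth underlying function.
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Noise-free test grid to measure generalization.
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

X, X_test = x.reshape(-1, 1), x_test.reshape(-1, 1)

# Degree-15 polynomial: flexible enough to track the noise if unpenalized.
ols = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
ridge = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1e-3))

ols.fit(X, y)
ridge.fit(X, y)

# Compare training error vs. error on unseen data for both fits.
print("train MSE (OLS):  ", mean_squared_error(y, ols.predict(X)))
print("test  MSE (OLS):  ", mean_squared_error(y_test, ols.predict(X_test)))
print("train MSE (ridge):", mean_squared_error(y, ridge.predict(X)))
print("test  MSE (ridge):", mean_squared_error(y_test, ridge.predict(X_test)))
```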
And what exactly was the thought process that led to the specific mathematical forms of these penalizing shrinkage/regularization terms, i.e. why did researchers decide to simply "throw in" a penalty term scaled by a tuning parameter $\lambda$, of the form shown below?
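For concreteness, the kind of penalized objective I have in mind is the ridge form, with the lasso as the corresponding $\ell_1$ variant:

$$\hat\beta^{\text{ridge}} \;=\; \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2,
\qquad
\hat\beta^{\text{lasso}} \;=\; \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1.$$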
Are there research papers in which this realization about overfitting was first noted and recorded, and research papers in which the thought process behind rectifying the problem through penalty terms was documented?
Thanks!