In linear regression, the coefficient of determination $R^2$ is a normalized measure of prediction accuracy. In machine learning, however, performance measures are not computed on the same data that was used for training, because this would yield an optimistically biased estimate.
I wonder why, in linear regression, $R^2$ is computed from the predictions on the same training data that was used to estimate the model parameters. I would expect this to be optimistically biased, especially when the ratio of sample size to number of parameters is low, so that overfitting goes unnoticed.
Why is it not estimated with leave-one-out cross-validation (aka "$n$-fold cross-validation") or the bootstrap?
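
To make the concern concrete, here is a minimal sketch of the comparison I have in mind. It is only an illustration under my own assumptions: NumPy, an ordinary least squares fit with an intercept, and simulated data in which the response is pure noise, so that any apparent fit is overfitting. The function names `r2_in_sample` and `r2_leave_one_out` are hypothetical helpers, and the leave-one-out variant shown (one minus the squared prediction errors over the squared errors of the leave-one-out mean) is just one of several possible definitions.

```python
import numpy as np


def r2_in_sample(X, y):
    """Usual R^2: fit on all data, evaluate on the same data."""
    A = np.column_stack([np.ones(len(y)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def r2_leave_one_out(X, y):
    """Leave-one-out R^2: each point is predicted by a model fit without it."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    press = 0.0  # sum of squared leave-one-out prediction errors
    sst = 0.0    # same for the baseline that predicts the leave-one-out mean
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ beta) ** 2
        sst += (y[i] - y[keep].mean()) ** 2
    return 1.0 - press / sst


# Assumed toy setting: few samples, many parameters, no true signal.
rng = np.random.default_rng(0)
n, p = 20, 10
X = rng.normal(size=(n, p))
y = rng.normal(size=n)  # independent of X

r2_train = r2_in_sample(X, y)
r2_loo = r2_leave_one_out(X, y)
print(f"in-sample R^2:     {r2_train:.3f}")
print(f"leave-one-out R^2: {r2_loo:.3f}")
```

In a setting like this, the in-sample $R^2$ can look respectable even though there is nothing to predict, whereas the leave-one-out value comes out lower and can even be negative.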