As in this related question: What is the difference between in-sample error and training error, and what is the intuition behind optimism?
In Chapter 7 (page 228) of The Elements of Statistical Learning, given a data set $\mathcal{T}=\{(x_i,y_i)\},\ i=1,\dots,N$, the generalization error of a model $\hat{f}$ is defined as
$$ Err_{\mathcal{T}}=E_{X^0, Y^0}[L(Y^0, \hat{f}(X^0))|\mathcal{T}] $$
whereas the in-sample error is defined as $$ Err_{in} = \frac{1}{N}\sum_{i=1}^{N}{E_{Y^0}[L(Y_{i}^{0},\hat{f}(x_i))|\mathcal{T}]} $$
The $Y^0$ notation indicates that we observe $N$ new response values at each of the training points $x_i$, $i = 1, 2, \dots, N$.
The training error is defined as $$ \overline{err} = \frac{1}{N}\sum_{i=1}^{N}{L(y_i,\hat{f}(x_i))} $$
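To make the distinction concrete, here is a minimal Monte Carlo sketch (my own illustration, not from the book) under squared-error loss with a hypothetical linear truth $y = 2x + \varepsilon$: it fixes one training set $\mathcal{T}$, fits a model, and then estimates the training error on $\mathcal{T}$ itself, the in-sample error by redrawing responses $Y^0$ at the same $x_i$, and the generalization error by drawing fresh pairs $(X^0, Y^0)$.

```python
# Minimal Monte Carlo sketch (illustrative only; the setup and names are my
# own, not from ESL). Squared-error loss, hypothetical truth y = 2x + noise.
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 20, 1.0

def f_true(x):
    return 2.0 * x

# One realized training set T, with fixed inputs x_i
x_train = rng.uniform(-1.0, 1.0, N)
y_train = f_true(x_train) + sigma * rng.normal(size=N)

# Fit a (deliberately flexible) cubic polynomial by least squares
coef = np.polyfit(x_train, y_train, deg=3)

def f_hat(x):
    return np.polyval(coef, x)

# Training error: average loss over the observed (x_i, y_i)
err_bar = np.mean((y_train - f_hat(x_train)) ** 2)

# In-sample error: same x_i, fresh responses Y_i^0, averaged over many draws
M = 100_000
y0_at_xi = f_true(x_train) + sigma * rng.normal(size=(M, N))
err_in = np.mean((y0_at_xi - f_hat(x_train)) ** 2)

# Generalization (extra-sample) error: fresh input-output pairs (X^0, Y^0)
x0 = rng.uniform(-1.0, 1.0, M)
y0 = f_true(x0) + sigma * rng.normal(size=M)
err_T = np.mean((y0 - f_hat(x0)) ** 2)

print(f"training error       {err_bar:.3f}")
print(f"in-sample error      {err_in:.3f}")
print(f"generalization error {err_T:.3f}")
```

On a typical run the training error comes out smallest, which is the optimism phenomenon the chapter quantifies.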
Questions:
(1) What is the difference between the generalization error and the in-sample error, and what is the intuition behind each?
(2) Why are AIC and BIC defined as estimates of the in-sample error rather than of the generalization error?
(3) For an arbitrary loss function, is the generalization error always larger than the training error? Is there a theoretical proof? I have only found a proof for squared-error loss.
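For concreteness, the squared-error result I am referring to is, if I recall the chapter correctly, the optimism identity (ESL eq. 7.21), which relates the expected in-sample error to the expected training error (so it concerns in-sample rather than extra-sample error): $$ E_{\mathbf{y}}[Err_{in}] = E_{\mathbf{y}}[\overline{err}] + \frac{2}{N}\sum_{i=1}^{N}\operatorname{Cov}(\hat{y}_i, y_i). $$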