In the book Elements of Statistical Learning in Chapter 7 (page 228), the training error is defined as: $$ \overline{err} = \frac{1}{N}\sum_{i=1}^{N}{L(y_i,\hat{f}(x_i))} $$

Whereas in-sample error is defined as $$ Err_{in} = \frac{1}{N}\sum_{i=1}^{N}{E_{Y^0}[L(Y_{i}^{0},\hat{f}(x_i))|\tau]} $$

The $Y^0$ notation indicates that we observe $N$ new response values at each of the training points $x_i$, $i = 1, 2, \ldots, N$.

This seems to be exactly the same as the training error, because the training error is also calculated by evaluating the fitted estimate $\hat{f}(x)$ at the training points. I have looked at other explanations of this concept, but could not understand the difference between training error and in-sample error, nor why the optimism is not always 0: $$ op\equiv Err_{in}-\overline{err} $$

So how are the errors $Err_{in}$ and $\overline{err}$ different, and what is the intuitive understanding of optimism in this context?

Additionally, what does the author mean by "usually biased downward" in the statement describing optimism (Elements of Statistical Learning, page 229)?

> This is typically positive since $\overline{err}$ is usually biased downward as an estimate of prediction error.

SpeedBirdNine

1 Answer


$Y^0$ in this setup has a random component, e.g. with additive error $\varepsilon\sim N(0,\sigma_\varepsilon^2)$. So for a fixed $(x,y)\in\mathcal{T}$, a new response $Y^0$ at the predictor $x$ need not be the same as the corresponding training response $y$, hence the expectation $\operatorname{E}_{Y^0}$. "Biased downward" just means that $\overline{\mathrm{err}}$ is, on average, less than the true prediction error.
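To make the difference concrete, here is a minimal simulation sketch (not part of the original answer; the linear model, squared-error loss, and all variable names are my own assumptions). It fits an ordinary least-squares line to one training sample $\mathcal{T}$, computes $\overline{\mathrm{err}}$ against the training responses $y_i$, and then computes an in-sample error against freshly drawn responses $Y_i^0$ at the same $x_i$. Because the fit has adapted to the noise in the $y_i$, the training error comes out lower on average, so the optimism is positive on average.

```python
import numpy as np

rng = np.random.default_rng(0)

N, sigma = 50, 1.0
x = np.linspace(0, 1, N)
X = np.column_stack([np.ones(N), x])   # design matrix for a simple linear fit (p = 2)
f_true = 2.0 + 3.0 * x                 # true regression function f(x)

n_sims, err_bar, err_in = 2000, [], []
for _ in range(n_sims):
    # Training responses: y_i = f(x_i) + eps_i (this is the fixed sample T)
    y = f_true + rng.normal(0, sigma, N)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ beta_hat

    # Training error: average loss against the responses used to fit
    err_bar.append(np.mean((y - y_hat) ** 2))

    # In-sample error: average loss against *new* responses Y^0 drawn at the
    # same training points x_i (one draw here; averaging over the simulations
    # approximates the expectation E_{Y^0})
    y_new = f_true + rng.normal(0, sigma, N)
    err_in.append(np.mean((y_new - y_hat) ** 2))

print("mean training error :", np.mean(err_bar))                    # ~ sigma^2 * (N - p) / N
print("mean in-sample error:", np.mean(err_in))                     # ~ sigma^2 * (N + p) / N
print("mean optimism       :", np.mean(err_in) - np.mean(err_bar))  # ~ 2 * p * sigma^2 / N
```

With $p$ fitted parameters and noise variance $\sigma_\varepsilon^2$, the two averages come out near $\sigma_\varepsilon^2(N-p)/N$ and $\sigma_\varepsilon^2(N+p)/N$, so the average optimism is about $2p\sigma_\varepsilon^2/N$, consistent with the covariance expression ESL gives for squared-error loss, $\frac{2}{N}\sum_i \operatorname{Cov}(\hat y_i, y_i)$.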

Francis
  • Right, so is the random component the irreducible random error present in the data being modeled, and in $\overline{\mathrm{err}}$ is $y_i$ the mean of the data points, not including the random error $\epsilon$? – SpeedBirdNine Aug 05 '16 at 16:01
  • 1
    @SpeedBirdNine: $y_i$ belongs to a fixed sample $\mathcal{T}$ that has been observed, and $\overline{\mathrm{err}}$ is mean of the losses over $\mathcal{T}$ – Francis Aug 05 '16 at 20:49
  • I think it should not be `needs not to be the same` but `doesn't need to be the same`... – Francesco Boi Nov 06 '19 at 14:44
  • @FrancescoBoi: I think these two phrases are semantically the same? That said, assuming continuous error, a more technically correct term to use is "almost surely not the same". – Francis Nov 06 '19 at 15:11
  • When I first read it, I interpreted `needs not to be` with `not` negating `be`, i.e. `needs to be different`, rather than negating `need`. Now I see what you mean; honestly I do not know, but I understood what you meant now. – Francesco Boi Nov 06 '19 at 16:45