Confusion about derivation of regression function

Question

I know that there are already some questions with the same title and my question is kind of similar but requires different derivation (I think).

I am reading "The Elements of Statistical Learning" and I don't understand how they got to the expression in equation $(2.27)$: $$EPE(x_0) = E_{y_0 | x_0}E_\mathcal{T}(y_0 -\hat{y}_0)^2$$

and how does it relate to the definition of the expected (squared) prediction error (EPE) in equation $(2.9)$?

$$EPE(f) = E(Y-f(X))^2$$

Cristi · Answer 1 · 2017-01-22T18:49:32.500


$EPE(f) = E_{x_0}EPE(x_0)$, where $f(x) = x^T\beta$. The expected prediction error for linear regression is the expectation of $EPE(x_0)$ over $x_0$.
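One way to make this link explicit is the law of iterated expectations. This is only a sketch: it assumes that $f$ in $(2.9)$ is replaced by the fitted $\hat f$ (so $\hat y_0 = \hat f(x_0) = x_0^T\hat\beta$), that the expectation also averages over training sets $\mathcal{T}$, and that the new pair $(x_0, y_0)$ is independent of $\mathcal{T}$:

$$E_\mathcal{T}\,EPE(\hat f) = E_{x_0}\,E_{y_0|x_0}\,E_\mathcal{T}(y_0 - \hat y_0)^2 = E_{x_0}\,EPE(x_0)$$

Under those assumptions, $(2.27)$ is the inner part of $(2.9)$ with the test point held fixed at $x_0$.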

edited Jan 22 '17 at 18:49

answered Jan 22 '17 at 18:39


Sextus Empiricus · Answer 2 · 2019-02-01T09:42:36.553

I don't understand how they got to the expression in equation (2.27):

The conditioning on $y_0$ stems from the question "Expected prediction error - derivation"; you should read it as:

$$\underbrace{E_{y_0|x_0} \underbrace{\left( E_{\tau} (y_0-\hat{y}_0)^2 | y_0,x_0 \right)}_{\substack{\text{expected error due to:} \\ \text{ error in estimate $\hat{y}_0$} \\ \text{with fixed $y_0$}}}}_{\substack{\text{expected error due to:} \\ \text{ error in estimate $\hat{y}_0$} \\ \text{and error in sample $y_0$} \\ \text{ with variable $y_0$}}}$$

Eventually you end up taking the sum of two errors: the error in the estimate $\hat y_0$ (which can be decomposed into variance and bias, where the bias is zero in this case) and the error in the sampled variable $y_0$. For the distributions of those two, $y_0$ and $\hat{y}_0$, see also the question $\hat{y} \sim\mathcal{N}(X\beta, \sigma^{2}X(X^{T}X)^{-1}X^{T}) = y \sim\mathcal{N}(X\beta, \sigma^{2}I_n)$.
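That sum can be written out as follows. This is a sketch under assumptions taken from the linear-model setting rather than from this answer: $y_0 = x_0^T\beta + \varepsilon_0$ with $E\varepsilon_0 = 0$ and $\mathrm{Var}(\varepsilon_0) = \sigma^2$, and $\varepsilon_0$ independent of the training sample $\tau$.

$$\begin{aligned} E_{y_0|x_0}E_{\tau}(y_0-\hat{y}_0)^2 &= E_{y_0|x_0}(y_0 - x_0^T\beta)^2 + E_{\tau}(\hat{y}_0 - E_{\tau}\hat{y}_0)^2 + (E_{\tau}\hat{y}_0 - x_0^T\beta)^2 \\ &= \mathrm{Var}(y_0|x_0) + \mathrm{Var}_{\tau}(\hat{y}_0) + \mathrm{Bias}^2(\hat{y}_0) \\ &= \sigma^2 + \mathrm{Var}_{\tau}(\hat{y}_0) + 0 \end{aligned}$$

The cross terms vanish because $\varepsilon_0$ has mean zero and is independent of $\hat{y}_0$, and because $\hat{y}_0 - E_{\tau}\hat{y}_0$ has mean zero while $E_{\tau}\hat{y}_0 - x_0^T\beta$ is a constant.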

how does it relate to the definition of the expected (squared) prediction error (EPE) in equation (2.9)?

You should read $EPE(x_0)$ as the prediction error of $f(x) = x^T\beta$ at the point $x_0$. It is shorthand for $EPE(f(x_0)) = E(\hat y_0 - y_0)^2$: the expected squared difference between a prediction $\hat y_0$ at the point $x_0$ (based on the training sample $\tau$) and a new sample $y_0$ at the point $x_0$.
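This reading can be checked numerically. The following is a minimal Monte Carlo sketch, not the book's procedure: it assumes a fixed design matrix `X` (only the training responses are redrawn), Gaussian noise, and illustrative values for `beta`, `x0` and `sigma`. The function names `simulate_epe` and `theoretical_epe` are made up for this example. Under these assumptions, the standard OLS result gives $EPE(x_0) = \sigma^2 + \sigma^2 x_0^T(X^TX)^{-1}x_0$.

```python
import numpy as np

def simulate_epe(x0, X, beta, sigma, n_rep=20000, seed=0):
    """Monte Carlo estimate of EPE(x0) for OLS with a fixed design X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each column of Y is one training response vector y = X beta + noise.
    Y = (X @ beta)[:, None] + sigma * rng.standard_normal((n, n_rep))
    # OLS fit for every training set at once; beta_hat has shape (p, n_rep).
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    # Prediction at x0 from each training set, shape (n_rep,).
    y0_hat = x0 @ beta_hat
    # Independent new responses y0 drawn at the same point x0.
    y0 = x0 @ beta + sigma * rng.standard_normal(n_rep)
    return np.mean((y0 - y0_hat) ** 2)

def theoretical_epe(x0, X, sigma):
    """sigma^2 + sigma^2 * x0^T (X^T X)^{-1} x0 (zero bias, fixed design)."""
    return sigma**2 * (1.0 + x0 @ np.linalg.solve(X.T @ X, x0))

# Illustrative (assumed) setup: 50 training points, 3 predictors.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
beta = np.array([1.0, -2.0, 0.5])
x0 = np.array([0.5, 0.5, -1.0])
sigma = 1.0

epe_mc = simulate_epe(x0, X, beta, sigma)
epe_th = theoretical_epe(x0, X, sigma)
print(f"simulated EPE(x0):   {epe_mc:.4f}")
print(f"theoretical EPE(x0): {epe_th:.4f}")
```

The two printed values should agree up to Monte Carlo noise. Both exceed $\sigma^2$, since the error in the estimate $\hat y_0$ adds to the error in the new sample $y_0$.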
