
This section of *An Introduction to Statistical Learning in R* (page 19 in the v6 printing, equation 2.3) motivates the difference between reducible and irreducible error (the irreducible error is denoted $\epsilon$ and has mean zero).

Consider a given estimate $\hat{f}$ and a set of predictors $X$, which yields the prediction $\hat{Y} = \hat{f}(X)$. Assume for a moment that both $\hat{f}$ and $X$ are fixed. Then, it is easy to show that

$E(Y - \hat{Y})^2 = E[f(X) + \epsilon - \hat{f}(X)]^2$

$= [f(X) - \hat{f}(X)]^2 + Var(\epsilon)$
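For context, the book labels the first term the *reducible* error (it can be shrunk by improving $\hat{f}$) and the second the *irreducible* error:

$E(Y - \hat{Y})^2 = \underbrace{[f(X) - \hat{f}(X)]^2}_{\text{Reducible}} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{Irreducible}}$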

I'm having trouble getting the intermediate steps here. I understand that the expectation of the error in prediction should involve the variance of $\epsilon$ but I'd like to understand the proof.

I expanded to

$E(Y - \hat{Y})^2 = E[f(X) - \hat{f}(X)]^2 + 2E(\epsilon)E[f(X) - \hat{f}(X)] + E(\epsilon^2)$

and I see that the $E(\epsilon)$ and $E(\epsilon^2)$ terms could lead to $\operatorname{Var}(\epsilon)$, but I'm stuck trying to fit the basic expected-value and variance manipulations to them.

thomaskeefe
  • Please add the `[self-study]` tag & read its [wiki](http://stats.stackexchange.com/tags/self-study/info). Then tell us what you understand thus far, what you've tried & where you're stuck. We'll provide hints to help you get unstuck. – gung - Reinstate Monica Jan 17 '16 at 19:59
  • Because you tell us the mean of $\epsilon$ is $0$, you may substitute $0$ for $E(\epsilon)$ and $\operatorname{Var}(\epsilon)$ for $E(\epsilon^2)$ in your formula. What does that produce? – whuber Jan 17 '16 at 20:24
  • OK, with that I solved it. Is it preferred for me to answer my own question with this information, or just edit my question with the answer? – thomaskeefe Jan 17 '16 at 20:38
  • 2
    The preference is to post an answer, please. – whuber Jan 17 '16 at 21:07

1 Answer


\begin{align*}
\mathbb{E}\left[(Y-\hat Y)^2\right]
&=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right)^2\right] \\
&=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right)\left(f(X)+\epsilon-\hat{f}(X)\right)\right] \\
&=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right)^2 +\epsilon\left(f(X)-\hat{f}(X)\right) +\epsilon\left(f(X)-\hat{f}(X)\right) +\epsilon^2\right] \\
\text{because expectation is linear:}&\\
&=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right)^2\right] +\mathbb{E}\left[\epsilon^2\right] +2\,\mathbb{E}\left[\epsilon\left(f(X)-\hat{f}(X)\right)\right] \\
\text{because $f(X)-\hat{f}(X)$ is a constant (both $X$ and $\hat{f}$ are fixed):}&\\
&=\left[f(X)-\hat{f}(X)\right]^2 +\mathbb{E}\left[\epsilon^2\right] +2\left[f(X)-\hat{f}(X)\right]\mathbb{E}\left[\epsilon\right] \\
\text{because the mean of $\epsilon$ is zero:}&\\
&=\left[f(X)-\hat{f}(X)\right]^2 +\mathbb{E}\left[\epsilon^2\right] \\
\text{because $\operatorname{Var}(\epsilon)=\mathbb{E}(\epsilon^2)-[\mathbb{E}(\epsilon)]^2=\mathbb{E}(\epsilon^2)$:}&\\
&=\left[f(X)-\hat{f}(X)\right]^2 + \operatorname{Var}(\epsilon)
\end{align*}
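As a quick numerical sanity check of the decomposition (the particular $f$, $\hat{f}$, fixed $x$, and noise level below are made-up illustrative choices, not from the book):

```python
import random

random.seed(42)

# Illustrative assumptions: a true f, a fixed but imperfect estimate
# f_hat, one fixed input x, and mean-zero Gaussian noise epsilon.
def f(x):        # the (unknown) true regression function
    return 2.0 * x + 1.0

def f_hat(x):    # a fixed estimate of f, deliberately a bit off
    return 2.0 * x + 0.7

x = 1.5          # X held fixed, so f(x) and f_hat(x) are constants
sigma = 0.5      # sd of epsilon, so Var(epsilon) = 0.25

# Draw many realizations of Y = f(X) + epsilon and estimate E[(Y - Y_hat)^2]
n = 200_000
ys = [f(x) + random.gauss(0.0, sigma) for _ in range(n)]
mse = sum((y - f_hat(x)) ** 2 for y in ys) / n

reducible = (f(x) - f_hat(x)) ** 2   # [f(X) - f_hat(X)]^2
irreducible = sigma ** 2             # Var(epsilon)

# The Monte Carlo estimate should nearly match the theoretical sum.
print(mse, reducible + irreducible)
```

With $n$ this large the two printed values agree to roughly two decimal places, matching the identity derived above.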

Joshua Cook
  • 1
    With self-study questions, we generally prefer giving hints rather a full answer. However, since the question is rather old, and the question hasn't been asked as part of a homework sheet that other students might copy later (but rather, a difficulty following a textbook), I think the provision of a full answer in this case is actually helpful. – Silverfish Mar 08 '16 at 17:54
  • Ah, I see. Answering was actually self-study for me: I came looking for the answer to the same question, having worked out everything but the last two steps. I guess I wanted to get credit for my work :D – Joshua Cook Mar 08 '16 at 17:56
  • You can read [our policy on self-study questions here](http://stats.stackexchange.com/tags/self-study/info), but I thought this was fine :) – Silverfish Mar 08 '16 at 19:05
  • I appreciate the posted answer because I realized I hadn't properly solved it when I was going to post the solution to my own question. – thomaskeefe Mar 09 '16 at 21:23
  • why are $f$ and $\hat{f}$ constant? – baxx Nov 09 '19 at 22:43
  • Second @baxx question. Why are $f$ and $\hat{f}$ constant? – ngram Nov 06 '20 at 23:21
  • Because this is relative to a single fit. – Joshua Cook Nov 07 '20 at 01:27
  • Why are $f$ and $\hat{f}$ constant? Suppose you have one observation: an input $X$, a measured outcome $Y$, and an estimate $\hat{Y}$ obtained by applying a fitted function $\hat{f}$ to $X$. Now $Y = f(X) + \text{error}$. Whenever you measure $Y$ again, it will differ from the previous measurement, but the underlying $f(X)$ remains the same; it is the ever-changing error term that makes the difference. And when you calculate $\hat{Y}$ as an estimate of $Y$, you apply the same $\hat{f}$ to the same $X$, resulting in the same $\hat{Y}$ every time. So you see the nuance here. – Piyush Verma Mar 28 '21 at 15:48