11

On page 19 of the textbook Introduction to Statistical Learning (by James, Witten, Hastie, and Tibshirani; it is freely downloadable on the web, and very good), the following is stated:

Consider a given estimate $\hat{Y} = \hat{f}(X)$. Assume for a moment that both $\hat{f}$ and $X$ are fixed. Then, it is easy to show that:

$$\mathrm{E}(Y - \hat{Y})^2 = \mathrm{E}[f(X) + \epsilon - \hat{f}(X)]^2 = [f(X) - \hat{f}(X)]^2 + \mathrm{Var}(\epsilon)$$

It is further explained that the first term represents the reducible error, and the second term represents the irreducible error.

I am not fully understanding how the authors arrive at this answer. I worked through the calculations as follows:

$$\mathrm{E}(Y - \hat{Y})^2 = \mathrm{E}[f(X) + \epsilon - \hat{f}(X)]^2$$

This simplifies to $[f(X) - \hat{f}(X) + \mathrm{E}[\epsilon]]^2 = [f(X) - \hat{f}(X)]^2$, assuming that $\mathrm{E}[\epsilon] = 0$. Where is the $\mathrm{Var}(\epsilon)$ indicated in the text coming from?
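To convince myself that the identity at least holds numerically, I ran a quick Monte Carlo check in Python (just a sketch; the particular $f$, $\hat{f}$, $x$, and noise level below are made-up choices):

```python
import numpy as np

# Monte Carlo check of E[(Y - Yhat)^2] = [f(X) - fhat(X)]^2 + Var(eps)
# at a single fixed X. All concrete choices below are arbitrary.
rng = np.random.default_rng(0)

x = 2.0                        # X held fixed
f = lambda t: 3.0 * t + 1.0    # hypothetical "true" f
f_hat = lambda t: 3.2 * t      # hypothetical fixed estimate f_hat
sigma = 0.5                    # sd of the irreducible error epsilon

eps = rng.normal(0.0, sigma, size=1_000_000)
y = f(x) + eps                 # Y = f(X) + epsilon
y_hat = f_hat(x)               # Yhat = fhat(X), a constant here

lhs = np.mean((y - y_hat) ** 2)            # estimates E[(Y - Yhat)^2]
rhs = (f(x) - f_hat(x)) ** 2 + sigma ** 2  # [f(X) - fhat(X)]^2 + Var(eps)
print(lhs, rhs)  # both close to 0.61 = 0.6^2 + 0.5^2
```

The two numbers agree to Monte Carlo error, so the result seems right; I just don't see the algebra.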

Any suggestions would be greatly appreciated.

wellington
  • Because this is from a textbook, you should add the `self-study` tag to your question. See http://stats.stackexchange.com/tags/self-study/info – Patrick Coulombe Jul 31 '14 at 20:05
  • Your notation is mystifying because $\mathrm{E}(Y - \hat{Y})^2 = \mathrm{E}[f(X) + \epsilon - \hat{f}(X)]^2$ literally means the square of the expectation. Assuming $\mathrm{E}(\epsilon)=0$, this immediately reduces to $(f(X)-\hat{f}(X)+\mathrm{E}(\epsilon))^2 = (f(X)-\hat{f}(X))^2$. Evidently, then, what you really want to compute is the expectation of the square, $\mathrm{E}[(f(X)-\hat{f}(X)+\epsilon)^2]$. But if so, the very first step in your derivation makes no sense. Could you edit the question to clear this up? – whuber Jul 31 '14 at 20:30
  • Hmm.. I see what you mean. I didn't see that simplification at first (i.e. that $E[f(X)+\epsilon - \hat{f}(X)]^2 = [f(X) - \hat{f}(X) + E(\epsilon)]^2 = [f(X) - \hat{f}(X)]^2$). But that further adds to my confusion about how we get $[f(X) - \hat{f}(X)]^2 + \mathrm{Var}(\epsilon)$ as the answer. Where is the $\mathrm{Var}(\epsilon)$ coming from? I will edit the question to reflect this clarification. – wellington Jul 31 '14 at 20:42
  • I was not pointing to a simplification, but to a *distinction*: the expectation of the square does not equal the square of the expectation. Even after the edits your question does not seem to recognize this crucial fact. – whuber Aug 01 '14 at 01:01
  • The issue that I was having was the notation in the book. The way I was initially thinking of the problem, I was approaching it as $\mathrm{E}[(Y - \hat{Y})^2] = (\mathrm{E}[f(X) + \epsilon - \hat{f}(X)])^2$, i.e. the quantity squared. What I later learned was that the book was trying to imply that $\mathrm{E}[f(X) + \epsilon - \hat{f}(X)]^2$ actually means $\mathrm{E}([f(X) + \epsilon - \hat{f}(X)]^2)$. I personally think this notation is a bit confusing, but it's how it's written in the text. I agree that it's important to remember that $\mathrm{E}[X^2] \neq \mathrm{E}[X]^2$. – wellington Aug 01 '14 at 01:08
  • I thought I was the only one struggling with whether the authors meant "expectation of the square" or "square of the expectation". I still am not sure. And I think this question as stated continues to use the original (ambiguous/unclear) notation... which it should. I will look to the answers for clarity on what the authors meant. – The Red Pea Aug 10 '16 at 00:57

2 Answers

7

Simply expand the square ...

$$[f(X)- \hat{f}(X) + \epsilon ]^2=[f(X)- \hat{f}(X)]^2 +2 [f(X)- \hat{f}(X)]\epsilon+ \epsilon^2$$

... and use linearity of expectations:

$$\mathrm{E}[f(X)- \hat{f}(X) + \epsilon ]^2=\mathrm{E}[f(X)- \hat{f}(X)]^2 +2\,\mathrm{E}[(f(X)- \hat{f}(X))\epsilon]+ \mathrm{E}[\epsilon^2]$$

Can you do it from there? (What things remain to be shown?)

Hint in response to comments: Show $\mathrm{E}(\epsilon^2)=\mathrm{Var}(\epsilon)$
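If a numerical sanity check helps, here is a small sketch of those two remaining pieces (the constant `c`, standing in for the fixed quantity $f(X)-\hat{f}(X)$, and the noise scale are arbitrary choices):

```python
import numpy as np

# Check that the cross term vanishes and that E[eps^2] = Var(eps)
# when E[eps] = 0. The constants here are arbitrary.
rng = np.random.default_rng(1)

c = 0.5        # stands in for the fixed quantity f(X) - f_hat(X)
sigma = 2.0    # sd of eps, so Var(eps) = 4.0
eps = rng.normal(0.0, sigma, size=1_000_000)

print(np.mean(2 * c * eps))           # cross term 2 E[(f - fhat) eps]: ~ 0
print(np.mean(eps ** 2), sigma ** 2)  # E[eps^2] ~ Var(eps) = 4.0
```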

Glen_b
  • I actually was able to get that far in the time I've been trying at this problem since. One of the confusions that I had the first time around was that I was treating the entire term, $\mathrm{E}[...]$, to be squared, rather than just squaring the inside, i.e. $\mathrm{E}([...]^2)$. I understand why $\mathrm{E}[f(X) - \hat{f}(X)]^2$ becomes $[f(X) - \hat{f}(X)]^2$, since it is just a number, and the expected value of a real number is just the number. What I don't understand is how $2\mathrm{E}[(f(X)-\hat{f}(X))\epsilon] + \mathrm{E}[\epsilon^2]$ becomes $\mathrm{Var}(\epsilon)$... – wellington Aug 01 '14 at 00:16
  • see my additional hint. What now remains to be shown? – Glen_b Aug 01 '14 at 00:30
  • Well we know that $\mathrm{E}(\epsilon^2) = \mathrm{Var}(\epsilon) + \mathrm{E}[\epsilon]^2$. The only thing I can think of is that we now apply the assumption that $\mathrm{E}[\epsilon] = 0$, therefore $(\mathrm{E}[\epsilon])^2 = 0$. Am I on the right track? – wellington Aug 01 '14 at 00:35
  • Yes, that's it. So what's left? And what's assumed about those quantities? – Glen_b Aug 01 '14 at 00:35
  • Well since $[f(X) - \hat{f}(X)]$ is just a constant, we can also factor it out of the second term, i.e. make it $2[f(X) - \hat{f}(X)]\mathrm{E}[\epsilon]$, and since $\mathrm{E}[\epsilon] = 0$, the middle term becomes zero. Then the final term becomes $\mathrm{Var}(\epsilon) + (\mathrm{E}[\epsilon])^2$, which is the same as simply $\mathrm{Var}(\epsilon)$. Therefore the final result would be $[f(X) - \hat{f}(X)]^2 + \mathrm{Var}(\epsilon)$. Ahh...kicking myself! I was seriously overthinking it...thanks so much for the help! – wellington Aug 01 '14 at 00:52
  • I fixed a few typos in my mathematics. It looks like you're set now. Similar "expand the square, use linearity of expectation and simplify" approaches work on a variety of related problems, even under somewhat different assumptions. – Glen_b Aug 01 '14 at 00:59
  • @Glen_b: Why is it that $[f(X) - \hat{f}(X)]$ is a constant? Isn't it possible that $\hat{f}$ could differ from $f$ by varying amounts depending upon what value we are considering in their domain? – George Apr 11 '16 at 13:48
  • @George see the conditions in the question which tell us we're at a fixed value of $X$. – Glen_b Apr 11 '16 at 17:03
  • It's still unclear to me why $Var(\epsilon) = E[\epsilon]^2$. Expanding the definition of $Var$, I can see how $Var(\epsilon) = E[\epsilon^2]$, but why is it also equal to $E[\epsilon]^2$? – George Apr 12 '16 at 04:45
  • @George It isn't – Glen_b Apr 12 '16 at 04:50
0

$$\mathrm{E}[(Y-\hat{Y})^2] = \mathrm{E}[(f(X)+\epsilon-\hat{f}(X))^2] = \mathrm{E}[(f(X)-\hat{f}(X))^2 + \epsilon^2 + 2\epsilon(f(X)-\hat{f}(X))]$$
$$= \mathrm{E}[(f(X)-\hat{f}(X))^2] + \mathrm{E}[\epsilon^2] + 2(f(X)-\hat{f}(X))\,\mathrm{E}[\epsilon] \tag{1}$$

The last term is zero because the expected value of the irreducible error is zero. Now let's see where the variance comes from. In general,

$$\mathrm{Var}(X) = \mathrm{E}[(X-\bar{X})^2] = \mathrm{E}[X^2 - 2X\bar{X} + \bar{X}^2] = \mathrm{E}[X^2] - 2\bar{X}\,\mathrm{E}[X] + \bar{X}^2$$

since the mean of $X$ is a constant, and so is the square of the mean of $X$. Therefore,

$$\mathrm{Var}(X) = \mathrm{E}[X^2] - 2\bar{X}^2 + \bar{X}^2 = \mathrm{E}[X^2] - \bar{X}^2$$

Hence $\mathrm{Var}(\epsilon) = \mathrm{E}[\epsilon^2] - \bar{\epsilon}^2$. But the mean of $\epsilon$ is zero, so

$$\mathrm{Var}(\epsilon) = \mathrm{E}[\epsilon^2] \tag{2}$$

Now combining equation (1), whose last term is zero, with equation (2):

$$\mathrm{E}[(Y-\hat{Y})^2] = \mathrm{E}[(f(X)-\hat{f}(X))^2] + \mathrm{E}[\epsilon^2] = \mathrm{E}[(f(X)-\hat{f}(X))^2] + \mathrm{Var}(\epsilon)$$

And since $\hat{f}$ and $X$ are held fixed, $f(X)-\hat{f}(X)$ is a constant, so $\mathrm{E}[(f(X)-\hat{f}(X))^2] = [f(X)-\hat{f}(X)]^2$, which is the book's decomposition.
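The same algebra can be checked symbolically; the sketch below uses SymPy and assumes a Normal error only so that the moments can be computed (all the derivation really uses is $\mathrm{E}[\epsilon]=0$ and $\mathrm{Var}(\epsilon)=\sigma^2$):

```python
from sympy import symbols, simplify
from sympy.stats import E, Normal

# d stands for the fixed gap f(X) - f_hat(X); eps is the noise term.
d = symbols("d", real=True)
sigma = symbols("sigma", positive=True)
eps = Normal("eps", 0, sigma)   # E[eps] = 0, Var(eps) = sigma**2

mse = E((d + eps) ** 2)         # E[(Y - Yhat)^2] with Y - Yhat = d + eps
print(simplify(mse))            # d**2 + sigma**2
```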

Mooncrater