I am currently reading An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani, and I am stuck on one of the leaps they take when defining reducible and irreducible error. They state that for given input and output variables $X$ and $Y$, there is a function $f$ relating them, which we estimate with $\hat f$; earlier (their equation 2.1) they assume the model $Y = f(X) + \epsilon$, where $\epsilon$ is a random error term with mean zero that is independent of $X$. Using $\hat f$ we then obtain our prediction of $Y$, denoted $\hat Y$, such that $\hat Y = \hat f(X)$. On page 19, equation 2.3, they give the following equation:
$$E(Y - \hat Y)^2 = E[f(X) + \epsilon - \hat f(X)]^2 = [f(X) - \hat f(X)]^2 + Var(\epsilon)$$
They then go on to say that $[f(X) - \hat f(X)]^2$ is the reducible error while $Var(\epsilon)$ is the irreducible error.
I am by no means a mathematician, and I have tried some derivations myself but cannot reach that conclusion, in particular I do not see where the $Var(\epsilon)$ term comes from. Thank you.
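In case it helps, here is how far I get when I try to expand the square, assuming we can substitute $Y = f(X) + \epsilon$ and treat $f(X)$ and $\hat f(X)$ as fixed quantities (I am not sure whether those assumptions are even the right ones):

$$E(Y - \hat Y)^2 = E\big[(f(X) - \hat f(X)) + \epsilon\big]^2 = [f(X) - \hat f(X)]^2 + 2\,[f(X) - \hat f(X)]\,E(\epsilon) + E(\epsilon^2)$$

From here I do not see how the last two terms reduce to $Var(\epsilon)$.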