This section of Introduction to Statistical Learning in R (page 19 in v6, equation 2.3) motivates the difference between reducible and irreducible error (the irreducible error is denoted by $\epsilon$ and has mean zero).
Consider a given estimate $\hat{f}$ and a set of predictors $X$, which yields the prediction $\hat{Y} = \hat{f}(X)$. Assume for a moment that both $\hat{f}$ and $X$ are fixed. Then, it is easy to show that
$$
\begin{aligned}
E(Y - \hat{Y})^2 &= E[f(X) + \epsilon - \hat{f}(X)]^2 \\
&= [f(X) - \hat{f}(X)]^2 + \mathrm{Var}(\epsilon)
\end{aligned}
$$
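For reference, the setup I'm working from (stated earlier in the chapter, if I'm reading it correctly) is
$$Y = f(X) + \epsilon, \qquad E(\epsilon) = 0,$$
with $\epsilon$ independent of $X$, which is why $Y$ gets replaced by $f(X) + \epsilon$ inside the expectation above.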
I'm having trouble with the intermediate steps here. I understand that the expected squared prediction error should involve the variance of $\epsilon$, but I'd like to understand the proof.
I expanded the square to
$$E(Y - \hat{Y})^2 = E[f(X) - \hat{f}(X)]^2 + 2\,E(\epsilon)\,E[f(X) - \hat{f}(X)] + E(\epsilon^2),$$
and I can see that the $E(\epsilon)$ and $E(\epsilon^2)$ terms should combine into $\mathrm{Var}(\epsilon)$, but I'm stuck on how to fit the basic expected-value and variance identities together to finish the argument.
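Spelling out the steps I think are needed, treating $f(X) - \hat{f}(X)$ as a constant (call it $c$), since both $\hat{f}$ and $X$ are assumed fixed:
$$
\begin{aligned}
E(Y - \hat{Y})^2 &= E[(c + \epsilon)^2] \\
&= E[c^2 + 2c\epsilon + \epsilon^2] \\
&= c^2 + 2c\,E(\epsilon) + E(\epsilon^2) \\
&= c^2 + E(\epsilon^2) \qquad \text{(since } E(\epsilon) = 0\text{)} \\
&= c^2 + \mathrm{Var}(\epsilon) \qquad \text{(since } \mathrm{Var}(\epsilon) = E(\epsilon^2) - [E(\epsilon)]^2 = E(\epsilon^2)\text{)} \\
&= [f(X) - \hat{f}(X)]^2 + \mathrm{Var}(\epsilon).
\end{aligned}
$$
Is this the right justification, in particular for dropping the cross term and for replacing $E(\epsilon^2)$ with $\mathrm{Var}(\epsilon)$?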