Why is it sufficient to minimize expected prediction error (EPE) pointwise?

Question

In The Elements of Statistical Learning, Section 2.4, it is written that

it suffices to minimize EPE (Expected Prediction Error) pointwise.
where

EPE = $E_{X} E_{Y|X}([Y-f(X)]^2|X)$

Why is minimizing EPE equivalent to minimizing $E_{Y|X}([Y-f(X)]^2|X)$ for each value of $X$? I see that an answer has been posted at "Expected prediction error - derivation" by Matthew Drury.

Matthew said that you can minimize a sum of non-negative quantities by minimizing the summands individually. But I think that is true only when the summands are independent of each other. Are we assuming that in this case?

To minimize EPE as a whole, we need to minimize EPE pointwise. — Abhinav Gupta, Dec 10 '15 at 11:21
I answered this in detail here: http://stats.stackexchange.com/questions/92180/expected-prediction-error-derivation/185235#185235 (I can't vote as a duplicate, because the answer there is not upvoted) — Matthew Drury, Dec 11 '15 at 15:23
I have read this answer, but I still have one doubt. I have edited the question accordingly. — Abhinav Gupta, Dec 12 '15 at 07:53
@user147798 - if you choose the function $f$ and then condition on the value of $X$ then $f(X)$ is "known". If, for a particular value of $X=x$, you can make $E[(Y-f(X))^2|X=x]$ smaller then you should redefine $f(x)$ appropriately (as per equation (2.12) in the linked question). — P.Windridge, Dec 12 '15 at 09:37
In more detail: $E[(Y-f(X))^2|X=x] = E[(Y-f(x))^2|X=x] \geq \min_c E[(Y-c)^2|X=x]$. Thus, you cannot do better than term-wise minimisation. On the other hand, for every $x$ you can *define* $f(x)=c$, where $c=c(x)$ is a minimiser in $\min_c E[(Y-c)^2|X=x]$, and this attains the term-wise minimum. N.b. You are choosing a value of $f(x)$ for every $x$, and thus the terms are "independent" in the way I think you mean. — P.Windridge, Dec 12 '15 at 09:51
@P.Windridge - the value of $c$ depends upon how the function $f(X)$ is defined. We want an $f(X)$ such that EPE is minimized. If you minimize EPE for a particular value of $X$, it may happen that the overall EPE increases. My doubt is why there could not be some other $f(X)$ that minimizes the EPE. — Abhinav Gupta, Dec 12 '15 at 10:51
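
The point argued in the comments above can be checked numerically. The following is only a rough sketch on a made-up discrete joint distribution: the grid `x_vals`, `y_vals`, the table `joint`, and the helper names `pointwise_risk` and `epe` are all hypothetical and chosen for illustration. It writes EPE as $\sum_x P(X=x)\,E[(Y-f(x))^2|X=x]$, so each term involves only the single number $f(x)$. Changing $f$ at one $x$ therefore leaves the other terms untouched, and the pointwise minimizer (the conditional mean) is not beaten by any of the randomly drawn alternatives.

```python
import numpy as np

# Hypothetical joint distribution P(X=x, Y=y); the numbers are illustrative only.
x_vals = np.array([0, 1, 2])
y_vals = np.array([-1.0, 0.0, 2.0, 5.0])
joint = np.array([
    [0.10, 0.05, 0.05, 0.00],
    [0.05, 0.20, 0.10, 0.05],
    [0.00, 0.05, 0.15, 0.20],
])  # rows index x, columns index y; entries sum to 1

p_x = joint.sum(axis=1)        # marginal P(X=x)
cond = joint / p_x[:, None]    # conditional P(Y=y | X=x)


def pointwise_risk(f):
    """E[(Y - f(x))^2 | X = x] for each x; f holds one prediction per x."""
    return (cond * (y_vals[None, :] - f[:, None]) ** 2).sum(axis=1)


def epe(f):
    """EPE(f) = sum_x P(X=x) * E[(Y - f(x))^2 | X = x]."""
    return float(p_x @ pointwise_risk(f))


# Pointwise minimiser: the conditional mean E[Y | X = x] at every x.
f_star = cond @ y_vals

# Perturb f at x = 1 only and see which pointwise terms change.
f_alt = f_star.copy()
f_alt[1] += 0.7
changed = ~np.isclose(pointwise_risk(f_alt), pointwise_risk(f_star))

# Compare against many arbitrary choices of f (one value per x).
rng = np.random.default_rng(0)
candidates = rng.uniform(-2.0, 6.0, size=(10_000, 3))
best_random = min(epe(f) for f in candidates)

print("terms changed by perturbing f(1):", changed)
print("EPE at pointwise minimiser:", epe(f_star))
print("best EPE among random f:", best_random)
```

Because $f$ may take any value at each $x$ separately, the terms do not constrain one another. That is the sense in which the summands are "independent" here, and it would fail if $f$ were restricted to, say, linear functions.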


0 Answers