
In *The Elements of Statistical Learning*, Section 2.4, it is written that it suffices to minimize EPE (Expected Prediction Error) pointwise, where

$\mathrm{EPE}(f) = E_X E_{Y|X}\left([Y-f(X)]^2 \mid X\right)$
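Writing the outer expectation out explicitly may make the "pointwise" wording concrete. A sketch, assuming $X$ has a density $p(x)$ (for discrete $X$ the integral becomes a sum weighted by $P(X=x)$):

$$
\mathrm{EPE}(f) = E_X\, E_{Y|X}\!\left([Y-f(X)]^2 \mid X\right) = \int E_{Y|X}\!\left([Y-f(x)]^2 \mid X=x\right) p(x)\,dx .
$$

The value $f(x)$ appears only in the integrand at that same $x$, and $p(x) \ge 0$, so choosing each $f(x)$ to make its own integrand as small as possible also makes the integral as small as possible.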

Why is minimizing EPE equivalent to minimizing $E_{Y|X}([Y-f(X)]^2 \mid X)$ pointwise? I see that Matthew Drury has posted an answer at "Expected prediction error - derivation".

Matthew said that you can minimize a sum of non-negative quantities by minimizing the summands individually. But I think that is true only when the summands are independent of each other. Are we assuming that in this case?
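The "independence" in question can be checked numerically. Below is a minimal sketch, assuming a hypothetical toy joint distribution (not from the book) in which $X$ takes three values and $Y \mid X$ is discrete. It brute-forces every function $f$ whose values come from a small grid, and compares the best one with the function obtained by minimizing each conditional term separately.

```python
import itertools

# Hypothetical toy joint distribution: P(X = x) and, for each x,
# a list of (y, P(Y = y | X = x)) pairs. Not from the book.
P_X = {0: 0.5, 1: 0.3, 2: 0.2}
P_Y_GIVEN_X = {
    0: [(0.0, 0.5), (2.0, 0.5)],    # E[Y | X=0] = 1.0
    1: [(1.0, 0.25), (3.0, 0.75)],  # E[Y | X=1] = 2.5
    2: [(-1.0, 1.0)],               # E[Y | X=2] = -1.0
}

# Candidate values f(x) may take: -2.0, -1.5, ..., 4.0 (contains the conditional means).
GRID = [i / 2 for i in range(-4, 9)]


def cond_loss(x, c):
    """E[(Y - c)^2 | X = x]: the pointwise term for a single x."""
    return sum(p * (y - c) ** 2 for y, p in P_Y_GIVEN_X[x])


def epe(f):
    """EPE(f) = sum over x of P(X = x) * E[(Y - f(x))^2 | X = x]."""
    return sum(P_X[x] * cond_loss(x, f[x]) for x in P_X)


xs = sorted(P_X)

# Global search: try every function f: {0, 1, 2} -> GRID (13**3 = 2197 functions).
best_global = min(
    (dict(zip(xs, values)) for values in itertools.product(GRID, repeat=len(xs))),
    key=epe,
)

# Pointwise search: minimize each conditional term on its own.
best_pointwise = {x: min(GRID, key=lambda c: cond_loss(x, c)) for x in xs}

# Both lines should report the same f and the same EPE.
print(best_global, epe(best_global))
print(best_pointwise, epe(best_pointwise))
```

The two searches agree because each $f(x)$ appears in exactly one term of the weighted sum, not because of any independence between $X$ and $Y$. Changing the toy distribution or refining the grid is an easy way to probe the claim further.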

Abhinav Gupta
  • Sufficient for what goal? – Juho Kokkala Dec 10 '15 at 11:19
  • For minimizing EPE as a whole, we need to minimize EPE pointwise. – Abhinav Gupta Dec 10 '15 at 11:21
  • That comment is the answer! – whuber Dec 11 '15 at 14:48
  • I answered this in detail here: http://stats.stackexchange.com/questions/92180/expected-prediction-error-derivation/185235#185235 (I can't vote as a duplicate, because the answer there is not upvoted) – Matthew Drury Dec 11 '15 at 15:23
  • I have read this answer, but I still have one doubt. I have edited the question accordingly. – Abhinav Gupta Dec 12 '15 at 07:53
  • @user147798 - if you choose the function $f$ and then condition on the value of $X$ then $f(X)$ is "known". If, for a particular value of $X=x$, you can make $E[(Y-f(X))^2|X=x]$ smaller then you should redefine $f(x)$ appropriately (as per equation (2.12) in the linked question). – P.Windridge Dec 12 '15 at 09:37
  • In more detail: $E[(Y-f(X))^2|X=x] = E[(Y-f(x))^2|X=x] \geq \min_c E[(Y-c)^2|X=x]$. Thus, you cannot do better than term-wise minimisation. On the other hand, for every $x$ you can *define* $f(x)=c$, where $c=c(x)$ is a minimiser in $\min_c E[(Y-c)^2|X=x]$, and this attains the term-wise minimum. N.b. You are choosing a value of $f(x)$ for every $x$, and thus the terms are "independent" in the way I think you mean. – P.Windridge Dec 12 '15 at 09:51
  • @P.Windridge - the value of $c$ depends on how the function $f(X)$ is defined. We want an $f(X)$ such that EPE is minimized. If you minimize EPE for a particular value of $X$, it may happen that the overall EPE increases. My doubt is why there is no other possible $f(X)$ that minimizes the EPE. – Abhinav Gupta Dec 12 '15 at 10:51
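The point debated in the comments above (whether some other $f$ could achieve a lower overall EPE) can be settled with a short computation. A sketch, assuming $E[Y^2] < \infty$ and $E[f(X)^2] < \infty$, and writing $m(x) = E[Y \mid X = x]$ (shorthand introduced here, not from the book). The cross term vanishes by the tower property, because $E[Y - m(X) \mid X] = 0$:

$$
\begin{aligned}
\mathrm{EPE}(f) &= E\!\left[(Y - m(X) + m(X) - f(X))^2\right] \\
&= E\!\left[(Y-m(X))^2\right] + 2\,E\!\left[(m(X)-f(X))\,E[Y-m(X) \mid X]\right] + E\!\left[(m(X)-f(X))^2\right] \\
&= \mathrm{EPE}(m) + E_X\!\left[(f(X)-m(X))^2\right].
\end{aligned}
$$

So $\mathrm{EPE}(f) \ge \mathrm{EPE}(m)$ for every $f$, with equality exactly when $f(X) = m(X)$ with probability one. Conditioning on $X = x$ instead of averaging gives the pointwise version, $E[(Y-c)^2 \mid X=x] = \operatorname{Var}(Y \mid X=x) + (m(x)-c)^2$, which is minimized by $c = m(x)$. This is the minimiser $c(x)$ in P.Windridge's comment, and choosing it at one point never changes the contribution of any other point.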
