
With reference to the Expected Prediction Error derivation on page 18, Section 2.4, of The Elements of Statistical Learning:
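(The screenshot referred to in the comments below is not reproduced here; for reference, the equations in question, as given in ESL Section 2.4, are:

$$\mathrm{EPE}(f) = E_X\, E_{Y|X}\big([Y - f(X)]^2 \,\big|\, X\big) \tag{2.11}$$
$$f(x) = \operatorname{argmin}_c\, E_{Y|X}\big([Y - c]^2 \,\big|\, X = x\big) \tag{2.12}$$
$$f(x) = E(Y \,|\, X = x) \tag{2.13}$$
)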

I have been able to follow the derivation up to equation 2.11, but I am struggling to understand steps 2.12 and 2.13.

My understanding:

  • We intend to minimize EPE; since the outer $E_X$ just averages over $X$, we focus on minimizing $E_{Y|X}\big([Y-f(X)]^2 \mid X\big)$ at each point $X=x$.
  • Doubt in step 2.12: to minimize EPE, the difference between $Y$ (the actual value) and $f(X)$ (the predicted value) should be minimized. However, equation 2.12 appears to be minimizing "$c$".
  • Please guide me on understanding this: my reading is that with a small $c$, $[Y-c]^2$ will become larger (see the numerical sketch below).
  • Additionally, I could not figure out how 2.13 is developed from 2.12.

Please correct me wherever my assumptions and/or understanding are incorrect.
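Here is a minimal Monte Carlo sketch of what 2.12 is doing (assuming, purely for illustration, that $Y \mid X = x \sim \mathcal{N}(3, 1)$; the Normal is just a stand-in for whatever $\Pr(Y \mid X = x)$ actually is). It shows that the minimizing $c$ is the conditional mean, not a small $c$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed stand-in for the conditional distribution: Y | X = x ~ Normal(3, 1)
mu = 3.0
y = rng.normal(mu, 1.0, size=100_000)

# Estimate E([Y - c]^2 | X = x) over a grid of candidate constants c
cs = np.linspace(0.0, 6.0, 601)
risk = np.array([np.mean((y - c) ** 2) for c in cs])

print(cs[np.argmin(risk)])  # ~3.0: the minimizer is E(Y | X = x), not a small c
print(y.mean())             # ~3.0: the sample conditional mean agrees
```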

P.S.: I studied Probability (Tsitsiklis) and Linear Algebra (David C. Lay) before moving to ESL.

Santo
  • Please make an effort to make your post self-contained. – Christoph Hanck Jun 20 '17 at 09:37
  • added a screenshot for reference, as suggested by Christoph Hanck – Santo Jun 20 '17 at 10:01
  • See https://stats.stackexchange.com/questions/262837/elements-of-statistical-learning-expected-prediction-error-derivation – Christoph Hanck Jun 20 '17 at 14:08
  • And https://stats.stackexchange.com/questions/92180/expected-prediction-error-derivation/102662#102662 – Christoph Hanck Jun 20 '17 at 14:08
  • Thanks! I think I've got it, from the above links and https://stats.stackexchange.com/questions/92180/expected-prediction-error-derivation/102662#102662 – Santo Jun 20 '17 at 16:41
  • (2.12) is not "minimizing $c$" – Juho Kokkala Jun 21 '17 at 16:58
  • Christoph/Juho and members, my final understanding is that $\operatorname{argmin}_c$ in 2.12 (ref. above) means: "determine the $c$ for which $[Y-c]^2$ is minimized." Further, I'm still not sure about 2.13; I assumed that $E([Y-c]^2 \mid X=x)$ in 2.12 is a function of $Y$, and so is replaced by $E(Y \mid X=x)$ in 2.13. Please correct my understanding wherever it is faulty; please bear with me, as I'm opening these books after about two decades. – Santo Jun 22 '17 at 03:25

1 Answer


Let $H$ be any set of functions of $x$. Then, for each $h\in H$, $\int h(x)\,dx \ge \int \inf_{g\in H} g(x)\,dx$. Sometimes, as in the current situation, the function $\lambda$, given by $\lambda(x)=\inf_{g\in H}g(x)$, is already in $H$, in which case the least value of the integral of $h$, as $h$ varies over $H$, is given by taking $h=\lambda$. We will define $H$ in the current situation, then compute $\lambda$ and then check that $\lambda\in H$.
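To see why the final check matters, here is a toy example (mine, not from the book) where the pointwise infimum is *not* in $H$: take $H = \{h_c : c \in \mathbb{R}\}$ with $h_c(x) = (x-c)^2$. Then
$$\lambda(x) = \inf_{c} (x-c)^2 = 0 \quad\text{for every } x,$$
but no single constant $c$ makes $(x-c)^2$ vanish for all $x$, so $\lambda \notin H$ and no $h \in H$ attains $\int \lambda$. In the EPE problem, $H$ is indexed by *functions* of $x$ rather than constants, which is exactly why $\lambda \in H$ there.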

Here $H$ is the set of all functions $h$ given by $h(x) = E_{Y|X}((Y-f(x))^2 \mid X=x)$, where $f$ is some measurable function of $x$. Measurability imposes no restriction on the possible values of $c=f(x)$, so $$ \lambda(x) = \inf_{h\in H} h(x) = \inf_c E_{Y|X}((Y-c)^2 \mid X=x),$$ which is the minimum of a quadratic function of $c$. Expanding the square, we can write this quadratic as $A - 2Bc + c^2$, where $A = E_{Y|X}(Y^2 \mid X=x)$ and $B = E_{Y|X}(Y \mid X=x)$ are independent of $c$. Completing the square gives $A - 2Bc + c^2 = (c-B)^2 + A - B^2$, which is minimized precisely when $c = B = E_{Y|X}(Y \mid X=x)$; this is the step from 2.12 to 2.13. So we certainly cannot do any better than defining $f(x) = E_{Y|X}(Y \mid X=x)$. Since this $f$ is a measurable function, we do have $\lambda\in H$, and this choice of $f$ achieves the required minimum in 2.11.
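A quick symbolic check of the completing-the-square step (a sketch using SymPy; $A$ and $B$ here are the abstract conditional moments defined above, treated as free symbols):

```python
import sympy as sp

c, A, B = sp.symbols('c A B', real=True)

# The quadratic E[(Y - c)^2 | X = x] expanded in c, with
# A = E[Y^2 | X = x] and B = E[Y | X = x]
q = A - 2*B*c + c**2

print(sp.solve(sp.diff(q, c), c))                # [B]: the minimizer is c = B
print(sp.simplify(q - ((c - B)**2 + A - B**2)))  # 0: completing the square is exact
```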

David Epstein