
With reference to the Expected Prediction Error derivation on page 18, Section 2.4, of The Elements of Statistical Learning:
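(The screenshot referred to in the comments below is not reproduced here; for reference, the equations in question, as given in ESL Section 2.4, are:

$$\mathrm{EPE}(f) = E_X\, E_{Y|X}\big([Y - f(X)]^2 \,\big|\, X\big) \tag{2.11}$$
$$f(x) = \operatorname{argmin}_c\, E_{Y|X}\big([Y - c]^2 \,\big|\, X = x\big) \tag{2.12}$$
$$f(x) = E(Y \,|\, X = x) \tag{2.13}$$
)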

I have been able to follow the derivation up to equation 2.11, but I am struggling to understand steps 2.12 and 2.13.

My understanding:

  • We intend to minimize EPE; since the outer $E_X$ just averages over $X$, we focus on minimizing $E_{Y|X}\big([Y-f(X)]^2 \mid X\big)$ at each point $X=x$.
  • Doubt in step 2.12: to minimize EPE, the difference between $Y$ (the actual value) and $f(X)$ (the predicted value) should be minimized. However, equation 2.12 appears to be minimizing "$c$".
  • Please guide me on understanding this: my reading is that with a small $c$, $[Y-c]^2$ will become larger (see the numerical sketch below).
  • Additionally, I could not figure out how 2.13 is developed from 2.12.

Please correct me wherever my assumptions and/or understanding are incorrect.
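Here is a minimal Monte Carlo sketch of what 2.12 is doing (assuming, purely for illustration, that $Y \mid X = x \sim \mathcal{N}(3, 1)$; the Normal is just a stand-in for whatever $\Pr(Y \mid X = x)$ actually is). It shows that the minimizing $c$ is the conditional mean, not a small $c$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed stand-in for the conditional distribution: Y | X = x ~ Normal(3, 1)
mu = 3.0
y = rng.normal(mu, 1.0, size=100_000)

# Estimate E([Y - c]^2 | X = x) over a grid of candidate constants c
cs = np.linspace(0.0, 6.0, 601)
risk = np.array([np.mean((y - c) ** 2) for c in cs])

print(cs[np.argmin(risk)])  # ~3.0: the minimizer is E(Y | X = x), not a small c
print(y.mean())             # ~3.0: the sample conditional mean agrees
```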

P.S.: I studied Probability (Tsitsiklis) and Linear Algebra (David C. Lay) before moving to ESL.

Santo
  • Please make an effort to make your post self-contained. – Christoph Hanck Jun 20 '17 at 09:37
  • added a screenshot for reference, as suggested by Christoph Hanck – Santo Jun 20 '17 at 10:01
  • See https://stats.stackexchange.com/questions/262837/elements-of-statistical-learning-expected-prediction-error-derivation – Christoph Hanck Jun 20 '17 at 14:08
  • And https://stats.stackexchange.com/questions/92180/expected-prediction-error-derivation/102662#102662 – Christoph Hanck Jun 20 '17 at 14:08
  • Thanks! I think I've got it, from the above links and https://stats.stackexchange.com/questions/92180/expected-prediction-error-derivation/102662#102662 – Santo Jun 20 '17 at 16:41
  • (2.12) is not "minimizing $c$" – Juho Kokkala Jun 21 '17 at 16:58
  • Christoph/Juho and members, my final understanding is that $\operatorname{argmin}_c$ in 2.12 (ref. above) means: "determine the $c$ for which $[Y-c]^2$ is minimized." Further, I'm still not sure about 2.13; I assumed that $E([Y-c]^2 \mid X=x)$ in 2.12 is a function of $Y$, and so is replaced by $E(Y \mid X=x)$ in 2.13. Please correct my understanding wherever it is faulty; please bear with me, as I'm opening these books after about two decades. – Santo Jun 22 '17 at 03:25

1 Answer


Let $H$ be any set of functions of $x$. Then, for each $h\in H$, $\int h(x)\,dx \ge \int \inf_{g\in H} g(x)\,dx$. Sometimes, as in the current situation, the function $\lambda$, given by $\lambda(x)=\inf_{g\in H}g(x)$, is already in $H$, in which case the least value of the integral of $h$, as $h$ varies over $H$, is given by taking $h=\lambda$. We will define $H$ in the current situation, then compute $\lambda$ and then check that $\lambda\in H$.
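To see why the final check matters, here is a toy example (mine, not from the book) where the pointwise infimum is *not* in $H$: take $H = \{h_c : c \in \mathbb{R}\}$ with $h_c(x) = (x-c)^2$. Then
$$\lambda(x) = \inf_{c} (x-c)^2 = 0 \quad\text{for every } x,$$
but no single constant $c$ makes $(x-c)^2$ vanish for all $x$, so $\lambda \notin H$ and no $h \in H$ attains $\int \lambda$. In the EPE problem, $H$ is indexed by *functions* of $x$ rather than constants, which is exactly why $\lambda \in H$ there.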

Here $H$ is the set of all functions $h$ given by $h(x) = E_{Y|X}((Y-f(x))^2 \mid X=x)$, where $f$ is some measurable function of $x$. Measurability imposes no restriction on the possible values of $c=f(x)$, so $$ \lambda(x) = \inf_{h\in H} h(x) = \inf_c E_{Y|X}((Y-c)^2 \mid X=x),$$ which is the minimum of a quadratic function of $c$. Expanding the square, we can write this quadratic as $A - 2Bc + c^2$, where $A = E_{Y|X}(Y^2 \mid X=x)$ and $B = E_{Y|X}(Y \mid X=x)$ are independent of $c$. Completing the square gives $A - 2Bc + c^2 = (c-B)^2 + A - B^2$, which is minimized precisely when $c = B = E_{Y|X}(Y \mid X=x)$; this is the step from 2.12 to 2.13. So we certainly cannot do any better than defining $f(x) = E_{Y|X}(Y \mid X=x)$. Since this $f$ is a measurable function, we do have $\lambda\in H$, and this choice of $f$ achieves the required minimum in 2.11.
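A quick symbolic check of the completing-the-square step (a sketch using SymPy; $A$ and $B$ here are the abstract conditional moments defined above, treated as free symbols):

```python
import sympy as sp

c, A, B = sp.symbols('c A B', real=True)

# The quadratic E[(Y - c)^2 | X = x] expanded in c, with
# A = E[Y^2 | X = x] and B = E[Y | X = x]
q = A - 2*B*c + c**2

print(sp.solve(sp.diff(q, c), c))                # [B]: the minimizer is c = B
print(sp.simplify(q - ((c - B)**2 + A - B**2)))  # 0: completing the square is exact
```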

David Epstein