*The Elements of Statistical Learning* states on page 20, equation 2.18, that using the L1 norm instead of the usual L2 norm makes the $f(X)$ that optimises the EPE the conditional median instead of the regression function.

I am trying to prove this fact as follows.

It still suffices to minimize the expected prediction error pointwise for each $x$, i.e. the analogue of equation 2.12 on page 18 still holds:

$f(x) = \operatorname{argmin}_c E_{Y|X}(|Y-c| \mid X = x)$

Then I try to find the $c$ that minimizes the expectation as follows:

\begin{equation} \begin{split} \frac{\partial E_{Y|X}((|Y-c|) |X)}{\partial c} \overset{!}{=} 0 \Leftrightarrow \int_Y - \frac{y - c}{| y - c |} p_{Y|X}(y|x)dy = 0 \end{split} \end{equation}

but I am stuck here as I don't see how to show that:

$$ \int_Y - \frac{y - c}{| y - c |} p_{Y|X}(y|x)dy = 0 $$

leads to $c$ being the median.

grll
  • Very much related: [Why does minimizing the MAE lead to forecasting the median and not the mean?](https://stats.stackexchange.com/q/355538/1352) – Stephan Kolassa May 18 '20 at 11:28
  • Thanks, I saw something very similar as well, but I was looking for a formal proof rather than the intuition behind it. – grll May 18 '20 at 11:41
  • The paper by Hanley referenced in that thread gives pointers to a couple of proofs, e.g., in Cramér (1946), *Mathematical Methods of Statistics*. Alternatively, [Schwertman et al. (1990, *The American Statistician*)](https://www.tandfonline.com/doi/abs/10.1080/00031305.1990.10475690) give a noncalculus proof that the median minimizes the sum of absolute distances for a finite set of data points. I would expect the statement for continuous distributions to be in most books on mathematical statistics. One could also look at the quantile regression literature, since the median is a specific quantile. – Stephan Kolassa May 18 '20 at 11:52
  • @StephanKolassa Thanks a lot, I will definitely look into this. – grll May 18 '20 at 11:55

1 Answer

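It was actually not that complicated. I found a solution myself, as follows. The integrand $\frac{y - c}{| y - c |}$ equals $-1$ for $y < c$ and $+1$ for $y > c$, so the integral splits at $c$: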

\begin{align} & \int_Y \frac{y - c}{| y - c |} p_{Y|X}(y|x)dy = 0 \\ &\Leftrightarrow \int_{\min_y}^{c}-p_{Y|X}(y|x)dy + \int_{c}^{\max_y}p_{Y|X}(y|x)dy = 0\\ &\Leftrightarrow \int_{\min_y}^{c}p_{Y|X}(y|x)dy = \int_{c}^{\max_y}p_{Y|X}(y|x)dy \end{align}

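Since the conditional density integrates to 1, both sides equal $1/2$, i.e. $P(Y \le c \mid X = x) = 1/2$, which by definition makes $c$ the conditional median, as stated in *The Elements of Statistical Learning*.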
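
As a quick numerical sanity check of this result (not part of the proof), here is a minimal sketch that minimises the empirical mean absolute error over a grid of candidate values $c$ and compares the minimiser with the sample median. The exponential sample is only an assumed stand-in for $p_{Y|X}(y|x)$ at a fixed $x$, and the names `mean_abs_error`, `grid` and `c_star` are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a skewed sample standing in for draws from p(y | x) at one fixed x,
# chosen so that the mean and the median are clearly different.
y = rng.exponential(scale=1.0, size=5001)


def mean_abs_error(c, y):
    """Empirical version of E(|Y - c| | X = x)."""
    return np.mean(np.abs(y - c))


# Evaluate the empirical L1 loss on a fine grid of candidate values c.
grid = np.linspace(0.0, 3.0, 3001)
losses = np.array([mean_abs_error(c, y) for c in grid])
c_star = grid[np.argmin(losses)]

print("grid minimiser of the L1 loss:", c_star)
print("sample median:                ", np.median(y))
print("sample mean:                  ", np.mean(y))
```

The grid minimiser should agree with the sample median up to the grid spacing, while the sample mean, which minimises the L2 loss instead, sits noticeably higher for this skewed sample.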

grll