Expected Prediction Error (EPE) with L1 loss

Question

In *The Elements of Statistical Learning* (page 20, equation 2.18), it is stated that using the L1 loss instead of the usual L2 loss leads to the $f(X)$ that optimises the EPE being the conditional median instead of the regression function (the conditional mean).

I am trying to prove this fact as follows.

Assuming it still suffices to minimise the expected prediction error pointwise for each $x$, i.e. that the L1 analogue of equation 2.12 (page 18) holds:

$f(x) = \arg\min_c E_{Y|X}(|Y-c| \mid X = x)$

I then try to find the $c$ that minimises the expectation as follows:

\begin{equation} \begin{split} \frac{\partial E_{Y|X}((|Y-c|) |X)}{\partial c} \overset{!}{=} 0 \Leftrightarrow \int_Y - \frac{y - c}{| y - c |} p_{Y|X}(y|x)dy = 0 \end{split} \end{equation}

but I am stuck here as I don't see how to show that:

$$ \int_Y - \frac{y - c}{| y - c |} p_{Y|X}(y|x)dy = 0 $$

leads to $c$ being the median.

Very much related: [Why does minimizing the MAE lead to forecasting the median and not the mean?](https://stats.stackexchange.com/q/355538/1352) — Stephan Kolassa, May 18 '20 at 11:28
Thanks, I saw something very similar as well, but I was looking for a formal proof rather than the intuition behind it. — grll, May 18 '20 at 11:41
The paper by Hanley referenced in that thread gives pointers to a couple of proofs, e.g., in Cramér (1946), *Mathematical Methods of Statistics*. Alternatively, [Schwertman et al. (1990, *The American Statistician*)](https://www.tandfonline.com/doi/abs/10.1080/00031305.1990.10475690) give a noncalculus proof that the median minimizes the sum of absolute distances for a finite set of data points. I would expect the statement for continuous distributions to be in most books on mathematical statistics. One could also look at the quantile regression literature, since the median is a specific quantile. — Stephan Kolassa, May 18 '20 at 11:52
@StephanKolassa Thanks a lot I will definitely look into this. — grll, May 18 '20 at 11:55

grll · Answer 1 · 2020-05-18T11:42:12.067

It was actually not that complicated. I found a solution myself, as follows:

\begin{align}
& \int_Y \frac{y - c}{| y - c |} \, p_{Y|X}(y|x) \, dy = 0 \\
&\Leftrightarrow \int_{\min_y}^{c} -p_{Y|X}(y|x) \, dy + \int_{c}^{\max_y} p_{Y|X}(y|x) \, dy = 0 \\
&\Leftrightarrow \int_{\min_y}^{c} p_{Y|X}(y|x) \, dy = \int_{c}^{\max_y} p_{Y|X}(y|x) \, dy
\end{align}

Since the two integrals sum to 1, each must equal $1/2$, i.e. $P(Y \le c \mid X = x) = 1/2$, which is by definition the conditional median, as stated in *The Elements of Statistical Learning*.
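As a rough numerical sanity check (not part of the proof), the sketch below draws a skewed sample and searches a grid of candidate values $c$ for the one with the smallest empirical mean absolute error. The exponential distribution, the sample size, the grid, and the helper names `mean_abs_error` and `grid_minimiser` are all assumptions chosen purely for illustration. The sample plays the role of $Y$ at a single fixed $x$.

```python
import random
import statistics


def mean_abs_error(sample, c):
    """Empirical estimate of E|Y - c| for a fixed sample (illustrative helper)."""
    return sum(abs(y - c) for y in sample) / len(sample)


def grid_minimiser(sample, grid):
    """Return the grid point with the smallest empirical L1 risk."""
    return min(grid, key=lambda c: mean_abs_error(sample, c))


random.seed(0)

# Assumption: a skewed distribution (exponential, rate 1) so that the mean
# and the median are clearly different. An odd sample size keeps the
# sample median unique.
sample = [random.expovariate(1.0) for _ in range(501)]

# Candidate values of c on [0, 3] with step 0.01.
grid = [i / 100 for i in range(301)]
c_star = grid_minimiser(sample, grid)

print("L1 minimiser on grid:", c_star)
print("sample median:       ", statistics.median(sample))
print("sample mean:         ", statistics.mean(sample))
```

Because the empirical L1 risk is convex in $c$, the grid minimiser should land within one grid step of the sample median, while the sample mean sits noticeably further away for a skewed sample like this one.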
