Why is median the solution to absolute difference, but mean is the solution to squared difference?

Question

What are the values of $a$ and $b$ that minimize the expected $L_1$ and $L_2$ losses, respectively? $$ \min_{a} \mathrm{E} \left| X - a \right| $$ I was told that the median is the solution. How do you solve it? Why can't it be the mean instead?

$$\min_{b} \mathrm{E} (X - b) ^ 2$$ I was told that the mean $\mu$ is the solution to the squared error. How do you solve it?

score 2 · Answer 1 · answered Oct 05 '17 at 01:52


Loss: $$L_1(\hat y,y)=|\hat y-y|$$

Let $y$ be the random quantity being predicted, with probability density $f$ and cumulative distribution function $F$, and let $\hat y$ be a constant.

Now we can solve the problem $$\min_{\hat y} E[L_1(\hat y,y)]$$ using the first-order condition (FOC): $$\frac{\partial}{\partial\hat y} E[L_1(\hat y,y)]=0$$ Splitting the expectation at $\hat y$, $$E[L_1(\hat y,y)]= \int_{-\infty}^{\hat y}(\hat y-u)f(u)\,du+\int_{\hat y}^{\infty}(u-\hat y)f(u)\,du,$$ and differentiating with respect to $\hat y$ (the boundary terms vanish because both integrands are zero at $u=\hat y$) gives $$\frac{\partial}{\partial\hat y} E[L_1(\hat y,y)]= \int_{-\infty}^{\hat y}f(u)\,du-\int_{\hat y}^{\infty}f(u)\,du =F(\hat y)-(1-F(\hat y))=2F(\hat y)-1=0$$

You can see that the FOC is satisfied when $F(\hat y)=1/2$, i.e. when $\hat y$ is at the median of the distribution of $y$.
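
The median claim can also be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes an exponential sample (chosen only because it is skewed, so the mean and the median differ), and the names `mean_abs_loss`, `grid` and `best` are illustrative. It evaluates the sample version of $E|\hat y-y|$ on a grid of constant forecasts and compares the grid minimizer with the sample median and mean.

```python
import random
import statistics

# Assumption for illustration: a skewed sample, so mean and median differ.
random.seed(0)
y = [random.expovariate(1.0) for _ in range(5_000)]

def mean_abs_loss(y_hat, sample):
    """Sample analogue of E|y_hat - y| for a constant forecast y_hat."""
    return sum(abs(y_hat - v) for v in sample) / len(sample)

# Candidate forecasts 0.00, 0.01, ..., 3.00.
grid = [i * 0.01 for i in range(301)]
best = min(grid, key=lambda c: mean_abs_loss(c, y))

print("grid minimizer:", best)
print("sample median: ", statistics.median(y))
print("sample mean:   ", statistics.mean(y))
```

The grid minimizer should land within one grid step of the sample median and well away from the sample mean, which is what $F(\hat y)=1/2$ predicts.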

You can do the same for $L_2$ to show that it requires the mean. Loss: $$L_2(\hat y,y)=(\hat y-y)^2$$

Treat $\hat y$ as a constant and take the expectation over the distribution of $y$.

Now, we can solve the problem $$\min_{\hat y} E[L_2(\hat y,y)]$$ using the first-order condition (FOC): $$\frac{\partial}{\partial\hat y} E[L_2(\hat y,y)]=0$$ $$\frac{\partial}{\partial\hat y} E[L_2(\hat y,y)]= E\left[\frac{\partial}{\partial\hat y} (\hat y-y)^2\right]= E[2 (\hat y-y)]= 2 (\hat y-E[y])=0$$

You can see that the FOC is satisfied when $\hat y=E[y]$, i.e. when $\hat y$ is at the mean of the distribution.


Aksakal


Is there a course where I can learn this technique? This is pretty cool. I’m taking a probability course using the Casella and Berger textbook. But the book's examples do not cover this. – user13985 Oct 05 '17 at 04:32
This type of stuff is taught in forecasting courses, but all you need to know is how to set up the problem. The rest is simple calculus. I'll update the answer with a link to a book or paper later – Aksakal Oct 05 '17 at 12:12
[Granger's paper](https://books.google.com/books?id=RskN3Db7MlgC&lpg=PA8&ots=qxxAtbERjf&dq=forecasting%20granger%20cost&pg=PA366#v=onepage&q=forecasting%20granger%20cost&f=false) is a good intro to how cost functions are used in forecasting to come up with optimal forecasts; he has an example for quadratic loss (cost) leading to the mean – Aksakal Oct 05 '17 at 18:37
Can you explain why the middle number minimizes the absolute difference? Why not the mean? – user13985 Oct 05 '17 at 18:57
@user13985 $F(x)=1/2$ is a definition of the median, i.e. when the probability below and above x is the same 50%. For symmetrical distributions such as Gaussian, median and mean are the same – Aksakal Oct 05 '17 at 19:18
That makes sense. But doesn't the mean make more sense than the median? Just wondering. The definition of the median on Wikipedia: $P(X \le m) = P(X \ge m) = \int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}$. I need to think about it. – user13985 Oct 05 '17 at 19:56

score 1 · Answer 2 · answered Oct 05 '17 at 01:48


The squared error is the simpler case, and a classic argument, so I will reproduce it here.

Let $\mu = E[X]$. Then we can use an add and subtract trick

$$ E[(X - b)^2] = E[(X - \mu + \mu - b)^2] = E[(X - \mu)^2] + 2 E[(X - \mu)(\mu - b)] + E[(\mu - b)^2] $$

Now in the middle term, $\mu - b$ is a constant, so it can come outside of the expectation

$$ E[(X - \mu)(\mu - b)] = (\mu - b) E[X - \mu] = (\mu - b)(E[X] - \mu) = 0 $$

So, all together

$$ E[(X - b)^2] = E[(X - \mu)^2] + E[(\mu - b)^2] $$

The first term does not depend on $b$, and the second term is just the constant $(\mu - b)^2$, which is nonnegative and zero only at $b = \mu$. So this is clearly minimized when $b = \mu$.
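
The decomposition $E[(X - b)^2] = E[(X - \mu)^2] + (\mu - b)^2$ also holds exactly for a sample when $\mu$ is the sample mean, so it is easy to check numerically. The following is a minimal sketch under assumed inputs: the exponential sample and the helper name `mean_sq_loss` are illustrative choices, not something from the argument above.

```python
import random
import statistics

# Assumption for illustration: any sample works; a skewed one is used here.
random.seed(1)
x = [random.expovariate(1.0) for _ in range(2_000)]

mu = statistics.fmean(x)
var = statistics.pvariance(x, mu)  # mean of (x - mu)^2 over the sample

def mean_sq_loss(b, sample):
    """Sample analogue of E[(X - b)^2] for a constant b."""
    return sum((v - b) ** 2 for v in sample) / len(sample)

# Compare both sides of the decomposition for a few values of b.
for b in (-1.0, 0.0, mu, 2.5):
    lhs = mean_sq_loss(b, x)
    rhs = var + (mu - b) ** 2
    print(f"b={b:.3f}  loss={lhs:.6f}  var+(mu-b)^2={rhs:.6f}")
```

The two columns should agree up to floating-point rounding, and the loss is smallest in the row where $b$ equals the sample mean.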

The median case is more elusive, but still elementary, and also deserving of classic status. You can look here for some proofs. Both André's and Brian's answers are simple and intuitive demonstrations.
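
One standard argument can be sketched in a few lines. It is not quoted from the linked answers, and it assumes $E|X|$ is finite. Let $m$ be a median of $X$, meaning $P(X \le m) \ge 1/2$ and $P(X \ge m) \ge 1/2$, and take any $a > m$. Then

$$ |X - a| - |X - m| = a - m \ \text{ when } X \le m, \qquad |X - a| - |X - m| \ge -(a - m) \ \text{ when } X > m $$

Taking expectations,

$$ E|X - a| - E|X - m| \ge (a - m) \left[ P(X \le m) - P(X > m) \right] \ge 0 $$

because $P(X \le m) \ge 1/2 \ge P(X > m)$. The case $a < m$ is symmetric, so no $a$ does better than the median.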


Matthew Drury


The solution in the link uses order statistics to get the median. I read it. It was very long. I was completely lost when I reached the end. – user13985 Oct 05 '17 at 17:30
Ok... That tends to happen with mathematics. You should always anticipate that you will need to read something multiple times and maybe take a walk and think about it. I think the idea there is simple and beautiful, but you are free to disagree! – Matthew Drury Oct 05 '17 at 18:47
Why does the median minimize the absolute difference? What's the intuition in simple words? – user13985 Oct 05 '17 at 19:01
I'm not sure there is an intuition in simple words for either case. Why do you believe a simple reason exists? – Matthew Drury Oct 05 '17 at 19:20
I read this post. They used median to explain it. https://math.stackexchange.com/questions/85448/why-does-the-median-minimize-ex-c – user13985 Oct 05 '17 at 19:54
That's only a simple or intuitive explanation if calculus is simple and intuitive to you. But, I'm glad you found one that you like! – Matthew Drury Oct 05 '17 at 19:56
Your L2 norm explanation was pretty good, I must say! – user13985 Oct 05 '17 at 20:26
