I'm reading John Kruschke's *Doing Bayesian Data Analysis: A Tutorial with R and BUGS*, in which it says that
we define the central tendency of a distribution as the value $M$ that minimizes the long-run expected distance between it and all the other values of $x$.
So $M$ is the value that minimizes either
- $\int p(x)\,D(x,M)\,dx$, where $p(x)$ is a probability density function, or
- $\sum_x p(x)\,D(x,M)$, where $p(x)$ is a probability mass function.
In both cases, $D$ is some notion of distance.
Then he says that
- if we choose $D(x, M) = |x-M|^2$, $M$ is the mean,
- if we choose $D(x, M) = |x-M|$, $M$ is the median, and
- if we choose $D(x, M) = \delta(x,M)$, $M$ is the mode (at least in the discrete case).
All of these make sense to me intuitively, but how do you prove it in general?
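For the mean I can at least sketch a calculus argument myself (assuming $p$ is a density with a finite second moment), which is the kind of proof I'm hoping generalizes:

$$\frac{d}{dM}\int p(x)\,(x-M)^2\,dx = \int p(x)\,(-2)(x-M)\,dx = -2\bigl(\mathbb{E}[x] - M\bigr),$$

which vanishes exactly at $M = \mathbb{E}[x]$, and the second derivative $2\int p(x)\,dx = 2 > 0$ confirms this is the unique minimum. But I don't see how to adapt this to the non-differentiable $|x-M|$ or to the 0/1 distance.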
Notes:
- $\delta(x,M)$ is 1 if $x=M$ and 0 otherwise. This is not Kruschke's notation; he just says "distance defined as zero for any exact match, and one for any mismatch" and I used the $\delta$ notation because it's familiar to me and easier to fit in an equation.
- I read Mean and Median properties, and it only answers part of my question; I'm still interested in the mode, and in a proof that each version of $M$ minimizes its corresponding expected "distance," with uniqueness for the mean.
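For what it's worth, a quick numerical check of my own (a small made-up pmf, a grid search over candidate values of $M$) agrees with all three claims, so I'm confident they hold and it's only the proof I'm missing:

```python
# Small discrete distribution: support points and their probabilities.
xs = [1, 2, 3, 4, 10]
ps = [0.10, 0.20, 0.35, 0.25, 0.10]

def argmin_expected(D, candidates):
    """Candidate M minimizing E[D(x, M)] = sum_x p(x) D(x, M)."""
    return min(candidates, key=lambda M: sum(p * D(x, M) for x, p in zip(xs, ps)))

# A fine grid of candidate values of M spanning the support.
grid = [1 + 9 * k / 10000 for k in range(10001)]

mean_hat = argmin_expected(lambda x, M: (x - M) ** 2, grid)
median_hat = argmin_expected(lambda x, M: abs(x - M), grid)
# With the 0/1 distance, any M off the support has expected distance 1,
# so only support points can do better; search just those.
mode_hat = argmin_expected(lambda x, M: 0 if x == M else 1, xs)

print(mean_hat)    # close to the mean, sum(p*x) = 3.55
print(median_hat)  # close to the median, 3
print(mode_hat)    # the mode, 3
```

(The grid minimizers for the mean and median are only accurate to the grid spacing, of course, but they land right next to $\mathbb{E}[x]$ and the median.)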