I'm reading John Kruschke's *Doing Bayesian Data Analysis: A Tutorial with R and BUGS*, in which it says that
we define the central tendency of a distribution as the value $M$ that minimizes the long-run expected distance between it and all the other values of $x$.
So $M$ is the value that minimizes either
- $\int p(x)\,D(x,M)\,dx$, where $p(x)$ is a probability density function, or
- $\sum_x p(x)\,D(x,M)$, where $p(x)$ is a probability mass function.
In both cases, $D$ is some notion of distance.
Then he says that
- if we choose $D(x, M) = |x-M|^2$, $M$ is the mean,
- if we choose $D(x, M) = |x-M|$, $M$ is the median, and
- if we choose $D(x, M) = \delta(x,M)$, $M$ is the mode (at least in the discrete case).
All of these make sense to me intuitively, but how do you prove it in general?
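For the mean I can at least sketch a calculus argument myself (assuming $p$ is a density with a finite second moment), which is the kind of proof I'm hoping generalizes:

$$\frac{d}{dM}\int p(x)\,(x-M)^2\,dx = \int p(x)\,(-2)(x-M)\,dx = -2\bigl(\mathbb{E}[x] - M\bigr),$$

which vanishes exactly at $M = \mathbb{E}[x]$, and the second derivative $2\int p(x)\,dx = 2 > 0$ confirms this is the unique minimum. But I don't see how to adapt this to the non-differentiable $|x-M|$ or to the 0/1 distance.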
Notes:
- $\delta(x,M)$ is 1 if $x=M$ and 0 otherwise. This is not Kruschke's notation; he just says "distance defined as zero for any exact match, and one for any mismatch" and I used the $\delta$ notation because it's familiar to me and easier to fit in an equation.
- I read Mean and Median properties, and it only answers part of my question; I'm still interested in the mode, and in a proof that each version of $M$ minimizes its corresponding expected "distance," with uniqueness for the mean.
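For what it's worth, a quick numerical check of my own (a small made-up pmf, a grid search over candidate values of $M$) agrees with all three claims, so I'm confident they hold and it's only the proof I'm missing:

```python
# Small discrete distribution: support points and their probabilities.
xs = [1, 2, 3, 4, 10]
ps = [0.10, 0.20, 0.35, 0.25, 0.10]

def argmin_expected(D, candidates):
    """Candidate M minimizing E[D(x, M)] = sum_x p(x) D(x, M)."""
    return min(candidates, key=lambda M: sum(p * D(x, M) for x, p in zip(xs, ps)))

# A fine grid of candidate values of M spanning the support.
grid = [1 + 9 * k / 10000 for k in range(10001)]

mean_hat = argmin_expected(lambda x, M: (x - M) ** 2, grid)
median_hat = argmin_expected(lambda x, M: abs(x - M), grid)
# With the 0/1 distance, any M off the support has expected distance 1,
# so only support points can do better; search just those.
mode_hat = argmin_expected(lambda x, M: 0 if x == M else 1, xs)

print(mean_hat)    # close to the mean, sum(p*x) = 3.55
print(median_hat)  # close to the median, 3
print(mode_hat)    # the mode, 3
```

(The grid minimizers for the mean and median are only accurate to the grid spacing, of course, but they land right next to $\mathbb{E}[x]$ and the median.)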