11

Suppose that $X$ is a random variable with PDF $f(x)$ with support $(-\infty,\infty)$. Suppose that the expectation of $X$ is $\mathbb{E}(X)=\lambda$.

Is it always true that

$$\int_{-\infty}^{\lambda} f(x)dx = \frac{1}{2}$$ $$\int_\lambda^{\infty}f(x)dx = \frac{1}{2}$$

  • 2
    See https://stats.stackexchange.com/questions/3787 for better information about how medians and means might be related in general. There are other good related threads, like https://stats.stackexchange.com/questions/125084/, https://stats.stackexchange.com/questions/251701/, https://stats.stackexchange.com/questions/251134, and many more that can be found by searching our site for [mean median](https://stats.stackexchange.com/search?q=mean+median). – whuber Feb 01 '21 at 19:39
  • 5
    "Symmetric about the mean" to me implies that the probability distribution is the same *shape* on either side of the mean (i.e. a mirror image about the mean), like a normal distribution or a uniform distribution. This is really asking if the cumulative probability is *equal* on either side of the mean. – Nuclear Hoagie Feb 02 '21 at 16:00
  • 3
    The mean and the median can be equal in an asymmetric distribution too. So in that sense too symmetry or asymmetry has nothing to do with the equations. – Nick Cox Feb 02 '21 at 16:28
  • @NuclearHoagie If the probability distribution is the same shape on either side (scale included), doesn't that imply that the probabilities are equal? – EssentialAnonymity Feb 02 '21 at 16:48
  • Ah, I see. The shapes could be non-symmetric but still have equal area. – EssentialAnonymity Feb 02 '21 at 16:57
  • 2
    @StrugglingStudent42 Yes, if the distribution is symmetric about the mean, the cumulative probability must be 50% on either side of the mean. But the reverse is not necessarily true, as you could have a cumulative probability of 50% on either side of the mean without having a distribution that's symmetric about the mean. – Nuclear Hoagie Feb 02 '21 at 17:03
  • I'm no data scientist, but I use means and medians for practical purposes and, at least in my work, they are never the same. The difference between the mean and median is often useful in itself. – JimmyJames Feb 02 '21 at 22:48
  • 1
    @JimmyJames The empirical mean and median may very well (okay...will) differ, even for a symmetric population. – Dave Feb 02 '21 at 22:58
  • 1
    A measure of skewness that deserves to be used more is (mean $-$ median) / SD; it goes back to Karl Pearson. You're smart if you can see this immediately, but a nice property is that it is bounded and always falls in $[-1, 1]$. Of course, this very thread implies that getting (nearly) $0$ as a result is not a guarantee of (near) symmetry of distribution. (A small numerical illustration follows these comments.) – Nick Cox Feb 03 '21 at 17:43
  • 1
    @NickCox Something to do with Chebyshev's inequality? – Dave Feb 03 '21 at 17:45
  • 1
    See Mallows, C.L. 1991. Another comment on O'Cinneide. _The American Statistician_ 45: 258 for a proof using Jensen's inequality. – Nick Cox Feb 03 '21 at 18:21
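A small numerical illustration of the (mean $-$ median) / SD measure mentioned in the comments above, as a minimal R sketch (the seed, sample size, and the choice of an exponential sample are arbitrary choices for illustration):

set.seed(1)
x <- rexp(1e5, rate = 1)        # right-skewed sample: population mean 1, median log(2), SD 1
(mean(x) - median(x)) / sd(x)   # positive, roughly 0.3, and guaranteed to fall in [-1, 1]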

4 Answers

18

No.

In your integral equations, the value that splits the probability mass into two equal halves is the median, not the mean. The median and the mean may coincide (as they do for a normal distribution), but they do not have to.

As a counterexample, consider $X \sim \operatorname{Exp}(1)$: its mean is $1$, but its median is $\ln 2 \approx 0.69$, so more than half of the probability mass lies below the mean.
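A quick check of this counterexample in R (a minimal sketch using the base distribution functions for the exponential):

pexp(1, rate = 1)    # P(X <= mean) = 1 - exp(-1), about 0.63, not 0.5
qexp(0.5, rate = 1)  # median = log(2), about 0.69, below the mean of 1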

Dave
  • Your counterexample is not defined on $(-\infty, \infty)$ ;) – Dave Feb 02 '21 at 22:28
  • @Dave It illustrates the point. Most of the named distributions that have support on the entire real line are symmetric. Feel free to convolve that exponential distribution with a standard normal. You can eyeball the Wikipedia graphs to see that the median is not equal to the mean: https://en.m.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution. – Dave Feb 03 '21 at 04:36
  • 5
    @Dave It is, with density equal to zero on $(-\infty, 0)$. – Marc Vaisband Feb 03 '21 at 09:58
8

I'm not surprised that you struggle with the proof, because this does not hold.

As a simple counterexample (with support that really is the whole real line), consider a mixture of two normals with different means and unequal weights. For instance, $0.25\times N(0,0.1)+0.75\times N(1,0.1)$, where $0.1$ is the standard deviation of each component, has a mean of $0.75$, but the cumulative probability at that mean is nowhere near $\tfrac{1}{2}$:

> library(EnvStats)
> pnormMix(q=0.75, mean1=0, sd1=0.1, mean2=1, sd2=0.1, p.mix=0.75)  # p.mix is the weight of the second (mean-1) component
[1] 0.2546572

Only about a quarter of the probability mass lies below the mean of $0.75$.
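A simulation cross-check of the same mixture (a sketch only; rnormMix is the companion sampler in EnvStats, and the seed and sample size are arbitrary):

set.seed(1)
x <- rnormMix(1e6, mean1 = 0, sd1 = 0.1, mean2 = 1, sd2 = 0.1, p.mix = 0.75)
mean(x)          # close to the theoretical mean of 0.75
mean(x <= 0.75)  # close to 0.25, not 0.5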
Stephan Kolassa
4

It is not true since, as others have said, the median does not have to be equal to the mean.

What is true, writing $F(x)$ for the cumulative distribution function and $\mathbb E[X]=\lambda$ for the mean, is

$$\int_{-\infty}^{\lambda} F(x)\,dx = \int_\lambda^{\infty}(1-F(x))\,dx $$ so with the density function $$\int_{x=-\infty}^{\lambda}\int_{y=-\infty}^{x} f(y)\,dy\,dx = \int_{x=\lambda}^{\infty}\int_{y=x}^{\infty} f(y)\,dy\,dx $$
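A quick numerical check of this identity for the $\operatorname{Exp}(1)$ example above, where $\lambda = 1$ (a minimal sketch using base R's integrate; since $F(x) = 0$ for $x < 0$, the left-hand integral can start at $0$):

lambda <- 1
integrate(function(x) pexp(x, rate = 1), lower = 0, upper = lambda)        # about 0.368 = exp(-1)
integrate(function(x) 1 - pexp(x, rate = 1), lower = lambda, upper = Inf)  # about 0.368 = exp(-1)
# both sides match, even though P(X <= lambda) is about 0.63 rather than 0.5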

Henry
1

No, but it does hold in some cases, for instance for the Gaussian distribution.

In fact, what you have defined is the median: the value above which half of the probability mass (half of the population, or of the dataset) lies, and below which the other half lies.

You have also pointed to a nice qualitative property of probability distributions:

Consider positive numbers drawn from a heavy-tailed distribution, such as wealth across a population of individuals. The greater the inequality (that is, many poor people together with a few very wealthy outliers), the lower the median falls relative to the mean, as illustrated below. The gap between the two thus acts as a shape descriptor that is qualitatively important when characterising probability distributions.
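For instance, a lognormal wealth model makes the gap concrete (a minimal sketch; the Lognormal(0, 1) parameters are an arbitrary choice for illustration):

# Lognormal(meanlog = 0, sdlog = 1): median = exp(0) = 1, mean = exp(1/2) ~ 1.65
plnorm(exp(1/2), meanlog = 0, sdlog = 1)   # P(X <= mean) = pnorm(0.5), about 0.69, not 0.5
qlnorm(0.5, meanlog = 0, sdlog = 1)        # median = 1, well below the mean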

meduz