
I am a bit confused by several sources stating that:

  1. the first raw moment is by definition the arithmetic mean
  2. the expected value of any random variable is exactly the arithmetic mean

The expected value and the arithmetic mean are the exact same thing. (Source: "How does the expected value relate to mean, median, etc. in a non-normal distribution?")

  3. at the same time, on Wikipedia I can see that for the log-normal distribution, the "mean" (not specified which one; the link redirects to "expected value") is exp{μ+σ^2/2}, which is different from the arithmetic mean

I also do not know how this μ is expressed. In the normal distribution it is the arithmetic mean, but what is it here?

  4. Then I read about the log-normal distribution, where the geometric mean is exp(arithmetic_mean(log(normally_distributed_data))). Is this the expected value for the log-normal? Is this the first raw moment? Or is it something different?

  5. If the arithmetic mean is the first raw moment of any distribution (that has it defined) and is also the expected value (but see point 3), then what is the point of using the geometric mean for the log-normal distribution?

  6. What about the location parameter? Is it the first moment? The mean? The arithmetic mean? The geometric mean? The median? The expected value? Or can it perhaps be characterized by them in certain distributions?

Please, help me to organize my knowledge on the relationship between these terms.

guazalli
  • Take a look at [this question](https://stats.stackexchange.com/questions/115286/barx-versus-mathbbe-barx) and see if it clears anything up for you. – knrumsey Jun 18 '19 at 17:26
  • Thank you for your answer, but it does not clarify things. I was told the first raw moment is always the arithmetic mean and the expected value, while for, say, the log-normal, Poisson, Weibull, and all others, Wikipedia shows a totally different formula for the expected value, which does not correspond with this claim. – guazalli Jun 18 '19 at 18:36

1 Answer


The source of your confusion seems to stem from the difference between a statistic and a parameter. Consider a continuous random variable $X$ with density function $f(x)$. The expected value (or equivalently the first raw population moment) is $$\mu = E(X) = \int_{\mathbb R}xf(x)\,dx.$$

On the other hand, the first raw sample moment (or equivalently the arithmetic mean) is $$\bar{x}_n = \frac{1}{n}\sum_{i=1}^n x_i.$$

These are clearly two different concepts. The first is a parameter of the distribution of $X$; the second is a function of the data and thus a statistic. Of course, they are related to each other via the Law of Large Numbers, which essentially states that the sample mean (i.e. the arithmetic mean, or first raw sample moment) converges in probability to the expected value: $$\bar x_n \stackrel{P}{\rightarrow} \mu$$ as $n\rightarrow \infty$.
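As a quick numerical sketch of this convergence (Python/NumPy; the Exponential(1) distribution and the sample sizes are just illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# For an Exponential distribution with rate 1, the expected value
# (the population parameter) is exactly 1.
mu = 1.0

# Arithmetic means (first raw sample moments) for increasing sample sizes.
means = [rng.exponential(scale=1.0, size=n).mean() for n in (10, 1_000, 100_000)]

# By the Law of Large Numbers, the sample mean drifts toward mu as n grows.
errors = [abs(m - mu) for m in means]
```

The individual errors fluctuate from run to run, but the error at n = 100,000 is reliably tiny, while the parameter μ never changes: the statistic estimates the parameter.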


For instance, if $X \sim \text{Lognormal}(\mu, \sigma)$ then the expected value/first population moment is $$E(X) = e^{\mu +\sigma^2/2},$$ and the sample mean/arithmetic mean/first sample moment $\bar{x}_n$ converges in probability to this quantity as $n\rightarrow \infty$.
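A small simulation of the lognormal case (NumPy; μ = 0.5 and σ = 1 are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.5, 1.0

# Population mean (expected value) of Lognormal(mu, sigma): e^{mu + sigma^2/2}.
pop_mean = np.exp(mu + sigma**2 / 2)

# The arithmetic mean of a large lognormal sample approaches this quantity,
# not e^mu.
x = rng.lognormal(mean=mu, sigma=sigma, size=500_000)
sample_mean = x.mean()
```

Note that the arithmetic mean converges to $e^{\mu+\sigma^2/2} \approx 2.72$ here, not to $e^{\mu} \approx 1.65$.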


Edit to address the question in the comments: The population geometric mean of a positive random variable is $\exp\{E(\log X)\}$. This is generally not the same as $E(X)$, by Jensen's inequality. I have never seen it used as an estimator for the mean (i.e. expected value) of a random variable. It does come up, however: the MLE of $\mu$ in a lognormal distribution is $$\hat\mu = \frac{1}{n}\sum_{i=1}^n \log X_i,$$ which is precisely the logarithm of the sample geometric mean. Equivalently, the MLE for $e^\mu$ in this example is the geometric mean.
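A sketch of this distinction in NumPy (same illustrative parameter values as before):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.5, 1.0
x = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

# MLE of mu for the lognormal: the arithmetic mean of the logs.
mu_hat = np.log(x).mean()

# exp(mu_hat) is the sample geometric mean; it estimates e^mu,
# not the expected value e^{mu + sigma^2/2}.
geometric_mean = np.exp(mu_hat)
arithmetic_mean = x.mean()  # this one estimates e^{mu + sigma^2/2}
```

The two estimators target different population quantities; for a lognormal with $\sigma > 0$ the geometric mean is systematically below the arithmetic mean, as Jensen's inequality predicts.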

Similarly, the median is not generally used as an estimate of the expected value. An interesting exception, however, is the double exponential (or Laplace) distribution. Since the distribution is symmetric, the expected value and the population median are both equal to $\mu$. The maximum likelihood estimator of this parameter is the sample median, which (I believe) has smaller MSE for $\mu$ than the sample mean.
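To check that last claim empirically, here is a small Monte Carlo comparison (NumPy; the sample size and number of replications are arbitrary choices, and this is an empirical check rather than a proof):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, b, n, reps = 0.0, 1.0, 50, 4_000

# Draw `reps` independent Laplace(mu, b) samples of size n each.
samples = rng.laplace(loc=mu, scale=b, size=(reps, n))

# Compare the mean squared error of the two estimators of mu.
mse_median = np.mean((np.median(samples, axis=1) - mu) ** 2)
mse_mean = np.mean((samples.mean(axis=1) - mu) ** 2)
```

For Laplace data the sample median (the MLE) comes out with roughly half the MSE of the sample mean, consistent with the asymptotic variances $b^2/n$ versus $2b^2/n$.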

knrumsey
  • Thank you very much for this clarification. This is what I was looking for. So my last question is: is the geometric mean (not a moment) also an estimator of the expected value, but "better"? I know that the arithmetic mean is BLUE, so the best. The geometric mean cannot be "better than best", so it must be "worse"... But, on the other hand, it is about an "asymmetric", "multiplicative" distribution, so maybe it's not the best linear unbiased estimator, but rather the best unbiased non-linear one? So I believe the median and any other kind of mean (and the mode) are estimators of the E(X) too? – guazalli Jun 18 '19 at 22:52
  • Thank you very much. Your update is detailed and very helpful. Let me explain why I asked for this. I saw numerous places, where geometric mean was used to summarize the right skewed distribution, which was confirmed to come from a log-normal distribution. I also saw articles saying why it is better for skewed data, especially if we know it is of multiplicative nature. Similarly - the harmonic mean. So I wanted to relate this measure to the more general terms, like expected value, estimator of central tendency. So I asked myself "what is the geometric mean" (and SD)? Is the GM the expected – guazalli Jun 19 '19 at 08:08
  • value for the log-normal distribution? But the expected value is the first raw moment, expressed as the arithmetic mean. Then I found the term "moment generating function", which shows how the moments are calculated for each distribution that has them. So I was doubly confused: why is it necessary, if the first moment, for example, IS the arithmetic mean, actually? How can the geometric mean be called "the expected value" for the log-normal on Wikipedia (please see where the "Mean" link navigates)? Am I right that the geometric mean is the best sample estimator of the pop. expected value for the log-normal? – guazalli Jun 19 '19 at 08:13
  • Not quite. The geometric mean is an estimator of the quantity $e^\mu$, which is the population geometric mean. The geometric mean is not a good estimator for the expected value, especially when $\sigma$ is large, since the bias is $e^\mu(1-e^{\sigma^2/2})$. – knrumsey Jun 19 '19 at 22:39
  • Thank you very much, knrumsey. So, actually, does the geometric mean have any advantage? I saw many articles on "why we should use the geometric mean on right skewed distributions", especially when the data are log-normal, multiplicative, when it's about concentrations, especially because it is equal to the median here. Other people write a lot that the median is much better than the arithmetic mean in right skewed data, which also justifies the geometric mean. But if the statistical properties of the geometric mean are so poor, should we even care about anything else than the arithmetic mean? – guazalli Jun 21 '19 at 12:05
  • @guazalli, the geometric mean may have some advantages as a measure of center. Just don't forget that it measures something different from the expected value. This comment thread is getting quite long, you might consider posting this as a new question. – knrumsey Jun 21 '19 at 19:09