1

From the connotation of "Maximum likelihood estimator" I am inclined to think that the maximum likelihood estimator of the mean of a distribution should equal the mean of the sample values drawn from that distribution. What else could the "maximum likelihood estimate" of the mean be?

Also, by calculus, the least squares estimator of the mean is again the mean of the sample values drawn from the distribution. So is the MLE of the mean of a distribution always equal to the least squares estimator of the mean? If not, can someone give a counter-example?

kjetil b halvorsen
user2371765
  • The lognormal distribution family provides a familiar counterexample. – whuber Nov 12 '19 at 15:37
  • I can imagine fat-tailed distributions (Pareto?) where the sample mean is definitely not a good estimate of the mean. But my calculus is pretty much nonexistent, so I don't know if that is the correct answer. – rep_ho Nov 12 '19 at 16:37
  • Consider the Laplace distribution, whose MLE of the center is the sample median. – Zhanxiong Nov 12 '19 at 17:27
  • @whuber Why did you not post the following as an answer? "The lognormal distribution family provides a familiar counterexample." – user2371765 Nov 13 '19 at 08:10
  • Because Xi'an has already supplied that answer along with a clear explanation. – whuber Nov 13 '19 at 13:26
  • @whuber I understand Xi'an's answer in isolation, but not in relation to my question. To be precise, because E(exp(X)) is not equal to exp(E(X)), I get that MLE(E(exp(X))) where X is normal is not exp(sample average of a sample drawn from the normally distributed population), which, by the way, equals the geometric mean of the values from the corresponding lognormal. But my question was about whether the MLE of the mean is equal to the arithmetic mean of the values in general. The lognormal is the counter-example I wanted. – user2371765 Nov 14 '19 at 11:11
  • @whuber Xi'an's argument, effectively, is about why the MLE does not equal the geometric mean of the lognormal sample and that there is an adjustment factor of exp(sigma^2/2). But with the adjustment factor one *could* come closer to the arithmetic mean although one might not exactly end up there. I don't know whether there is any result for quantifying by how much the GM and AM of a lognormal sample differ. – user2371765 Nov 14 '19 at 11:11
  • @whuber Also, Xi'an's answer came much after your comment. Of course, the answer is useful and highlights an important point, but I thought it did not answer my question and it distracted me for quite a while. But I mean absolutely no offence in saying this, and it very well might have been the answer to my next question which I now will not ask! Thank you, Xi'an. – user2371765 Nov 14 '19 at 11:12
  • @whuber In other words, even if the mean were invariant under transformation (similar to how MLE is), the MLE of the mean of the lognormal distribution would still equal only the geometric mean of the lognormal sample (and not the arithmetic mean). And, since the mean is not invariant under transformation, there happens to be an upward adjustment to the GM of the sample which might bring it in the vicinity of the AM. – user2371765 Nov 14 '19 at 11:41
  • Just to be perfectly clear, the MLE of the lognormal mean is not the GM of the sample. The lognormal mean is $\exp(\mu + \sigma^2/2).$ The MLE of $(\mu,\sigma^2)$ is obtained in the usual way from the mean $\bar y$ and (uncorrected) variance $s_y^2$ of the *logarithms* of the sample. Therefore the MLE of the lognormal mean is obtained by transforming those estimates, yielding $\exp(\bar y + s_y^2/2).$ – whuber Nov 14 '19 at 14:41
  • @whuber I wrote "To be precise, because E(exp(X)) is not equal to exp(E(X)), I get that MLE(E(exp(X))) where X is normal is not exp(sample average of a sample drawn from the normally distributed population), which, by the way, equals the geometric mean of the values from the corresponding lognormal. " I did not say the MLE of the lognormal mean is the GM. I said exp(sample average of a sample drawn from the normally distributed population) equals the geometric mean of the values from the corresponding lognormal. My sentence was ambiguous. Sorry. Now is what I write clearer? – user2371765 Nov 15 '19 at 12:48
  • @whuber Also, I wrote later "In other words, even if the mean were invariant under transformation (similar to how MLE is), the MLE of the mean of the lognormal distribution would still equal only the geometric mean of the lognormal sample (and not the arithmetic mean). " in response to Xi'an's assertion that because the mean is not invariant under transformation MLE is not equal to sample average in general. I know the MLE of the lognormal mean is not the GM of the sample. But it would be the GM if the mean were invariant under transformation. – user2371765 Nov 15 '19 at 12:48

3 Answers

5

A general counterargument to your intuition is that the MLE is invariant under transformations, while the mean is not. In particular, in exponential families the MLE is the empirical mean of the natural statistics, but not of other transforms of the sample. For instance, for a sample from a Normal $\mathcal N(\theta,1)$ distribution, the MLE of $\theta$, the mean of $X$, is the sample mean $\bar X_n$, but the MLE of the mean of $\exp(X)$, which is $\exp\{\theta+1/2\}$, is $\exp\{\bar X_n+1/2\}$ and not the empirical average of the $\exp(X_i)$'s.
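
As a minimal numerical sketch of this in R (the values $\theta = 1$ and $n = 50$ below are arbitrary choices):

set.seed(101)
theta = 1;  n = 50
x = rnorm(n, mean = theta, sd = 1)   # sample from N(theta, 1) with known variance
theta.hat = mean(x)                  # MLE of theta is the sample mean
exp(theta.hat + 1/2)                 # MLE of E[exp(X)] = exp(theta + 1/2), by invariance
mean(exp(x))                         # arithmetic mean of the exp(X_i): a different number
exp(theta.hat)                       # geometric mean of the exp(X_i): different again

For a lognormal sample with both parameters unknown, the same invariance argument gives the MLE of the mean as $\exp(\bar y + s_y^2/2),$ where $\bar y$ and $s_y^2$ are the mean and uncorrected variance of the logged data (see whuber's comment under the question).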

See also the related discussion on when the MLE is a biased estimator of the mean.

Xi'an
  • Can you please see my comments addressed to whuber right below the question? I think the reason the MLE is not the sample average for the lognormal is that the algebra for making them equal does not work out, not that the mean is not invariant under transformations. Even if the mean were invariant under transformation, the MLE for the lognormal mean would still be only the GM of the sample and not the AM. In any case, thank you for your answer. – user2371765 Nov 14 '19 at 12:32
1

Following @Zhanxiong's comment, suppose we look at $m = 10^5$ samples of size $n = 5$ from a Laplace (double exponential) population centered at $10.$ That is, the population mean and median are both $10.$

The following simulation in R illustrates that the sample means $\bar X = A$ and sample medians $\tilde X = H$ have $E(A) = E(H) = 10,$ so that both the sample mean and the sample median are unbiased estimators of the center. However, the sample means have a larger standard deviation than the sample medians. Thus, according to one frequently used criterion, the sample median is a "better" estimator of the center than the sample mean.

set.seed(1112)
m = 10^5;  n = 5
x = rexp(m*n)-rexp(m*n)+10   # Laplace(center 10, scale 1): difference of two Exp(1) variates
DTA = matrix(x, nrow=m)      # one sample of size n per row
a = rowMeans(DTA)            # sample means
mean(a);  sd(a)
[1] 9.997945          # aprx E(A) = 10 
[1] 0.6317852         # aprx SD(A) = sqrt(2/5) =  0.6325

h = apply(DTA,1,median)      # sample medians
mean(h);  sd(h)
[1] 9.997512          # aprx E(H) = 10
[1] 0.5910876         # SD(H) < SD(A)
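
As a rough asymptotic check (assuming the unit-scale Laplace used above, with density $f(x) = \tfrac12 e^{-|x-10|}$): the large-sample variance of the sample mean is $2/n,$ while the sample median has approximate large-sample variance $1/(4f(10)^2 n) = 1/n.$ Asymptotically the median is therefore about twice as efficient for this distribution; at $n = 5$ the advantage is smaller, which is consistent with the simulated standard deviations above.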

[Figure: histograms of the approximate sampling distributions of the sample means and sample medians]

par(mfrow=c(2,1))
 hist(a, prob=T, br=40, col="skyblue2", xlim=c(6,15), 
      main="Aprx Dist'n of Sample Meane")
 hist(h, prob=T, br=40, col="skyblue2", xlim=c(6,15), 
      main="Aprx Dist'n of Sample Medians")
par(mfrow=c(1,1))
BruceET
0

maximum likelihood estimator of the mean of a distribution

I don't think I've ever seen the mean computed via MLE. Remember, MLE is about parameters, not moments of the distribution.

For a lot of distributions, the parameters just happen to be best estimated by the sample mean (see $\mu$ for the normal, $\lambda$ for the Poisson), but this isn't always the case (see $\lambda$ for the exponential, though this depends on the parameterization).
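
For instance, a quick sketch under the rate parameterization $f(x\mid\lambda)=\lambda e^{-\lambda x}$: for an i.i.d. sample $x_1,\dots,x_n$ the log-likelihood gives

$$\ell(\lambda)=n\log\lambda-\lambda\sum_{i=1}^n x_i,\qquad \ell'(\lambda)=\frac{n}{\lambda}-\sum_{i=1}^n x_i=0 \;\Longrightarrow\; \hat\lambda=\frac{1}{\bar x},$$

so the MLE of the parameter $\lambda$ is the reciprocal of the sample mean, while (by invariance) the MLE of the distribution's mean $1/\lambda$ is $\bar x$ itself.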

Demetri Pananos
  • The mean can always be considered a parameter. What might illuminate your point would be an example of a family of distributions in which all have the same mean. A Normal$(0,\sigma^2)$ family would work fine for that purpose, for instance. – whuber Nov 12 '19 at 15:44
  • How will this example illuminate Demetri's point? – user2371765 Nov 12 '19 at 15:51
  • In this example, the MLE of the mean is $0,$ which will almost surely not equal the sample mean. – whuber Nov 12 '19 at 16:18
  • Not necessarily. Since the family is centered around 0, one is as likely to get a negative value as a positive one in the sample. – user2371765 Nov 12 '19 at 17:21
  • @Cagdas That's correct: but in the example of the Normal$(0,\sigma^2)$ family, the MLE of the mean is $0.$ Nothing in the theory of estimation guarantees that estimates must vary. – whuber Nov 12 '19 at 19:36
  • @user17144 But what does that demonstrate? Your observation, although correct, appears to have no bearing on any of the issues in this thread. – whuber Nov 12 '19 at 19:36
  • @Cagdas Because there is no element in the Normal$(0,\sigma^2)$ family with a mean other than $0,$ *any* legitimate estimator of the mean must lie in the set $\{0\}.$ Need a proof? Given an MLE $\hat\sigma$ of $\sigma,$ it is trivial to check that $(\hat\mu,\hat\sigma)=(0,\hat\sigma)$ is a maximizer of the likelihood, *QED.* Another proof: it is well-known that the MLE of a parameter $\theta=f(\sigma)$ (where $f$ is a function and $\sigma$ a parameter) is $\hat\theta=f(\hat\sigma).$ Consider $f(x)=x-x$ and note that $\mu=f(\sigma)$ in this family: you obtain $\hat\mu=\hat\sigma-\hat\sigma=0.$ – whuber Nov 12 '19 at 19:46
  • @Cagdas This parameter space is not a "single point," because $\sigma^2$ lies in the interval $(0,\infty),$ which is diffeomorphic to the real line. If it helps, compare this to the problem of estimating the skewness (standardized central third moment) in a Normal$(\mu,\sigma^2)$ family: the MLE of the skewness is $0.$ – whuber Nov 12 '19 at 19:57
  • @Cagdas In "space for the mean only" you are making an idiosyncratic distinction that is not present in the theory of MLE. But I agree that the situation is trivial: its triviality is the reason I proposed it as an illustrative example. – whuber Nov 12 '19 at 20:01
  • @Cagdas The skewness is a well-defined property of every distribution in the family of Normal distributions. *Ergo,* any useful estimation procedure should be able to estimate it. The "$0$" in the statement "the MLE is $0$" therefore needs to be understood as the function that assigns the value $0$ to each *iid* sample. As such it is an estimator (in the usual sense of that word). As with any estimator, it is applied in circumstances where the element of the hypothesized distribution family is not known. – whuber Nov 12 '19 at 20:11
  • @whuber I think now I follow you. Basically you are saying when the correct specification is assumed to be normal then MLE of skewness is 0 by definition. Did I get it right? – Cagdas Ozgenc Nov 12 '19 at 20:14
  • @Cagdas Perhaps. It's not always easy to reason about these "trivial" or "edge" cases, but doing so sometimes provides insight. I think it comes down to what we understand by the phrase "by definition." If you mean "by applying the definition of the MLE we will conclude $0$ is the MLE of the skewness," then I think you have understood my intention. I also think it's possible to conceive of this situation in different valid ways, which might lead to superficially different explanations and understandings of my claim, but (I hope) will not result in any real misunderstandings. – whuber Nov 12 '19 at 20:20
  • @whuber In that case maybe you can tell me how one calculates a non-parameter using MLE? I mean basically we can talk of any utility integrated over the probability measure (I hope I am using the terminology right). All of these then can be calculated by MLE, but how? – Cagdas Ozgenc Nov 12 '19 at 20:24
  • @Cagdas Many authors conceive of a "parameter" as a (reasonably nicely behaved) function $\theta:\Omega\to\mathbb{R}$ where $\Omega$ is the set of possible probability distributions (such as the Normal ones). In this sense not only are $\mu$ and $\sigma^2$ parameters of the Normal$(\mu,\sigma^2)$ family, but so are $\sigma$ (the SD), $1/\sigma^2$ (the precision), the skewness, and so on. Typically a parameter is estimated directly, but in MLE it suffices to estimate *any* set of numbers that identify one distribution $\hat F\in\Omega$ uniquely, for then $\hat\theta=\theta(\hat F).$ – whuber Nov 12 '19 at 20:28
  • @whuber Skewness will not identify a normal distribution uniquely, but according to the previous conclusion it has MLE 0. I can define many other strange integrals similar to popular moments. Do they also have MLEs? – Cagdas Ozgenc Nov 12 '19 at 20:34
  • @Cagdas Yes, they do have MLEs. You provide a very nice example with that remark: perhaps one of your strange integrals, unbeknownst to you (or to anyone else) always yields zero. Does that rule out attempting to estimate it from data? Not at all: observe the data, obtain the MLEs of $\mu$ and $\sigma^2,$ and compute the integral for those MLEs, just as you always would. – whuber Nov 12 '19 at 20:38
  • @whuber so basically if I have MLE of $\mu$ and $\sigma^2$ at hand and if I compute skewness using $\hat{\mu}$ and $\hat{\sigma^2}$ you are saying that they should cancel out and I should end up with 0. Right? – Cagdas Ozgenc Nov 12 '19 at 20:42
  • @Cagdas Yes, understanding you will do a computation that is suitable for the family of distributions you have posited. For instance, in some families (*e.g.*, Gamma$(a,b)$ distributions) the skewness does vary with the parameters. In all finite-dimensional families with defined, finite moments, eventually all the higher moments depend in determinate ways on the lower moments. The dependency is a property of the entire family. In the Normal family, for instance, all *cumulants* higher than the second are zero: this translates into formulas for all higher moments in terms of $\mu,\sigma.$ – whuber Nov 12 '19 at 20:46
  • @whuber I understand. I have learned something new today. Thank you. – Cagdas Ozgenc Nov 12 '19 at 20:48
  • @whuber I mis-interpreted your comment. Yes, the family you suggest is a counter-example. – user2371765 Nov 13 '19 at 07:00
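
To make the Normal$(0,\sigma^2)$ example discussed in these comments concrete, here is a minimal R sketch (the sample size and the true $\sigma$ are arbitrary choices):

set.seed(1)
x = rnorm(20, mean = 0, sd = 2)   # a sample from one member of the N(0, sigma^2) family
mean(x)                           # the sample mean: almost surely not 0
sigma2.hat = mean(x^2)            # MLE of sigma^2 within this family (second moment about 0)
# The fitted distribution is N(0, sigma2.hat), so the MLE of the mean is 0,
# whatever value mean(x) happens to take.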