
Given i.i.d. draws $x_1,...,x_n$ from $X$, where:

  • $X$ has a finite mean $E[X]=\mu < \infty$,
  • $X$ is symmetric about its mean, meaning $f_X(\mu+c)=f_X(\mu-c)$ for all $c$,
  • The probability density function $f_X$ is not otherwise known.

Is it possible to prove the following?

Proposition. The MLE for the mean of $X$ is the sample mean, $\hat \mu_{MLE}=\bar x = \frac{1}{n}\sum_{i=1}^n x_i$.

A proof or a counterexample would be great. I am willing to assume additionally that $X$ has finite variance, $Var[X]=\sigma^2 < \infty$, or to make other common basic assumptions, if that becomes necessary for the proposition to hold or if it greatly simplifies the proof.

I suspect that it may be possible to use the invariance of the MLE to transformations of the data to prove this, but it might follow from simpler facts about the sample mean.

Johan
  • A [question posted just a few hours ago](http://stats.stackexchange.com/questions/98971/mle-of-a-cauchy-distribution) provides a counterexample (well, it comes close enough: it concerns a distribution without a mean, but it's easily modified to give a counterexample with a mean). – whuber May 16 '14 at 21:42
  • Yes! I updated the question to reflect "sample mean". Thanks for the question from a few hours ago. The Cauchy distribution in that example has an undefined mean; does the counterexample translate to some distribution with a defined mean? – Johan May 16 '14 at 21:48
  • Your edits beat me to it. I'll take a look at how it can be modified, thanks! I'm still interested in knowing whether there is some set of stronger assumptions under which this may hold: log-concave? Something else? – Johan May 16 '14 at 21:48
  • The difficulty is that the sample itself will rarely be symmetric about *its* mean, so the underlying symmetry of $f_X$ is of little use. Writing $q$ for the derivative of $\log(f_X)$, your requirement implies (among other things) that $\sum_{i=1}^n q(x_i-\bar{x})=0$ for *all* possible samples $(x_i)$. That pretty much limits $q$ to be linear, which means $X$ has a Normal distribution. – whuber May 16 '14 at 21:59
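
To make the Cauchy counterexample from the comments concrete, here is a minimal numerical sketch (an editorial illustration, not from the original thread; the seed, sample size, true location $3.0$, and unit scale are arbitrary assumptions). It fits the location of a Cauchy by maximum likelihood and compares it with the sample mean; the Cauchy itself has no mean, as noted above, but the same discrepancy appears for heavy-tailed symmetric distributions that do have one, such as a Student-$t$ with a few degrees of freedom.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import cauchy

rng = np.random.default_rng(0)                        # arbitrary seed
x = cauchy.rvs(loc=3.0, size=200, random_state=rng)   # symmetric about 3.0

# Negative log-likelihood of a Cauchy(loc, scale=1) sample as a function of loc.
def nll(loc):
    return -cauchy.logpdf(x, loc=loc).sum()

mle = minimize_scalar(nll, bounds=(x.min(), x.max()), method="bounded").x

print("sample mean :", x.mean())   # erratic: the Cauchy has no mean
print("location MLE:", mle)        # stable, close to the true location 3.0
```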

1 Answer


Consider the single-parameter Exponential Family of distributions, i.e. distributions whose probability density (or mass) function can be written as

$$f(x) = h(x)\cdot \exp{\big\{\eta(\theta)T(x)-A(\theta)\big\}}$$

The log-likelihood from an i.i.d. sample of size $n$ is then

$$\tilde L = \sum_{i=1}^n\ln h(x_i) + \eta(\theta)\sum_{i=1}^nT(x_i) - nA(\theta)$$

and setting the derivative with respect to $\theta$ equal to zero gives the first-order condition

$$\frac {\partial \tilde L}{\partial \theta}=\eta'(\theta)\sum_{i=1}^nT(x_i)-nA'(\theta) = 0$$

$$\Rightarrow \frac 1n \sum_{i=1}^nT(x_i) = \frac {A'(\hat \theta_{MLE})}{\eta'(\hat \theta_{MLE})}$$

It is clear from the above that, to arrive at "the sample mean is the MLE for the mean", the involved functions must have suitable forms: since $E[T(X)] = A'(\theta)/\eta'(\theta)$ in this family, one essentially needs $T(x_i)$ to be (an affine transformation of) $x_i$, with $A'(\theta)/\eta'(\theta)$ equal to the correspondingly transformed mean.

Examples where the result holds
1) For the Normal distribution with known variance $\sigma^2$ (so $\theta=\mu$): $T(x_i) = x_i/\sigma$, $A(\theta)=\mu^2/(2\sigma^2) \Rightarrow A'(\theta) = \mu/\sigma^2$, $\eta(\theta) = \mu/\sigma \Rightarrow \eta'(\theta) = 1/\sigma$, so the first-order condition gives $\bar x/\sigma = \hat\mu/\sigma$, i.e. $\hat\mu_{MLE}=\bar x$.

2) For the Bernoulli$(p)$ distribution (so $\theta=p$): $T(x_i) = x_i$, $A(\theta) = -\ln(1-p) \Rightarrow A'(\theta) = 1/(1-p)$, $\eta(\theta) = \ln\big(p/(1-p)\big) \Rightarrow \eta'(\theta) = 1/\big(p(1-p)\big)$, so the first-order condition gives $\bar x = \hat p(1-\hat p)/(1-\hat p) = \hat p$, i.e. $\hat p_{MLE}=\bar x$.

In these cases the MLE for the mean is indeed the sample mean. It is perhaps easier to find counterexamples, as whuber hinted in the comments.
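
To complement the algebra, here is a small numerical sketch (again an editorial illustration, not part of the original answer; the seed, sample size $1001$, and known scale $2.0$ are arbitrary assumptions). It contrasts the Normal case above, where numerically maximizing the likelihood over the location recovers the sample mean, with the Laplace (double-exponential) distribution, which is symmetric about its mean and has finite variance, yet whose location MLE is the sample median.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, laplace

rng = np.random.default_rng(1)                         # arbitrary seed
x_norm = norm.rvs(loc=5.0, scale=2.0, size=1001, random_state=rng)
x_lap = laplace.rvs(loc=5.0, scale=2.0, size=1001, random_state=rng)

def location_mle(logpdf, x):
    # Maximize the log-likelihood over the location only (scale known),
    # by minimizing the negative log-likelihood.
    nll = lambda loc: -logpdf(x, loc=loc, scale=2.0).sum()
    return minimize_scalar(nll, bounds=(x.min(), x.max()), method="bounded").x

# Normal: the location MLE agrees with the sample mean.
print(location_mle(norm.logpdf, x_norm), x_norm.mean())

# Laplace: the location MLE agrees with the sample median, not the mean.
print(location_mle(laplace.logpdf, x_lap), np.median(x_lap), x_lap.mean())
```

Since the Laplace distribution satisfies every assumption in the question (finite mean and variance, density symmetric about the mean), it is an explicit counterexample to the proposition.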

Alecos Papadopoulos
  • You can also add the gamma distribution, parameterised by its mean and dispersion, i.e. $p(x_i)=\frac{1}{x_i\Gamma(\alpha)}\left(\frac{\alpha x_i}{\mu}\right)^{\alpha} \exp\left(-\frac{\alpha x_i}{\mu}\right)$ – probabilityislogic May 17 '14 at 05:29
  • Why are exponential families relevant to the answer of this question? It's not the case that all exponential families belong to the nonparametric model mentioned in the question (the class of all probability distributions with densities symmetric about their mean). Or are you saying: choose an exponential family for which the distributions have densities that are symmetric about their means. (I.e. an exponential family which is a parametric submodel of the nonparametric model from the question.) For one of them the sample mean will not be the maximum likelihood estimator of the mean? – hasManyStupidQuestions May 24 '21 at 21:08
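
Finally, probabilityislogic's gamma example can be checked numerically as well. The sketch below (editorial; the shape $\alpha = 2.5$, true mean $\mu = 4.0$, sample size, and seed are arbitrary assumptions) maximizes the likelihood in the mean parameter $\mu$ with the shape held fixed; the maximizer matches the sample mean, as the exponential-family calculation in the answer predicts.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

rng = np.random.default_rng(2)             # arbitrary seed
alpha, mu = 2.5, 4.0                       # known shape, true mean
x = gamma.rvs(alpha, scale=mu / alpha, size=500, random_state=rng)

# Negative log-likelihood as a function of the mean m (shape alpha fixed);
# scipy parameterizes the gamma by shape and scale, with mean = shape * scale.
def nll(m):
    return -gamma.logpdf(x, alpha, scale=m / alpha).sum()

mle = minimize_scalar(nll, bounds=(x.min(), x.max()), method="bounded").x
print(mle, x.mean())                       # the two agree up to solver tolerance
```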