
Consider $N$ independent samples $S$ obtained from a random variable $X$ that is assumed to follow a truncated distribution (e.g. a truncated normal distribution) of known (finite) minimum and maximum values $a$ and $b$ but of unknown parameters $\mu$ and $\sigma^2$. If $X$ followed a non-truncated distribution, the maximum likelihood estimators $\widehat\mu$ and $\widehat\sigma^2$ for $\mu$ and $\sigma^2$ from $S$ would be the sample mean $\widehat\mu = \frac{1}{N} \sum_i S_i$ and the sample variance $\widehat\sigma^2 = \frac{1}{N} \sum_i (S_i - \widehat\mu)^2$. However, for a truncated distribution, the sample variance defined in this way is bounded by $(b-a)^2$, so it is not always a consistent estimator: for $\sigma^2 > (b-a)^2$, it cannot converge in probability to $\sigma^2$ as $N$ goes to infinity. So it seems that $\widehat\mu$ and $\widehat\sigma^2$ are not the maximum likelihood estimators of $\mu$ and $\sigma^2$ for a truncated distribution. Of course, this is to be expected, since the $\mu$ and $\sigma^2$ parameters of a truncated normal distribution aren't its mean and variance.
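To make the inconsistency concrete, here is a quick simulation sketch (assuming scipy is available; the particular values $\mu=0$, $\sigma=3$, $[a,b]=[-1,1]$ are my own choices, just for illustration):

```python
import numpy as np
from scipy.stats import truncnorm

# Illustrative values (not from any real data): mu = 0, sigma = 3,
# truncated to [a, b] = [-1, 1], so sigma^2 = 9 > (b - a)^2 = 4.
mu, sigma, a, b = 0.0, 3.0, -1.0, 1.0

# scipy parametrizes the truncation bounds in standardized units.
alpha, beta = (a - mu) / sigma, (b - mu) / sigma
samples = truncnorm.rvs(alpha, beta, loc=mu, scale=sigma,
                        size=100_000, random_state=0)

# The naive estimators: fine for an untruncated normal, inconsistent here.
mu_hat = samples.mean()   # close to 0 here only because mu happens to be 0
var_hat = samples.var()   # stays near 1/3, nowhere near sigma^2 = 9
print(mu_hat, var_hat)
```

No matter how large the sample, `var_hat` stays bounded by $(b-a)^2$ and cannot approach $\sigma^2 = 9$.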

So, what are the maximum likelihood estimators of the $\mu$ and $\sigma$ parameters of a truncated distribution of known minimum and maximum values?

a3nm
  • Are you sure about your analysis? I think you are making an invalid assumption: for the truncated situation, the MLE of $\sigma^2$ is no longer the sample variance (and, in general, the MLE of $\mu$ is no longer the sample mean)! – whuber Jan 30 '13 at 16:17
  • whuber: I know, this is precisely my question: what are the MLEs of $\sigma^2$ and $\mu$ in the truncated case? Adding a sentence to insist on this. – a3nm Jan 30 '13 at 16:26
  • 1
    There isn't a closed form solution. All you can do is numerically minimize the log likelihood. But this is qualitatively no different than many other models, such as logistic regression, which also have no closed form solution. – whuber Jan 30 '13 at 16:32
  • 1
    whuber: If this is true, this is pretty disappointing. Do you have references about the lack of closed form solutions? Are there closed-form estimators that are not maximum likelihood but are at least consistent (and optionally unbiased?). – a3nm Jan 30 '13 at 16:38
  • 1
    @whuber: Can you at least simplify your samples into sufficient statistics so that the minimization is fast? – Neil G Jan 30 '13 at 16:45
  • @Neil Yes: when you write down the likelihood equations (by setting the gradient of the log likelihood to zero) you obtain the usual stuff *plus* some terms coming from the renormalization due to truncation. The renormalization is in terms of $\mu, \sigma$, and the *given* endpoints, but is independent of the data. (Its derivative involves differences of the normal CDF.) Thus you obtain exactly the same sufficient statistics as before, but to solve the likelihood equations you have to adjust $\mu$ and $\sigma$ to equal this nasty function rather than to equal zero. – whuber Jan 30 '13 at 16:58

1 Answer


Consider any location-scale family determined by a "standard" distribution $F$,

$$\Omega_F = \left\{F_{(\mu, \sigma)}: x \to F\left(\frac{x-\mu}{\sigma}\right) \mid \sigma \gt 0\right\}.$$

Assuming $F$ is differentiable with derivative $f$, we readily find that the PDFs are $\frac{1}{\sigma}f\left(\frac{x-\mu}{\sigma}\right)$.

Truncating these distributions to restrict their support between $a$ and $b$, $a \lt b$, means that the PDFs are replaced by

$$f_{(\mu, \sigma; a,b)}(x) = \frac{f\left(\frac{x-\mu}{\sigma}\right)}{\sigma\, C(\mu, \sigma, a, b)}, \quad a \le x \le b$$

(and are zero for all other values of $x$) where $C(\mu, \sigma, a, b) = F_{(\mu,\sigma)}(b) - F_{(\mu,\sigma)}(a)$ is the normalizing factor needed to ensure that $f_{(\mu, \sigma; a, b)}$ integrates to unity. (Note that $C$ is identically $1$ in the absence of truncation.) The log likelihood for iid data $x_i$ therefore is

$$\Lambda(\mu, \sigma) = \sum_i \left[\log{f\left(\frac{x_i-\mu}{\sigma}\right)} - \log{\sigma}-\log{C(\mu, \sigma, a, b)}\right].$$
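For the normal family, this log likelihood is straightforward to compute directly, with $C(\mu,\sigma,a,b)=\Phi\left(\frac{b-\mu}{\sigma}\right)-\Phi\left(\frac{a-\mu}{\sigma}\right)$. A minimal sketch (the function name `trunc_loglik` is mine):

```python
import numpy as np
from scipy.stats import norm

def trunc_loglik(mu, sigma, x, a, b):
    """Lambda(mu, sigma) above, specialized to the normal family:
    sum_i [ log f((x_i - mu)/sigma) - log sigma - log C(mu, sigma, a, b) ]."""
    z = (np.asarray(x) - mu) / sigma
    C = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
    return np.sum(norm.logpdf(z) - np.log(sigma) - np.log(C))
```

One can check this against scipy's built-in `truncnorm.logpdf`, which implements the same density.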

Critical points (including any global maxima) are found where either $\sigma=0$ (a special case I will ignore here) or the gradient vanishes. Using subscripts to denote partial derivatives, we may formally compute the gradient and write the likelihood equations as

$$\eqalign{ 0 &= \frac{\partial\Lambda}{\partial\mu} &= \sum_i \left[\frac{-f_\mu\left(\frac{x_i-\mu}{\sigma}\right)}{f\left(\frac{x_i-\mu}{\sigma}\right)} -\frac{C_\mu(\mu,\sigma,a,b)}{C(\mu,\sigma,a,b)}\right] \\ 0 &= \frac{\partial\Lambda}{\partial\sigma} &= \sum_i \left[\frac{-f_\sigma\left(\frac{x_i-\mu}{\sigma}\right)}{\sigma^2f\left(\frac{x_i-\mu}{\sigma}\right)} -\frac{1}{\sigma}-\frac{C_\sigma(\mu,\sigma,a,b)}{C(\mu,\sigma,a,b)}\right] }$$

Because $a$ and $b$ are fixed, drop them from the notation and write $nC_\mu(\mu, \sigma)/C(\mu, \sigma)$ as $A(\mu,\sigma)$ and $nC_\sigma(\mu, \sigma)/C(\mu, \sigma)$ as $B(\mu, \sigma)$, where $n$ is the sample size. (With no truncation, both functions would be identically zero.) Separating the terms involving the data from the rest gives

$$\eqalign{ -A(\mu,\sigma) &= \sum_i \frac{f_\mu\left(\frac{x_i-\mu}{\sigma}\right)}{f\left(\frac{x_i-\mu}{\sigma}\right)} \\ -\sigma^2 B(\mu,\sigma) - n\sigma &= \sum_i \frac{f_\sigma\left(\frac{x_i-\mu}{\sigma}\right)}{f\left(\frac{x_i-\mu}{\sigma}\right)} }$$

By comparing these to the no-truncation situation it is evident that

  • Any sufficient statistics for the original problem are sufficient for the truncated problem (because the right hand sides have not changed).

  • Our ability to find closed-form solutions relies on the tractability of $A$ and $B$. If these do not involve $\mu$ and $\sigma$ in simple ways, we cannot hope to obtain closed-form solutions in general.
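The first point can be made concrete for the normal family: the truncated log likelihood depends on the data only through $n$, $\sum_i x_i$, and $\sum_i x_i^2$. A sketch (the function name is mine), using $\log f(z) = -\frac{1}{2}\log(2\pi) - \frac{z^2}{2}$:

```python
import numpy as np
from scipy.stats import norm

def trunc_loglik_suffstats(mu, sigma, n, sum_x, sum_x2, a, b):
    # The data enter only through (n, sum_x, sum_x2) -- the same sufficient
    # statistics as in the untruncated normal model.
    ss = (sum_x2 - 2.0 * mu * sum_x + n * mu**2) / sigma**2  # sum of z_i^2
    log_C = np.log(norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma))
    return (-0.5 * n * np.log(2.0 * np.pi) - 0.5 * ss
            - n * np.log(sigma) - n * log_C)
```

This agrees with the per-observation sum, so a numerical optimizer only ever needs the three summaries, not the raw sample.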

For the case of a normal family, $C(\mu,\sigma,a,b)$ of course is a difference of values of the normal CDF, which is expressed in terms of the error function: there is no chance that a closed-form solution can be obtained in general. However, there are only two sufficient statistics (the sample mean and variance will do) and the CDF is as smooth as can be, so numerical solutions are relatively easy to obtain.

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • Thanks a lot for this very detailed answer! I'm not sure I get what $f_\mu$, $f_\sigma$ , $C_\mu$, and $C_\sigma$ are, could you define them? Also, it's obvious but to be precise maybe you could say that your expression for the pdf is for $x \in [a, b]$ (and the pdf is zero outside of that). Thanks again! – a3nm Jan 30 '13 at 17:37
  • 1
    The usual longer notation is $C_\mu = \frac{\partial}{\partial\mu}C(\mu,\sigma,a,b)$, etc: as announced, it is a derivative. I will make the second change you suggest because it's an important clarification, thanks. – whuber Jan 30 '13 at 17:38
  • Also, since your answer is more general than the one I expected, I edited my question to insist less on the case of normal distributions. Thanks again for your effort. – a3nm Jan 30 '13 at 17:43
  • 1
    It was easier to explain at this level of generality compared to focusing on the Normal distributions! Computing the derivatives and showing the precise form of the CDF are unnecessary distractions (although useful when you start actually coding the numerical solution). – whuber Jan 30 '13 at 17:45
  • Err, I don't understand why, when differentiating $\Lambda(\mu,\sigma)$ with respect to $\mu$ (or likewise for $\sigma$), the derivative of the $-\log C(\mu,\sigma,a,b)$ part yields a $+C_\mu/C$ rather than $-C_\mu/C$. Am I missing something? – a3nm Jan 31 '13 at 14:17
  • @a3nm You're right--I'll change those signs. Thanks for catching that. – whuber Jan 31 '13 at 14:27
  • I think there's another error: $n$ factors should appear in the two last equations on terms that were taken out of the sum. The $C(\mu, \sigma, a, b)$ denominators also vanished somehow -- I guess you meant to include them in the definition of $A$ and $B$. – a3nm Jan 31 '13 at 15:01
  • 1
    Thanks for fixing! You missed one of them; could you review my edit? – a3nm Jan 31 '13 at 15:10
  • > "we cannot hope to obtain closed-form solutions in general." Is the existence of a solution always guaranteed? If not, is it possible to use an approximate solution? – lcrmorin May 26 '16 at 09:51
  • @Were_cat Interesting question. Yes, a solution is guaranteed, because as $\mu$ or $\sigma$ become arbitrarily large the likelihood must become arbitrarily small, by virtue of the fact that the tails of $F$ asymptotically reach zero probability and that $F$ has a density. Thus there must exist at least one solution. If there did not exist a solution, then trying to approximate it would be fruitless, wouldn't it? – whuber May 26 '16 at 13:45
  • An R code working out the details (and hopefully working - did not test) can be found in Hattaway, J., 2010, Parameter Estimation and Hypothesis Testing for the Truncated Normal Distribution with Applications to Introductory Statistics Grades (https://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=3052&context=etd) – Martin Modrák Feb 14 '18 at 13:03