
Note: I am posting a question from a former student of mine who is unable to post on his own for technical reasons.

Given an iid sample $x_1,\ldots,x_n$ from a Weibull distribution with pdf $$ f_k(x) = k x^{k-1} e^{-x^k}, \quad x>0, $$ is there a useful missing-variable representation $$ f_k(x) = \int_\mathcal{Z} g_k(x,z)\,\text{d}z $$ and hence an associated EM (expectation-maximisation) algorithm that could be used to find the MLE of $k$, instead of using straightforward numerical optimisation?

Xi'an
  • Is there any censoring? – ocram Feb 14 '12 at 10:14
  • What's wrong with Newton-Raphson? – probabilityislogic Feb 14 '12 at 10:36
  • @probabilityislogic: nothing's wrong with anything! My student would like to know if there is an EM version, that's all... – Xi'an Feb 14 '12 at 10:43
  • Could you give an example of what you are looking for in a different, simpler context, e.g. perhaps with observations of a Gaussian or uniform random variable? When all of the data is observed I (and some of the other posters, based on their comments) don't see how EM is relevant to your question. – ahfoss Jan 15 '14 at 16:35
  • @probabilityislogic I think you should have said, "Oh, you mean you want to USE Newton Raphson?". Weibulls are regular families... I think, so ML solutions are unique. Therefore, EM has nothing to "E" over, so you're just "M"ing... and finding roots of score equations is the best way to do it! – AdamO Aug 20 '14 at 23:30
  • @Xi'an what do you mean missing variable representation? What you have is simply a special case of the Weibull distribution. – AdamO Jul 20 '15 at 21:21
  • @Xi'an I see. A missing variable representation is like the representation of a noncentral chi-squared distribution (noncentrality $\lambda$) as a central chi-squared distribution with a missing Poisson($\lambda$)-distributed number of extra degrees of freedom. This is a very interesting question. – AdamO Jul 21 '15 at 19:16

4 Answers


The Weibull MLE can only be obtained numerically:

Let $$ f_{\lambda,\beta}(x) = \begin{cases} \frac{\beta}{\lambda}\left(\frac{x}{\lambda}\right)^{\beta-1}e^{-\left(\frac{x}{\lambda}\right)^{\beta}} & ,\,x\geq0 \\ 0 &,\, x<0 \end{cases} $$ with $\beta,\,\lambda>0$.

1) Likelihood function: $$ \mathcal{L}_{\hat{x}}(\lambda, \beta) =\prod_{i=1}^N f_{\lambda,\beta}(x_i) =\prod_{i=1}^N \frac{\beta}{\lambda}\left(\frac{x_i}{\lambda}\right)^{\beta-1}e^{-\left(\frac{x_i}{\lambda}\right)^{\beta}} = \frac{\beta^N}{\lambda^{N \beta}} e^{-\sum_{i=1}^N\left(\frac{x_i}{\lambda}\right)^{\beta}} \prod_{i=1}^N x_i^{\beta-1} $$

Log-likelihood function: $$ \ell_{\hat{x}}(\lambda, \beta):= \ln \mathcal{L}_{\hat{x}}(\lambda, \beta)=N\ln \beta-N\beta\ln \lambda-\sum_{i=1}^N \left(\frac{x_i}{\lambda}\right)^\beta+(\beta-1)\sum_{i=1}^N \ln x_i $$

2) MLE-Problem: \begin{equation*} \begin{aligned} & & \underset{(\lambda,\beta) \in \mathbb{R}^2}{\text{max}}\,\,\,\,\,\, & \ell_{\hat{x}}(\lambda, \beta) \\ & & \text{s.t.} \,\,\, \lambda>0\\ & & \beta > 0 \end{aligned} \end{equation*} 3) Maximization by $0$-gradients: \begin{align*} \frac{\partial l}{\partial \lambda}&=-N\beta\frac{1}{\lambda}+\beta\sum_{i=1}^N x_i^\beta\frac{1}{\lambda^{\beta+1}}&\stackrel{!}{=} 0\\ \frac{\partial l}{\partial \beta}&=\frac{N}{\beta}-N\ln\lambda-\sum_{i=1}^N \ln\left(\frac{x_i}{\lambda}\right)e^{\beta \ln\left(\frac{x_i}{\lambda}\right)}+\sum_{i=1}^N \ln x_i&\stackrel{!}{=}0 \end{align*} It follows: \begin{align*} -N\beta\frac{1}{\lambda}+\beta\sum_{i=1}^N x_i^\beta\frac{1}{\lambda^{\beta+1}} &= 0\\\\ -\beta\frac{1}{\lambda}N +\beta\frac{1}{\lambda}\sum_{i=1}^N x_i^\beta\frac{1}{\lambda^{\beta}} &= 0\\\\ -1+\frac{1}{N}\sum_{i=1}^N x_i^\beta\frac{1}{\lambda^{\beta}}&=0\\\\ \frac{1}{N}\sum_{i=1}^N x_i^\beta&=\lambda^\beta \end{align*} $$\Rightarrow\lambda^*=\left(\frac{1}{N}\sum_{i=1}^N x_i^{\beta^*}\right)^\frac{1}{\beta^*}$$

Plugging $\lambda^*$ into the second 0-gradient condition:

\begin{align*} \Rightarrow \beta^*=\left[\frac{\sum_{i=1}^N x_i^{\beta^*}\ln x_i}{\sum_{i=1}^N x_i^{\beta^*}}-\overline{\ln x}\right]^{-1} \end{align*}

This equation can only be solved numerically, e.g. with the Newton-Raphson algorithm or a fixed-point iteration. The resulting $\beta^*$ can then be plugged into the expression for $\lambda^*$ to complete the ML estimator for the Weibull distribution.
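As a quick illustration, here is a minimal Python sketch of solving this profile equation numerically (assuming NumPy and SciPy are available; the helper name `weibull_mle` and the root bracket are arbitrary choices that may need adjusting to the data):

```python
import numpy as np
from scipy.optimize import brentq

def weibull_mle(x, lo=0.05, hi=20.0):
    """Two-parameter Weibull MLE via the profile score equation for beta (sketch)."""
    x = np.asarray(x, dtype=float)
    logx = np.log(x)

    # g(beta) = 0 is exactly the fixed-point equation for beta* derived above.
    def g(beta):
        w = x ** beta
        return np.sum(w * logx) / np.sum(w) - logx.mean() - 1.0 / beta

    beta = brentq(g, lo, hi)                    # assumes the root lies in [lo, hi]
    lam = np.mean(x ** beta) ** (1.0 / beta)    # plug beta* back into lambda*
    return lam, beta

# Example with simulated data (np.random.weibull draws with scale 1 and shape 2 here).
x = np.random.weibull(2.0, size=1000)
print(weibull_mle(x))
```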

emcor
  • Unfortunately, this does not appear to answer the question in any discernible way. The OP is very clearly aware of Newton-Raphson and related approaches. The feasibility of N-R in no way precludes the existence of a missing-variable representation or associated EM algorithm. In my estimation, the question is not concerned at all with numerical solutions, but rather is probing for *insight* that might become apparent if an interesting missing-variable approach were demonstrated. – cardinal Sep 21 '14 at 14:47
  • @cardinal It is one thing to *say* there is only a numerical solution, and it is another thing to *show* there is only a numerical solution. – emcor Sep 22 '14 at 12:17
  • Dear @emcor, I think you may be misunderstanding what the question is asking. Perhaps reviewing the other answer and associated comment stream would be helpful. Cheers. – cardinal Sep 22 '14 at 14:31
  • @cardinal I agree it is not a direct answer, but it gives the exact expressions for the MLEs, which can e.g. be used to verify the EM. – emcor Sep 22 '14 at 18:29

I think the answer is yes, if I have understood the question correctly.

Write $z_i = x_i^k$. Then an EM-type iteration, starting from, for example, $\hat k = 1$, is

  • E step: ${\hat z}_i = x_i^{\hat k}$

  • M step: $\hat k = \frac{n}{\left[\sum({\hat z}_i - 1)\log x_i\right]}$

This is a special case (the case with no censoring and no covariates) of the iteration suggested for Weibull proportional hazards models by Aitkin and Clayton (1980). It can also be found in Section 6.11 of Aitkin et al. (1989).

  • Aitkin, M. and Clayton, D., 1980. The fitting of exponential, Weibull and extreme value distributions to complex censored survival data using GLIM. Applied Statistics, pp. 156-163.

  • Aitkin, M., Anderson, D., Francis, B. and Hinde, J., 1989. Statistical Modelling in GLIM. Oxford University Press, New York.
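For concreteness, a minimal Python sketch of the uncensored iteration above, assuming NumPy; the function name, starting value, and stopping rule are arbitrary choices:

```python
import numpy as np

def weibull_k_em(x, k=1.0, tol=1e-10, max_iter=1000):
    """Fixed-point ('EM-type') iteration for k in f_k(x) = k x^(k-1) exp(-x^k)."""
    x = np.asarray(x, dtype=float)
    logx = np.log(x)
    for _ in range(max_iter):
        z = x ** k                                   # E step: z_i = x_i^k
        k_new = len(x) / np.sum((z - 1.0) * logx)    # M step
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    return k

# Example: recover the shape from a sample with true k = 1.7 (scale fixed at 1).
x = np.random.weibull(1.7, size=500)
print(weibull_k_em(x))
```

Each pass is just a rearrangement of the score equation $n/k = \sum_i (x_i^k - 1)\log x_i$, so a fixed point of the iteration is a root of the score, i.e. the MLE of $k$.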

DavidF

Though this is an old question, it looks like there is an answer in a paper published here: http://home.iitk.ac.in/~kundu/interval-censoring-REVISED-2.pdf

In that paper, the analysis of interval-censored data with the Weibull distribution as the underlying lifetime distribution is considered, assuming that the censoring mechanism is independent and non-informative. As expected, the maximum likelihood estimators cannot be obtained in closed form. In the simulation experiments reported there, the Newton-Raphson method often fails to converge, whereas the expectation-maximisation algorithm proposed for computing the maximum likelihood estimators converges almost every time.

  • Can you post a full citation for the paper at the link, in case it goes dead? – gung - Reinstate Monica May 18 '16 at 16:28
  • This is *an* EM algorithm, but it does not do what I believe the OP wants. Rather, the E-step imputes the censored data, after which the M-step uses a fixed-point algorithm on the completed data set. So the M-step is not in closed form (a closed-form M-step being, I think, what the OP is looking for). – Cliff AB May 18 '16 at 16:34
  • @CliffAB: thank you for the link (+1), but indeed the EM is naturally induced in this paper by the censoring part. My former student was looking for a plain uncensored iid Weibull likelihood optimisation via EM. – Xi'an Oct 30 '16 at 17:49

In this case the MLE and the EM estimator are equivalent, since the MLE is actually just a special case of the EM estimator. (I am assuming a frequentist framework in my answer; this isn't true for EM in a Bayesian context where we are talking about MAPs.) Since there is no missing data (just an unknown parameter), the E step simply returns the log-likelihood, regardless of your choice of $k^{(t)}$. The M step then maximizes that log-likelihood, yielding the MLE.

EM would be applicable, for example, if you had observed data from a mixture of two Weibull distributions with parameters $k_1$ and $k_2$, but you didn't know which of these two distributions each observation came from.
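To make that mixture example concrete, here is a rough Python sketch of EM for a two-component mixture of the one-parameter Weibull densities above (assuming NumPy; starting values and numerical safeguards are glossed over, and the M step is approximated by a few weighted fixed-point passes, so strictly this is a generalized EM):

```python
import numpy as np

def weibull_pdf(x, k):
    # One-parameter Weibull density k x^(k-1) exp(-x^k), as in the question.
    return k * x ** (k - 1.0) * np.exp(-(x ** k))

def weibull_mixture_em(x, k1=0.5, k2=2.0, pi=0.5, n_iter=200):
    """Toy EM for pi*f_{k1} + (1-pi)*f_{k2}; the latent variable is the component label."""
    x = np.asarray(x, dtype=float)
    logx = np.log(x)
    for _ in range(n_iter):
        # E step: responsibility of component 1 for each observation.
        p1 = pi * weibull_pdf(x, k1)
        p2 = (1.0 - pi) * weibull_pdf(x, k2)
        w = p1 / (p1 + p2)
        # M step (approximate): mixing weight, then weighted fixed-point updates for k1, k2.
        pi = w.mean()
        for _ in range(5):
            k1 = np.sum(w) / np.sum(w * (x ** k1 - 1.0) * logx)
            k2 = np.sum(1.0 - w) / np.sum((1.0 - w) * (x ** k2 - 1.0) * logx)
    return pi, k1, k2
```

Here the component labels are genuinely missing data, so the E step is non-trivial, in contrast with the single-sample problem in the question.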

ahfoss
  • I think you may have misinterpreted the point of the question, which is: Does there exist some missing-variable interpretation from which one would obtain the given Weibull likelihood (and which would allow an EM-like algorithm to be applied)? – cardinal Jan 13 '14 at 17:32
  • @cardinal Thanks for the feedback. I don't see what the `missing-variable interpretation` would be in this case (without changing the statement of the problem), since we are given the observations $x_1, ..., x_n$. What am I missing here? Specifically, what would the unobserved data be during the E-step? – ahfoss Jan 13 '14 at 21:19
  • @cardinal I believe my posted answer does indeed satisfy your criteria: my missing-variable interpretation is that nothing is missing, and from that I obtain the given Weibull likelihood. This allows an EM-like algorithm to be applied, as shown in my post. – ahfoss Jan 14 '14 at 18:14
  • @ahfoss: as pointed out by cardinal, my question was about exhibiting some missing-data structure associated with the Weibull likelihood towards a regular EM implementation, rather than brute-force optimisation. – Xi'an Jan 15 '14 at 12:36
  • Could you elaborate on why you characterize the method of MLE as `brute-force`? Compared to the EM algorithm, in which you maximize the Q function (the approximated likelihood obtained in the E step) iteratively, MLE seems much simpler by comparison since you simply maximize the true likelihood function once. – ahfoss Jan 15 '14 at 16:38
  • Your answer is tautological: use maximum likelihood to find the maximum likelihood estimator. My friend was looking for a latent-variable representation of the above distribution in order to use an EM scheme instead. – Xi'an Jan 18 '14 at 22:52
  • It's not a tautology -- it's simple taxonomy. Unless I have misrepresented your question, the EM algorithm is equivalent to the MLE algorithm in this context since there is no missing data. Poster ocram and I have both suggested slightly modified problems which _would_ require a non-trivial application of the EM algorithm, but you say these are not what you're asking about. Please provide a specific example/definition of what you're looking for when you say `latent variable representation`. I think this specific point of confusion is why the question has not been answered yet. – ahfoss Jan 20 '14 at 03:38
  • The question statement in @Xi'an's post is quite clear. I think the reason it hasn't been answered is because any answer is likely nontrivial. (It's interesting, so I wish I had more time to think about it.) At any rate, your comment appears to betray a misunderstanding of the EM algorithm. Perhaps the following will serve as an antidote: – cardinal Jan 22 '14 at 02:37
  • Let $f(x) = \pi \varphi(x-\mu_1) + (1-\pi) \varphi(x-\mu_2)$ where $\varphi$ is the standard normal density function. Let $F(x) = \int_{-\infty}^x f(u)\,\mathrm{d}u$. With $U_1,\ldots,U_n$ iid standard uniform, take $X_i = F^{-1}(U_i)$. Then, $X_1,\ldots,X_n$ is a sample from a Gaussian mixture model. We can estimate the parameters by (brute-force) maximum likelihood. Is there any missing data in our data-generation process? **No**. Does it have a latent-variable representation allowing for the use of an EM algorithm? **Yes, absolutely**. – cardinal Jan 22 '14 at 02:37
  • Now, the question at hand is whether we can say the same thing about estimating the parameters from a Weibull sample. And, if so, what is the latent-variable representation that would allow for such an algorithm? – cardinal Jan 22 '14 at 02:38
  • Thank you @cardinal for a clear example. This is the problem I had previously suggested in the second paragraph of my answer (albeit without the clarity and detail of your presentation). In my proposed problem, $k_1$ and $k_2$ of the Weibull mixture model would be analogous to the parameters $\mu_1$ and $\mu_2$ of a Gaussian mixture. Just to make sure we're on the same page, I would clarify and say that the parameters of a Gaussian random variable _can_ be estimated by maximum likelihood, whereas those of a GMM cannot (assuming latent population membership), hence the need for an EM algorithm. – ahfoss Jan 22 '14 at 19:48
  • My apologies @cardinal; I think I have misunderstood two things about your latest post. Yes, in the GMM problem you could search $\mathbb{R}^2 \times [0,1]$ via a brute force ML approach. Also, I now see that the original problem looks for a solution that involves introducing a latent variable that allows for an EM approach to estimating the parameter $k$ in the given density $k x^{k-1}e^{-x^k}$. An interesting problem. Are there any examples of using EM like this in such a simple context? Most of my exposure to EM has been in the context of mixture problems and data imputation. – ahfoss Jan 23 '14 at 02:22
  • @ahfoss: (+1) to your latest comment. *Yes!* You got it. As for examples: (i) it shows up in censored data problems, (ii) classical applications like hidden Markov models, (iii) simple threshold models like probit models (e.g., imagine observing the latent $Z_i$ instead of Bernoulli $X_i = \mathbf{1}_{(Z_i > \mu)}$), (iv) estimating variance components in one-way random effects models (and much more complex mixed models), and (v) finding the posterior mode in a Bayesian hierarchical model. The simplest is probably (i) followed by (iii). – cardinal Jan 23 '14 at 03:11