
What happens to the log likelihood (or indeed the likelihood) function when the MLE does not exist?

The log likelihood is defined (for independent observations) as

$$\ell(\boldsymbol{\theta}) = \sum_{i=1}^N \ln P(y_i \mid \mathbf{x}_i, \boldsymbol{\theta})$$

where the sum is over the observations, the $y_i$ are the values of the endogenous variable, and the $\mathbf{x}_i$ are the values of the covariates in the $i$th observation.
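For concreteness, here is a minimal R sketch of this sum for a logistic model with a single Gaussian covariate; the sample size, seed, and true coefficient are all invented for illustration:

```r
# Minimal sketch: the sum above for a logistic model with one Gaussian
# covariate. Sample size, seed, and the true beta = 2 are all invented.
set.seed(1)
n <- 100
x <- rnorm(n)                              # i.i.d. Gaussian covariate
y <- rbinom(n, 1, plogis(2 * x))           # binary response

loglik <- function(beta) {
  p <- plogis(beta * x)                    # P(y_i = 1 | x_i, beta)
  sum(y * log(p) + (1 - y) * log(1 - p))   # sum_i ln P(y_i | x_i, beta)
}
```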

The maximum likelihood estimator is

$$\boldsymbol{\theta}^* = \operatorname*{arg\,max}_{\boldsymbol{\theta}} \, \ell(\boldsymbol{\theta})$$

with a corresponding log likelihood

$$\ell(\boldsymbol{\theta}^*) = \max_{\boldsymbol{\theta}} \, \sum_{i=1}^N \ln P(y_i \mid \mathbf{x}_i, \boldsymbol{\theta})$$
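Continuing the same illustrative sketch, this maximization can be carried out numerically (`optim()` minimizes, so the negative log likelihood is passed in):

```r
# Continuing the sketch: optim() minimizes, so pass the negative log
# likelihood; "BFGS" handles the one-dimensional parameter fine.
fit <- optim(par = 0, fn = function(b) -loglik(b), method = "BFGS")
fit$par     # theta*, the MLE of beta (close to the invented true value 2)
-fit$value  # l(theta*), the maximized log likelihood
```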

I understand that this may depend on the model, and perhaps even on the covariate distribution. I am particularly interested in logistic regression with i.i.d. Gaussian covariates; however, more general answers, or answers for other models/distributions, would be most welcome.

Meep
  • "Does not exist" in what sense? – Tim Aug 30 '19 at 18:36
  • @Tim The data is completely or quasi-separated. I believe the maximum likelihood estimator is then on the boundary of the domain (i.e. somewhere at infinity); see this paper by Albert and Anderson, 1984: https://www.jstor.org/stable/2336390?seq=1#metadata_info_tab_contents – Meep Aug 30 '19 at 18:49
  • Have you reviewed our questions addressing perfect separation? What is it in particular that you would like to know? https://stats.stackexchange.com/search?tab=votes&q=perfect%20separation Is a sufficient answer to your question the observation that the likelihood for some parameter increases as you move towards $\pm \infty$? Why or why not? – Sycorax Aug 30 '19 at 19:03
  • The asymptotics do not depend on the log likelihood or existence of MLE for any given set of data (essentially by definition): they depend on the *model.* Are you positing a *model* of complete separation? If so, the MLE is easy to work out from first principles and from that you can readily derive its asymptotic behavior. Obviously the log likelihood will almost surely not have a maximum for any dataset of any size in that case. – whuber Aug 30 '19 at 20:25

1 Answer


The following linked paper gives a detailed answer: in short, the MLE will try to push the estimated coefficients to $\pm\infty$, as explained on pp. 340-341 of this paper, which is a very good suggested read. If you need practical remedies, and examples/solutions implemented in R, consider also this answer in addition to the previous text.

And this completes the picture, I think.
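As a rough numerical illustration of that divergence, here is a small R sketch with constructed, completely separated data (invented for this answer, not taken from the paper):

```r
# Constructed example of complete separation: y = 1 exactly when x > 0,
# so no finite MLE exists.
x <- c(-2, -1, -0.5, 0.5, 1, 2)
y <- as.numeric(x > 0)

# glm() typically warns that fitted probabilities numerically 0 or 1
# occurred, and reports a very large coefficient in place of +Inf.
fit <- glm(y ~ x, family = binomial)
coef(fit)

# The log likelihood (intercept omitted for simplicity) is monotone in
# beta and approaches 0 from below as beta grows:
loglik <- function(beta) sum(dbinom(y, 1, plogis(beta * x), log = TRUE))
sapply(c(1, 10, 100, 1000), loglik)
```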

Fr1
  • Your answer seems to concern what happens to the MLE, and not the log likelihood itself! Suppose some coefficients of the parameter vector tend to infinity, and some to minus infinity. Can anything be said about the asymptotics of the log likelihood? I am not particularly concerned with what happens in a programmatic implementation of some MLE-finding algorithm. It will, of course, fail to converge to a solution in the parameter space; but what happens to the log likelihood itself? – Meep Aug 30 '19 at 19:19
  • Have you read pp. 340-341? What you are asking is explained there. Anyway, depending on what exact situation you assume (again, see pp. 340-341), you can plug the estimate into the likelihood to see what happens to it. It is a simple optimization problem where you want to know the value of the function at the asymptotic optimum $\pm\infty$, so just plug in and take the limits. – Fr1 Aug 30 '19 at 19:23
  • @Fr1 I had in fact been reading those pages, and the only reference to the value of the log likelihood is "Thus, the likelihood function under complete separation is monotonic, which implies that a finite maximum likelihood estimate does not exist". From this, I would think it increases to 0 when the endogenous variable (i.e. the values $y$ can take) is discrete; but when you have a continuous distribution the probability density can become arbitrarily large, even approach delta spikes, and the log likelihood – Meep Aug 30 '19 at 19:41
  • diverges to positive infinity. Either I am being really obtuse and what I have said is rubbish, or the answer really is not as obvious as it is being made out to seem. – Meep Aug 30 '19 at 19:42
  • No, no, you are not obtuse; you made a good point and asked a good question, which is why I posted a good paper as a reference. You know, the best thing you can have is a (possibly good) paper, because you cannot cite Cross Validated (which is, however, a super site) as a direct reference in an official document... (to be cont.) – Fr1 Aug 30 '19 at 19:58
  • The best thing is to use Cross Validated to get 1) a good understanding and 2) good references to cite in your official papers/documents/productions. That is why, when I can, I prefer to post papers rather than answers. – Fr1 Aug 30 '19 at 19:58
  • Ah OK. But with regard to the understanding, could you check whether you think my interpretation is correct? I cannot find anything stated explicitly about this. – Meep Aug 30 '19 at 20:02
  • It will depend on the kind of separation you have, whether it is partial or perfect, and which value, 0 or 1, the variable predicts. Now I am out for dinner (EU time), so I cannot write too much, but if you simulate two samples, one where a single predictor always predicts y=1 and another where the predictor always predicts y=0, you will see very quickly what happens when you take the limit as beta goes to infinity: the likelihood function will push the probabilities to the extremes of 1/0. – Fr1 Aug 30 '19 at 20:36
  • So now I am back. Consider the simplest case, as I anticipated: in light of the previous comment, taking beta = +inf under positive perfect separation, you have a likelihood that approaches 1, i.e. a log likelihood that approaches 0. Since for negative separation you would instead have a likelihood that approaches 0 at beta = +inf, just flip the sign and take beta = -inf, and you are back to the case where the likelihood tends to 1 and the log likelihood tends to 0 (i.e. the likelihood is maximized). That is why the asymptotically optimal value of the parameter of the variable driving the separation is +inf/-inf for positive/negative perfect separation; see the sketch after these comments. – Fr1 Aug 30 '19 at 21:52
  • For more details see the edit with the new link provided. – Fr1 Aug 30 '19 at 21:55
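A minimal R sketch of the simulation suggested in the comments above (the data are invented for illustration):

```r
# Positive perfect separation: a single positive predictor and a response
# that is always 1, as suggested in the comments.
x <- c(1, 2, 3, 4)
y <- rep(1, 4)

# Likelihood of a logistic model with coefficient beta and no intercept.
lik <- function(beta) prod(plogis(beta * x)^y * (1 - plogis(beta * x))^(1 - y))
for (beta in c(1, 5, 25)) {
  cat("beta =", beta, " lik =", lik(beta), " loglik =", log(lik(beta)), "\n")
}
# As beta -> +Inf the likelihood tends to 1 and the log likelihood to 0;
# with y = rep(0, 4) the same happens at beta = -Inf, mirroring the comment.
```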