6

From what I understand about Maximum Likelihood Estimation, after observing a set of data we guess a distribution family and then find the parameters of that distribution that maximize the probability of the data we observed. (Am I right?)

So prior to using MLE I do need to have a way to guess a distribution family that the data follow. How can I make this guess?

Silverfish
user127390
  • Related: ["Idea and intuition behind quasi maximum likelihood estimation (QMLE)"](http://stats.stackexchange.com/questions/185154/idea-and-intuition-behind-quasi-maximum-likelihood-estimation-qmle). – Richard Hardy Aug 12 '16 at 06:08
  • This is the same topic as this: http://stats.stackexchange.com/questions/110687/is-maximum-likelihood-estimation-mle-a-parametric-approach/110694#110694 and they should maybe be merged? How to do that? – kjetil b halvorsen Feb 20 '17 at 16:12

4 Answers

6

To apply parametric MLE, you need to specify a parametric distribution. For non-parametric MLE, you do not specify a parametric distribution.

The most popular of the non-parametric MLE approaches is called Empirical Likelihood https://en.wikipedia.org/wiki/Empirical_likelihood (not much of a write-up on that page). The classic book in the field is "Empirical Likelihood" by Art B. Owen https://www.amazon.com/Empirical-Likelihood-Art-B-Owen/dp/1584880716 . The freely accessible paper "Empirical Likelihood", Art B. Owen, Annals of Statistics 1990, Vol. 18, pp. 90-120 https://projecteuclid.org/download/pdf_1/euclid.aos/1176347494 will give you a pretty good idea of the field. Freely available slides by Owen are at http://statweb.stanford.edu/~owen/pubtalks/DASprott.pdf .

Basically, Empirical Likelihood uses the empirical distribution of the data as the basis for forming an empirical likelihood. This empirical likelihood can be maximized, subject to various constraints, sometimes in closed form but often requiring numerical constrained nonlinear optimization methods. It can be used as the basis for computing non-parametric likelihood ratio tests and confidence regions (not necessarily ellipsoidal or symmetric).
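
To make the constrained maximization concrete, here is a minimal sketch (mine, not part of the original answer) of the profile empirical likelihood for a population mean, assuming Python with numpy and scipy: the observation weights are chosen to maximize the empirical log-likelihood subject to summing to one and reproducing a hypothesized mean `mu0`.

```python
# Minimal sketch, assuming Python with numpy and scipy:
# profile empirical likelihood ratio for a population mean.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=50)   # example data; no parametric family is assumed below
n = len(x)

def log_el_ratio(mu0):
    """log of the empirical likelihood ratio R(mu0) for the hypothesis E[X] = mu0."""
    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},           # weights sum to one
        {"type": "eq", "fun": lambda w, m=mu0: np.sum(w * x) - m},  # weighted mean equals mu0
    ]
    res = minimize(
        lambda w: -np.sum(np.log(n * w)),      # maximize sum_i log(n * w_i)
        x0=np.full(n, 1.0 / n),                # start from the uniform weights
        bounds=[(1e-10, 1.0)] * n,
        constraints=constraints,
        method="SLSQP",
    )
    return -res.fun

# -2 * log R(mu0) is compared with a chi-square(1) quantile to build a
# nonparametric confidence interval for the mean.
for mu0 in (x.mean(), x.mean() + 0.5):
    print(mu0, -2.0 * log_el_ratio(mu0))
```

At `mu0` equal to the sample mean the optimal weights are uniform and the statistic is zero; the set of `mu0` values for which `-2 * log R(mu0)` stays below a chi-square(1) quantile gives a nonparametric confidence interval, which need not be symmetric.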

There are relationships between empirical likelihood and bootstrapping, and indeed, the two can be combined.

If you don't have a solid rationale for using a particular parametric distribution, you're generally better off using a non-parametric method such as empirical likelihood. The downsides may be that the computations are more intensive and that the resulting confidence regions do not look like those most people have come to expect based on, for instance, Normal distribution assumptions.

Nick Cox
Mark L. Stone
  • Is empirical likelihood related to empirical Bayes method? Also, to answer the actual question *How can I make this guess?* you could start by saying *You don't have to.* or similar. Also, *computations are more computationally intensive* :) – Richard Hardy Aug 12 '16 at 06:09
  • "Is empirical likelihood related to empirical Bayes method?" Well, they both have "empirical" in their name :). There are some ways of combining them as can be seen by Googling "empirical likelihood" "empirical Bayes". – Mark L. Stone Aug 12 '16 at 11:41
2

To apply MLE you need to assume a distribution. So, yes, you usually need to have a distribution in mind. The standard intro texts use the Gaussian. For instance, they'd show you how, in the linear model, MLE under a Gaussian error distribution leads to the same estimators as least squares regression.

A Gaussian distribution with an independence (random sample) assumption is a popular choice. However, other distributions are used when they're more suitable for a problem. Often you don't have to "guess" the distribution but already know which family it belongs to. Maybe you know it must be Poisson, for instance. In this case you plug it into the MLE equations and derive the appropriate likelihood function to estimate the parameter of the distribution.
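
As a toy illustration (not from the original answer; it assumes Python with numpy and scipy), here is the Poisson case: the log-likelihood is maximized numerically and the result agrees with the closed-form MLE, the sample mean.

```python
# Toy sketch, assuming Python with numpy and scipy:
# numerically maximize the Poisson log-likelihood and compare with the sample mean.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.2, size=200)          # counts assumed to be Poisson

def neg_log_lik(lam):
    # Poisson log-likelihood: sum_i [ x_i * log(lam) - lam - log(x_i!) ]
    return -np.sum(x * np.log(lam) - lam - gammaln(x + 1))

fit = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")
print(fit.x, x.mean())                      # numerical MLE vs closed-form MLE (they agree)
```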

Nick Cox
Aksakal
1

In general, no, you cannot use MLE to find which family of distributions might provide a good parametric model for an outcome. That's not to say there aren't some exploratory techniques that could shed light on possibilities. But, as we know from statistics, using the same data as both a hypothesis-generating and a hypothesis-confirming tool will lead to increased false positive errors.

Ideally a family of distributions is chosen before the data are collected. You can often think about the data-generating mechanism and/or draw parallels with what other researchers have used and discussed. For instance, Poisson variables arise from independent exponential interarrival times, and 3-parameter Weibull models can flexibly describe time-to-event curves. You can also rely on the fact that predictions and inference from similar probability models tend to be quite similar; for instance, inference from the t-test tends to be quite similar to that from the z-test even in moderately small samples.

Another thing to consider is that Tukey was quoted as having said, "Build your model as big as a house!" Within the limits of the data themselves, making oversimplified assumptions tends to be unnecessary when more flexible nested parametric models are available. For instance, instead of exponential time-to-event models, you could consider the Weibull as a bigger class, or the 3-parameter Weibull as an even bigger class of models. For counting processes, negative binomial models are basically two-parameter Poisson models. You can even consider mixtures or empirical likelihood as ways of describing densities with a minimal number of assumptions.
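
Here is a rough sketch of the "bigger class" idea (my example, assuming Python with numpy/scipy and simulated event times): fit the exponential model and the Weibull model that nests it, then test the extra shape parameter with a likelihood ratio test.

```python
# Rough sketch, assuming Python with numpy and scipy:
# fit the exponential model and the Weibull model that nests it (shape = 1),
# then test the extra shape parameter with a likelihood ratio test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
t = 2.0 * rng.weibull(1.5, size=300)        # simulated event times

# Exponential fit (location fixed at zero)
loc_e, scale_e = stats.expon.fit(t, floc=0)
ll_exp = np.sum(stats.expon.logpdf(t, loc_e, scale_e))

# Weibull fit (location fixed at zero); the exponential is the shape = 1 special case
shape, loc_w, scale_w = stats.weibull_min.fit(t, floc=0)
ll_wei = np.sum(stats.weibull_min.logpdf(t, shape, loc_w, scale_w))

lr = 2.0 * (ll_wei - ll_exp)                # likelihood ratio statistic, 1 degree of freedom
p_value = stats.chi2.sf(lr, df=1)
print(f"shape = {shape:.2f}, LR = {lr:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the extra flexibility of the Weibull shape parameter is needed; otherwise the exponential special case is adequate.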

AdamO
1

*How can I make this guess?*

As pointed out in other answers, sometimes you know what the distribution must be due to the nature of the data-generating process. Consider the Generalized Extreme Value (GEV) distribution as described in Wikipedia:

By the extreme value theorem the GEV distribution is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables.

Of course, this is an asymptotic result, but you may count on it for sufficiently large samples.
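
As an illustration (my sketch, not part of the original answer, assuming Python with numpy and scipy), block maxima of simulated data from an arbitrary parent distribution can be fitted by MLE within the GEV family:

```python
# Sketch, assuming Python with numpy and scipy:
# fit a GEV distribution to block maxima by maximum likelihood.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(3)
daily = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 365))  # 1000 "years" of daily values
annual_max = daily.max(axis=1)                                # block (annual) maxima

shape, loc, scale = genextreme.fit(annual_max)                # MLE within the GEV family
print(shape, loc, scale)

# For example, an estimated 100-year return level from the fitted model:
print(genextreme.ppf(1.0 - 1.0 / 100.0, shape, loc, scale))
```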

Other times you may just have a rough idea and not know exactly. However, this may suffice in the framework of quasi maximum likelihood estimation (QMLE). QMLE allows consistent estimation of model parameters and valid inference even when the assumed distribution does not match the true distribution. Even though it does not work universally (not all distributions can be assumed in place of other distributions), it can still be pretty useful.

(For an intuitive explanation of why and how QMLE works see ["Idea and intuition behind quasi maximum likelihood estimation (QMLE)"](http://stats.stackexchange.com/questions/185154/idea-and-intuition-behind-quasi-maximum-likelihood-estimation-qmle).)
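
Here is a toy demonstration of that consistency (my example, assuming Python with numpy and scipy): the counts below are overdispersed negative binomial data, yet maximizing a Poisson quasi-likelihood still recovers the mean; only the usual Poisson standard errors would need a robust (sandwich) correction.

```python
# Toy sketch, assuming Python with numpy and scipy:
# the data are overdispersed (negative binomial), but the Poisson
# quasi-likelihood still estimates the mean consistently.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(4)
true_mean = 4.0
# negative binomial with mean 4 and variance larger than 4 (overdispersion)
y = rng.negative_binomial(n=2, p=2.0 / (2.0 + true_mean), size=5000)

def neg_poisson_loglik(lam):
    # the *assumed* (Poisson) log-likelihood, not the true one
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1))

fit = minimize_scalar(neg_poisson_loglik, bounds=(1e-6, 50.0), method="bounded")
print(fit.x, y.mean(), y.var())   # QMLE of the mean is close to 4 even though Var(Y) > E[Y]
```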

Richard Hardy