
Maximum likelihood estimation (MLE) has a standard mathematical definition for discrete variables and for continuous variables. There is a technical difficulty, however, when a variable is neither discrete nor continuous but a mixture of the two. More precisely:

Assume you have a model with a parameter $\theta$ and an observable variable $X$ whose distribution depends on $\theta$. You have $x_1, x_2, \dots, x_n$, an independent sample of real-life observations of $X$.

If $X$ is a discrete variable, the maximum likelihood estimator is:

$$\hat\theta=\text{argmax}_\theta \left(\displaystyle\prod_{i=1}^n P_\theta(X=x_i)\right)$$
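
As a minimal sketch of the discrete case, assume $X$ is Bernoulli with $P_\theta(X=1)=\theta$. The sample, the grid, and the function names below are hypothetical and only serve to illustrate the formula above; the product is maximized through its logarithm, which has the same argmax.

```python
import math

def bernoulli_log_likelihood(theta, sample):
    """Log of prod_i P_theta(X = x_i) for a Bernoulli(theta) variable."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in sample)

# Hypothetical observations: 7 ones out of 10.
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Crude grid search over theta in (0, 1); a real analysis would use calculus
# or a numerical optimizer instead.
grid = [k / 1000 for k in range(1, 1000)]
theta_hat = max(grid, key=lambda theta: bernoulli_log_likelihood(theta, sample))

print(theta_hat)  # the grid maximizer, which matches the sample mean here
```

For this model the maximizer coincides with the proportion of ones in the sample.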

If $X$ has a density $p_\theta$, then you just use the density instead:

$$\hat\theta=\text{argmax}_\theta \left(\displaystyle\prod_{i=1}^np_\theta(x_i)\right)$$

Sometimes $X$ is essentially continuous but has one or several atoms: its distribution is a mixture of a continuous distribution and a discrete distribution. A good example: the ML estimate of an exponential distribution with censored data.
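
To make the censored example concrete, here is a minimal sketch under one common convention, which is an assumption and not something established in this question: an observation on the continuous part contributes its density, and an observation sitting on an atom contributes its probability mass. The lifetime is assumed exponential with rate `rate`, and every value above a hypothetical censoring time `CENSOR_AT` is recorded as `CENSOR_AT`, which creates an atom there. The sample and the function name are hypothetical too.

```python
import math

CENSOR_AT = 2.0  # hypothetical censoring time: larger lifetimes are recorded as 2.0

def censored_exp_log_likelihood(rate, sample, censor_at=CENSOR_AT):
    """Log-likelihood mixing a density (below censor_at) and an atom (at censor_at)."""
    total = 0.0
    for y in sample:
        if y < censor_at:
            # Continuous part: log of the density rate * exp(-rate * y).
            total += math.log(rate) - rate * y
        else:
            # Atom: log P(Y = censor_at) = log P(lifetime >= censor_at).
            total += -rate * censor_at
    return total

# Hypothetical recorded values; the two entries equal to 2.0 are censored.
sample = [0.4, 2.0, 1.1, 0.5, 2.0, 1.0]

# Known closed form for this model: uncensored count / sum of recorded values.
n_uncensored = sum(1 for y in sample if y < CENSOR_AT)
closed_form = n_uncensored / sum(sample)

# Crude grid search over the rate, for comparison with the closed form.
grid = [k / 1000 for k in range(1, 5001)]
rate_hat = max(grid, key=lambda r: censored_exp_log_likelihood(r, sample))

print(rate_hat, closed_form)
```

Under this convention the grid maximizer agrees with the closed form up to the grid step of 0.001.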

I am looking for a way to define MLE mathematically when there are atoms, aimed at students or people with some mathematical background but little background in statistics. Ideally, the definition should be:

  • not too theoretical or abstract
  • fairly general
  • not needlessly confusing

I am struggling with this. Any ideas?

Benoit Sanchez
  • Possible duplicate of [Maximum Likelihood Estimation (MLE) in layman terms](https://stats.stackexchange.com/questions/112451/maximum-likelihood-estimation-mle-in-layman-terms) – Tim Jul 15 '17 at 20:56
  • My question focuses on the mathematical definition of MLE for variables that are neither discrete nor continuous. I think the question you are referring to is more about an intuitive general explanation of MLE. I've rephrased it a bit. – Benoit Sanchez Jul 15 '17 at 21:11
  • But the definition is *the same* no matter if your data is continuous or discrete! A probability mass function is a special kind of density function, so there is no problem with mixed data. – Tim Jul 15 '17 at 21:16
  • Not a special case (https://en.wikipedia.org/wiki/Probability_mass_function). Unless you are talking about the Radon-Nikodym derivative with respect to the counting measure. I especially want to avoid these theoretical things. Anyway, somebody could explain this idea of yours as an answer. This would not make my question a duplicate. – Benoit Sanchez Jul 15 '17 at 21:23
  • But if you want to discuss mixed data types then you can't run away from those theoretical considerations! [This thread](https://stats.stackexchange.com/questions/4220/can-a-probability-distribution-value-exceeding-1-be-ok) gives a nice introduction to probability densities. You basically need to introduce probability densities and discuss discrete data as a special case of them. If probability density is "probability per foot", then with discrete data you have obvious units you calculate it for. – Tim Jul 15 '17 at 21:40
  • Again, your opinion and thoughts are welcome as an answer. My only point is: my question is not a duplicate. – Benoit Sanchez Jul 15 '17 at 22:12
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/62295/discussion-between-benoit-sanchez-and-tim). – Benoit Sanchez Jul 15 '17 at 22:41
  • A typical case where this occurs is with censoring. For some examples see: https://stats.stackexchange.com/questions/87065/weighted-normal-errors-regression-with-censoring/276929#276929 https://stats.stackexchange.com/questions/133347/ml-estimate-of-exponential-distribution-with-censored-data/133360#133360 and many others – kjetil b halvorsen Mar 28 '18 at 11:02
