Questions tagged [maximum-likelihood]

A method of estimating the parameters of a statistical model by choosing the parameter value that maximizes the probability of observing the given sample.

Given certain regularity conditions (e.g. that the support of the density function does not depend on the unknown parameter), maximum-likelihood estimators are consistent, asymptotically efficient (in that they attain the Cramér-Rao lower bound asymptotically), and asymptotically normal with covariance matrix given by the inverse of the Fisher information matrix.

Because ML is a parametric method based on a specified distribution family, it relies on the assumed distributional model of the data being correct. In many cases no closed-form solution exists, so the likelihood must be maximized numerically (e.g. by a Newton-Raphson search).
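As a concrete sketch of that last point, here is a minimal Newton-Raphson search (Python/NumPy; the Cauchy location model and the simulated data are illustrative assumptions, not part of the tag description) for an MLE that has no closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(200) + 3.0   # location theta = 3; the MLE has no closed form

def score(theta):        # first derivative of the Cauchy log-likelihood
    u = x - theta
    return np.sum(2 * u / (1 + u**2))

def hessian(theta):      # second derivative of the Cauchy log-likelihood
    u = x - theta
    return np.sum(2 * (u**2 - 1) / (1 + u**2)**2)

theta = np.median(x)     # robust starting value
for _ in range(50):
    step = score(theta) / hessian(theta)   # Newton-Raphson step
    theta -= step
    if abs(step) < 1e-10:
        break

print("Newton-Raphson MLE of the location:", theta)
```

Starting from the median keeps the iteration stable; Newton steps on a Cauchy likelihood can diverge from a poor starting value.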

2972 questions
117
votes
14 answers

Maximum Likelihood Estimation (MLE) in layman's terms

Could anyone explain maximum likelihood estimation (MLE) to me in detail, in layman's terms? I would like to understand the underlying concept before going into the mathematical derivations and equations.
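A minimal grid-search illustration of the underlying concept (the coin-flip setup is a hypothetical example, not from the question): for each candidate parameter value, ask how probable the observed sample would be, and keep the value that makes it most probable.

```python
import numpy as np

# 10 coin flips, 7 heads: which head-probability p makes this sample most probable?
heads, n = 7, 10
p_grid = np.linspace(0.01, 0.99, 99)
likelihood = p_grid**heads * (1 - p_grid)**(n - heads)   # Bernoulli likelihood

print("MLE on the grid:", p_grid[np.argmax(likelihood)])  # ~0.7 = 7/10
```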
87
votes
3 answers

What is "restricted maximum likelihood" and when should it be used?

I have read in the abstract of this paper that: "The maximum likelihood (ML) procedure of Hartley and Rao is modified by adapting a transformation from Patterson and Thompson which partitions the likelihood under normality into two parts, one…
Joe King
  • 3,024
  • 6
  • 32
  • 58
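A hedged illustration of what REML buys: in the simplest case of an i.i.d. normal sample, the REML estimate of the variance is the familiar $n-1$ sample variance, while plain ML divides by $n$ and is biased low because the mean was estimated from the same data (the simulated sample below is an assumption of the sketch).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=10)
n = len(x)

ml_var   = np.sum((x - x.mean())**2) / n        # ML: biased low; the mean uses up information
reml_var = np.sum((x - x.mean())**2) / (n - 1)  # REML: likelihood of contrasts free of the mean

print("ML variance:", ml_var, " REML variance:", reml_var)
```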
78
votes
2 answers

Basic question about Fisher Information matrix and relationship to Hessian and standard errors

OK, this is quite a basic question, but I am a little bit confused. In my thesis I write: the standard errors can be found by calculating the square roots of the diagonal elements of the inverse of the (observed) Fisher information…
Jen Bohold
  • 1,410
  • 2
  • 13
  • 19
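For the record, the order of operations matters: invert the (observed) information matrix first, then take square roots of its diagonal. A one-parameter sketch (the Poisson model and simulated data are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.poisson(lam=4.0, size=500)
n = len(x)

lam_hat = x.mean()                 # Poisson MLE
obs_info = x.sum() / lam_hat**2    # observed Fisher information: -l''(lam_hat) = n / lam_hat
se = np.sqrt(1.0 / obs_info)       # invert first, then take the square root

print(f"MLE = {lam_hat:.3f}, SE = {se:.4f}")  # matches the analytic sqrt(lam_hat / n)
```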
66
votes
3 answers

Maximum likelihood method vs. least squares method

What is the main difference between maximum likelihood estimation (MLE) and least squares estimation (LSE)? Why can't we use MLE for predicting $y$ values in linear regression and vice versa? Any help on this topic will be greatly appreciated.
evros
  • 751
  • 2
  • 7
  • 6
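One way to see the connection: under a Gaussian error model, maximizing the likelihood over $\beta$ is exactly minimizing the sum of squared residuals, so the two estimators coincide. A small numerical check (the simulated design and data are assumptions of this sketch):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=100)

beta_lse = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares solution

def nll(beta):                                    # Gaussian negative log-likelihood;
    r = y - X @ beta                              # sigma is fixed since it does not
    return 0.5 * np.sum(r**2)                     # affect the argmax over beta

beta_mle = minimize(nll, x0=np.zeros(2)).x

print(np.allclose(beta_lse, beta_mle, atol=1e-4))  # True: same estimate under Gaussian errors
```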
66
votes
5 answers

Why do we minimize the negative likelihood if it is equivalent to maximization of the likelihood?

This question has puzzled me for a long time. I understand the use of the log in maximizing the likelihood, so I am not asking about the log. My question is: since maximizing the log-likelihood is equivalent to minimizing the "negative log-likelihood" (NLL), why…
Tony
  • 1,583
  • 4
  • 15
  • 20
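The convention the question is circling can be made concrete: off-the-shelf optimizers minimize by default, so one hands them the negated log-likelihood; the optimum is unchanged. A sketch with an exponential model (an illustrative assumption):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=1000)   # true rate = 0.5

# Optimizers minimize by convention, so we pass the *negative* log-likelihood.
nll = lambda rate: -(len(x) * np.log(rate) - rate * x.sum())
res = minimize_scalar(nll, bounds=(1e-6, 10.0), method="bounded")

print(res.x, 1.0 / x.mean())   # both ~0.5: same optimum, opposite sign convention
```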
65
votes
2 answers

What is the difference between a partial likelihood, profile likelihood and marginal likelihood?

I see these terms being used and I keep getting them mixed up. Is there a simple explanation of the differences between them?
Rob Hyndman
  • 51,928
  • 23
  • 126
  • 178
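As one concrete instance of the three, a profile likelihood replaces the nuisance parameters by their conditional MLEs. A sketch for the normal mean, profiling out $\sigma^2$ analytically (the model and simulated data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=1.5, size=200)
n = len(x)

def profile_loglik(mu):
    # Profile out the nuisance parameter: maximize over sigma^2 analytically
    # by plugging in sigma_hat^2(mu) = mean((x - mu)^2).
    s2 = np.mean((x - mu)**2)
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1)

grid = np.linspace(1.0, 3.0, 201)
pl = np.array([profile_loglik(m) for m in grid])
print("profile-likelihood maximizer:", grid[np.argmax(pl)], " sample mean:", x.mean())
```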
63
votes
3 answers

What is the difference between a Bayesian estimate and a maximum likelihood estimate?

Please explain to me the difference between a Bayesian estimate and a maximum likelihood estimate.
triomphe
  • 787
  • 1
  • 6
  • 9
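A toy Beta-Bernoulli sketch of the contrast (the prior and the counts are illustrative assumptions): the MLE uses the likelihood alone, while the Bayesian estimate combines the likelihood with a prior.

```python
heads, n = 3, 10
a, b = 2.0, 2.0                        # Beta(2, 2) prior, assumed for illustration

mle = heads / n                        # maximizes the likelihood alone
post_mean = (heads + a) / (n + a + b)  # posterior mean: the prior pulls toward 0.5

print(mle, post_mean)                  # 0.3 vs ~0.357
```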
63
votes
9 answers

Advanced statistics book recommendations

There are several threads on this site for book recommendations on introductory statistics and machine learning but I am looking for a text on advanced statistics including, in order of priority: maximum likelihood, generalized linear models,…
62
votes
2 answers

What does the inverse of covariance matrix say about data? (Intuitively)

I'm curious about the nature of $\Sigma^{-1}$. Can anybody say something intuitive about what $\Sigma^{-1}$ tells us about the data? Edit: thanks for the replies. After taking some great courses, I'd like to add some points: it is a measure of information,…
Arya
  • 873
  • 1
  • 7
  • 8
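One intuition worth making concrete: in a Gaussian model, the precision matrix $\Sigma^{-1}$ encodes conditional (not marginal) dependence, and its rescaled off-diagonal entries are negated partial correlations. A sketch with a hand-picked covariance (an assumption of this example):

```python
import numpy as np

# A 3-variable Gaussian where X1 and X3 are related only *through* X2.
Sigma = np.array([[1.0, 0.5, 0.25],
                  [0.5, 1.0, 0.5 ],
                  [0.25, 0.5, 1.0]])
P = np.linalg.inv(Sigma)

# Off-diagonal zeros of the precision matrix mean conditional independence;
# the partial correlations are -P_ij / sqrt(P_ii * P_jj) off the diagonal.
d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)

print(np.round(P, 3))
print(np.round(partial_corr, 3))  # entry (0, 2) is 0: X1, X3 independent given X2
```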
58
votes
8 answers

Examples where method of moments can beat maximum likelihood in small samples?

Maximum likelihood estimators (MLE) are asymptotically efficient; we see the practical upshot in that they often do better than method of moments (MoM) estimates (when they differ), even at small sample sizes. Here 'better than' means in the sense…
Glen_b
  • 257,508
  • 32
  • 553
  • 939
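The question asks for actual examples; without claiming a winner, here is the kind of simulation harness such small-sample comparisons rest on, with a Uniform$(0,\theta)$ model chosen purely for convenience (all values below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 1.0, 5, 100_000           # deliberately small sample size

x = rng.uniform(0, theta, size=(reps, n))
mle = x.max(axis=1)                         # MLE for Uniform(0, theta)
mom = 2 * x.mean(axis=1)                    # method of moments: E[X] = theta / 2

print("MSE(MLE):", np.mean((mle - theta)**2))
print("MSE(MoM):", np.mean((mom - theta)**2))
```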
56
votes
2 answers

Cross-Entropy or Log Likelihood in Output layer

I read this page: http://neuralnetworksanddeeplearning.com/chap3.html and it said that a sigmoid output layer with cross-entropy is quite similar to a softmax output layer with log-likelihood. What happens if I use sigmoid with log-likelihood or…
malioboro
  • 851
  • 1
  • 11
  • 19
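The similarity the page refers to can be checked numerically: binary cross-entropy under a sigmoid output equals the negative log-likelihood of a two-class softmax with logits $(z, 0)$. A sketch (the logit value and label are arbitrary assumptions):

```python
import numpy as np

z, y = 1.3, 1                                  # a logit and a binary label

# Sigmoid output + binary cross-entropy loss
p = 1 / (1 + np.exp(-z))
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Two-class softmax output + negative log-likelihood, logits (z, 0)
logits = np.array([z, 0.0])
soft = np.exp(logits) / np.exp(logits).sum()
nll = -np.log(soft[0] if y == 1 else soft[1])

print(bce, nll)   # identical: the two losses coincide in the binary case
```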
53
votes
9 answers

Are all models useless? Is any exact model possible -- or useful?

This question has been festering in my mind for over a month. The February 2015 issue of Amstat News contains an article by Berkeley Professor Mark van der Laan that scolds people for using inexact models. He states that by using models, statistics…
53
votes
2 answers

Intuition behind why Stein's paradox only applies in dimensions $\ge 3$

Stein's example shows that the maximum likelihood estimate of the means $\mu_1,\ldots,\mu_n$ of $n$ normally distributed variables with variance $1$ is inadmissible (under squared-error loss) iff $n\ge 3$. For a neat proof, see the first chapter…
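A quick simulation of the phenomenon, using the plain (non-positive-part) James-Stein estimator; the dimension and the true means are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 5, 100_000                        # dimension >= 3, where shrinkage helps
mu = np.arange(1, n + 1, dtype=float)       # arbitrary true means
x = rng.normal(mu, 1.0, size=(reps, n))     # one observation per coordinate

mle = x                                                       # MLE: the observation itself
shrink = 1 - (n - 2) / np.sum(x**2, axis=1, keepdims=True)    # James-Stein factor
js = shrink * x

print("risk of MLE:        ", np.mean(np.sum((mle - mu)**2, axis=1)))  # ~ n
print("risk of James-Stein:", np.mean(np.sum((js - mu)**2, axis=1)))   # strictly smaller
```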
45
votes
3 answers

What kind of information is Fisher information?

Suppose we have a random variable $X \sim f(x|\theta)$. If $\theta_0$ were the true parameter, the likelihood function should be maximized and the derivative equal to zero. This is the basic principle behind the maximum likelihood estimator. As…
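A note that may help frame answers: under the usual regularity conditions the score has mean zero at the true parameter (the derivative is zero in expectation, not for every sample), and the Fisher information admits two equivalent forms,

$$
\mathcal{I}(\theta) \;=\; \operatorname{Var}_\theta\!\left[\frac{\partial}{\partial\theta}\log f(X\mid\theta)\right] \;=\; -\,\mathbb{E}_\theta\!\left[\frac{\partial^2}{\partial\theta^2}\log f(X\mid\theta)\right],
$$

i.e. it is both the variability of the score and the expected curvature of the log-likelihood at $\theta$.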
38
votes
3 answers

Maximum Likelihood Estimators - Multivariate Gaussian

Context The Multivariate Gaussian appears frequently in Machine Learning and the following results are used in many ML books and courses without the derivations. Given data in the form of a matrix $\mathbf{X}$ of dimensions $m \times p$, if we…
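The standard results in question: $\hat\mu$ is the sample mean and $\hat\Sigma$ is the sample covariance with an $m$ (not $m-1$) denominator. A numerical check (the particular mean and covariance below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)
m = 1000
X = rng.multivariate_normal(mean=[0, 1, 2],
                            cov=[[2, .5, 0], [.5, 1, .3], [0, .3, 1.5]],
                            size=m)                 # rows are observations

mu_hat = X.mean(axis=0)                             # MLE of the mean
centered = X - mu_hat
Sigma_hat = centered.T @ centered / m               # MLE of the covariance: divides by m

print(np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True)))  # True
```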