Questions tagged [likelihood]

Given a random variable $X$ that arises from a parameterized distribution $F(X;\theta)$, the likelihood is defined as the probability of the observed data as a function of $\theta$: $\operatorname{L}(\theta \mid x)=\operatorname{P}(X=x \mid \theta)$

In statistics, a likelihood function is a function of the parameters of a statistical model, defined as follows: the likelihood of a set of parameter values given some observed outcomes is equal to the probability of those observed outcomes given those parameter values. Likelihood functions play a key role in statistical inference, especially in methods of estimating a parameter using a statistic (a function of the data).

Reference: Wikipedia

Excerpt reference: @ars's answer on What is the difference between “likelihood” and “probability”?
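
As a minimal illustration of the definition above (a sketch in R with made-up numbers, not part of the tag wiki): fix the observed data and read the same probability formula as a function of the parameter.

```r
# Likelihood as a function of the parameter, with the data held fixed.
# Model: X ~ Binomial(n = 10, p); suppose we observed x = 7 successes.
x <- 7; n <- 10
lik <- function(p) dbinom(x, size = n, prob = p)   # L(p | x) = P(X = x | p)
p.grid <- seq(0, 1, by = 0.01)
p.grid[which.max(lik(p.grid))]   # the curve peaks at the MLE x/n = 0.7
```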

1350 questions
607 votes · 12 answers

What is the difference between "likelihood" and "probability"?

The Wikipedia page claims that likelihood and probability are distinct concepts. In non-technical parlance, "likelihood" is usually a synonym for "probability," but in statistical usage there is a clear distinction in perspective: the number that…
Douglas S. Stones
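
A minimal sketch of the distinction (in R, with illustrative numbers, not from the thread): the same formula is a probability when the parameter is fixed and the data vary, and a likelihood when the data are fixed and the parameter varies.

```r
# 1) Probability: fix p, let the data x vary; a distribution over outcomes.
sum(dbinom(0:10, size = 10, prob = 0.3))   # sums to 1 over all outcomes
# 2) Likelihood: fix the observed x = 3, let p vary; a function of p that
#    ranks parameter values but is not a distribution over p.
sapply(c(0.2, 0.3, 0.4), function(p) dbinom(3, size = 10, prob = p))
# largest at p = 0.3 = x/n, the maximum likelihood estimate
```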
91 votes · 7 answers

Why do we optimize the log probability instead of the probability?

In most machine learning tasks where you can formulate some probability $p$ which should be maximised, we would actually optimize the log probability $\log p$ instead of the probability for some parameters $\theta$. E.g. in maximum likelihood…
Albert
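
A quick numerical sketch of one standard reason (illustrative data, not from the answers): the product of many densities underflows in floating point, while the sum of log densities stays well scaled, and both are maximized at the same $\theta$ because $\log$ is strictly increasing.

```r
set.seed(1)
x <- rnorm(1000, mean = 2, sd = 1)
prod(dnorm(x, mean = 2, sd = 1))             # 0: underflows double precision
sum(dnorm(x, mean = 2, sd = 1, log = TRUE))  # finite and safe to optimize
```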
85 votes · 5 answers

What is the reason that a likelihood function is not a pdf?

What is the reason that a likelihood function is not a pdf (probability density function)?
John Doe
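
One concrete way to see it (a sketch, not the accepted answer): a density in $\theta$ would integrate to 1 over $\theta$, but the likelihood generally does not.

```r
# Binomial likelihood for x = 7 successes in n = 10 trials, as a function of p
lik <- function(p) dbinom(7, size = 10, prob = p)
integrate(lik, lower = 0, upper = 1)$value   # 1/11 ~ 0.0909, not 1
```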
66 votes · 5 answers

Why do we minimize the negative likelihood if it is equivalent to maximization of the likelihood?

This question has puzzled me for a long time. I understand the use of 'log' in maximizing the likelihood so I am not asking about 'log'. My question is, since maximizing log likelihood is equivalent to minimizing "negative log likelihood" (NLL), why…
Tony
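
A sketch of the practical convention (illustrative example, not from the answers): general-purpose optimizers such as R's optim() minimize by default, so one hands them the negative log-likelihood; the minimizer of $-\log L$ is exactly the maximizer of $L$.

```r
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)
# Negative log-likelihood of a normal model; sd is parameterized as
# exp(par[2]) to keep it positive during the search.
nll <- function(par) -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
fit <- optim(c(0, 0), nll)                    # optim() minimizes by default
c(mean = fit$par[1], sd = exp(fit$par[2]))    # close to (5, 2)
```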
60 votes · 5 answers

How to calculate pseudo-$R^2$ from R's logistic regression?

Christopher Manning's writeup on logistic regression in R shows a logistic regression in R as follows: ced.logr <- glm(ced.del ~ cat + follows + factor(class), family=binomial) Some output: > summary(ced.logr) Call: glm(formula = ced.del ~ cat +…
dfrankow
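
For reference, a sketch of McFadden's pseudo-$R^2$ (one of several variants in use), which compares the fitted model's log-likelihood to that of an intercept-only model; the data below are simulated stand-ins for the question's variables.

```r
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(-0.5 + 1.2 * d$x))
fit  <- glm(y ~ x, family = binomial, data = d)   # the model of interest
null <- glm(y ~ 1, family = binomial, data = d)   # intercept-only baseline
1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))  # McFadden's pseudo-R^2
```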
50 votes · 7 answers

Why would someone use a Bayesian approach with a 'noninformative' improper prior instead of the classical approach?

If the interest is merely estimating the parameters of a model (pointwise and/or interval estimation) and the prior information is not reliable or is weak (I know this is a bit vague, but I am trying to establish a scenario where the choice of a prior…
user10525
46 votes · 10 answers

Why do people use p-values instead of computing probability of the model given data?

Roughly speaking, a p-value gives the probability of the observed outcome of an experiment given the hypothesis (model). Having this probability (p-value), we want to judge our hypothesis (how likely it is). But wouldn't it be more natural to calculate…
Roman
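
A sketch of the computation the asker has in mind (purely illustrative numbers, not from the answers): with an explicit prior over candidate models, Bayes' theorem turns likelihoods into $P(\text{model} \mid \text{data})$ directly.

```r
x <- 8; n <- 10                       # observed: 8 heads in 10 flips
lik <- c(fair   = dbinom(x, n, 0.5),  # P(data | fair coin)
         biased = dbinom(x, n, 0.8))  # P(data | coin with p = 0.8)
prior <- c(fair = 0.5, biased = 0.5)  # prior over the two models
post  <- prior * lik
post / sum(post)                      # posterior P(model | data)
```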
45 votes · 3 answers

What kind of information is Fisher information?

Suppose we have a random variable $X \sim f(x|\theta)$. If $\theta_0$ were the true parameter, then the likelihood function should be maximized and the derivative equal to zero. This is the basic principle behind the maximum likelihood estimator. As…
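
A simulation sketch of one reading of Fisher information (not from the answers): it is the variance of the score, the derivative of the log-likelihood, at the true parameter. For a single Bernoulli($p$) observation the score is $(x-p)/\big(p(1-p)\big)$ and the information is $1/\big(p(1-p)\big)$.

```r
set.seed(1)
p <- 0.3
x <- rbinom(1e5, size = 1, prob = p)
score <- (x - p) / (p * (1 - p))   # d/dp log f(x | p), evaluated at the true p
var(score)                         # ~ 4.76 by simulation
1 / (p * (1 - p))                  # 4.7619..., the theoretical value
```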
32 votes · 3 answers

How to rigorously define the likelihood?

The likelihood could be defined in several ways, for instance: the function $L$ from $\Theta\times{\cal X}$ which maps $(\theta,x)$ to $L(\theta \mid x)$, i.e. $L:\Theta\times{\cal X} \rightarrow \mathbb{R}$; or the random function $L(\cdot \mid…
Stéphane Laurent
31 votes · 5 answers

Wikipedia entry on likelihood seems ambiguous

I have a simple question regarding "conditional probability" and "likelihood". (I have already surveyed this question here but to no avail.) It starts from the Wikipedia page on likelihood. They say this: The likelihood of a set of parameter…
31 votes · 4 answers

How to derive the likelihood function for binomial distribution for parameter estimation?

According to Miller and Freund's Probability and Statistics for Engineers, 8ed (pp.217-218), the likelihood function to be maximised for binomial distribution (Bernoulli trials) is given as $L(p) = \prod_{i=1}^np^{x_i}(1-p)^{1-x_i}$ How to arrive at…
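
A sketch of the derivation being asked for, assuming (as in the book) $n$ independent Bernoulli trials with $x_i \in \{0,1\}$: each trial's pmf can be written in a single expression, and independence turns the joint pmf into a product.

```latex
f(x_i; p) = p^{x_i}(1-p)^{1-x_i}
  \quad\text{(equals } p \text{ if } x_i = 1 \text{ and } 1-p \text{ if } x_i = 0\text{)}

L(p) = \prod_{i=1}^{n} f(x_i; p)
     = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}
     = p^{\sum_i x_i}(1-p)^{\,n-\sum_i x_i}
```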
30 votes · 1 answer

Computation of the marginal likelihood from MCMC samples

This is a recurring question (see this post, this post and this post), but I have a different spin. Suppose I have a bunch of samples from a generic MCMC sampler. For each sample $\theta$, I know the value of the log likelihood $\log f(\textbf{x} |…
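
For concreteness, a sketch of the harmonic mean estimator, which needs only the per-draw log-likelihoods the asker already has; it is notoriously high-variance, so treat it as a baseline rather than a recommendation. The loglik vector below is a placeholder for the real values.

```r
set.seed(1)
loglik <- rnorm(5000, mean = -120, sd = 3)   # stand-in for log f(x | theta_s)
# Harmonic mean: p(x) ~ 1 / mean(1 / L_s); done on the log scale with the
# log-sum-exp trick to avoid overflow when exponentiating -loglik.
m <- max(-loglik)
log_ml <- -(m + log(mean(exp(-loglik - m))))
log_ml                                       # estimated log marginal likelihood
```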
29 votes · 8 answers

Bayes' Theorem Intuition

I've been trying to develop an intuition based understanding of Bayes' theorem in terms of the prior, posterior, likelihood and marginal probability. For that I use the following equation: $$P(B|A) = \frac{P(A|B)P(B)}{P(A)}$$ where $A$ represents a…
Anas Ayubi
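
A standard worked example (not from the thread) that makes each role concrete: $B$ = has a disease, $A$ = tests positive, with illustrative rates.

```r
p_B            <- 0.01   # prior P(B): disease prevalence
p_A_given_B    <- 0.95   # likelihood P(A | B): test sensitivity
p_A_given_notB <- 0.05   # false positive rate P(A | not B)
# Marginal P(A) by the law of total probability:
p_A <- p_A_given_B * p_B + p_A_given_notB * (1 - p_B)
p_A_given_B * p_B / p_A  # posterior P(B | A) ~ 0.16, despite the 95% test
```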
28 votes · 3 answers

What are some illustrative applications of empirical likelihood?

I have heard of Owen's empirical likelihood, but paid it no heed until I came across it in a paper of interest (Mengersen et al. 2012). In my efforts to understand it, I have gleaned that the likelihood of the observed data is…
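
For reference, the standard definition being alluded to (a sketch): for a parameter defined through an estimating equation $E[g(X,\theta)] = 0$, Owen's empirical likelihood ratio profiles a multinomial likelihood supported on the observed points.

```latex
R(\theta) = \max\left\{ \prod_{i=1}^{n} n p_i \;:\;
  p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\, g(X_i,\theta) = 0 \right\}
```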
27 votes · 4 answers

Theoretical motivation for using log-likelihood vs likelihood

I'm trying to understand at a deeper level the ubiquity of log-likelihood (and perhaps more generally log-probability) in statistics and probability theory. Log-probabilities show up all over the place: we usually work with the log-likelihood for…
ratsalad
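
One theoretical thread (a sketch, not a summary of the answers): for i.i.d. data the log turns the product likelihood into a sum, letting the law of large numbers act on it; the limit is maximized at the true parameter because the gap is a Kullback-Leibler divergence.

```latex
\frac{1}{n}\log L(\theta) = \frac{1}{n}\sum_{i=1}^{n} \log f(x_i;\theta)
  \;\xrightarrow{a.s.}\; E_{\theta_0}\!\left[\log f(X;\theta)\right],

E_{\theta_0}\!\left[\log f(X;\theta_0)\right] - E_{\theta_0}\!\left[\log f(X;\theta)\right]
  = \mathrm{KL}\!\left(f_{\theta_0}\,\|\,f_{\theta}\right) \;\ge\; 0
```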