Under which condition is the likelihood also a probability distribution?
-
@martn this is not the case; note that the likelihood function is a function of *parameters* (rather than of the data; it treats the data as given) and need not integrate to 1. – Glen_b Dec 11 '17 at 07:14
-
The likelihood is not a probability distribution!! – SmallChess Dec 11 '17 at 07:21
-
@Glen_b Under which conditions does it integrate to 1? – redenzione11 Dec 11 '17 at 07:45
-
You can use Bayes' theorem with a flat prior to work out the normalizing constant; when that's 1, it is a density. [If you do it the usual way, the denominator just comes down to the integral of the density, which is not very enlightening. You could also see it as the marginal density of the data-variables at the sample values] -- if this is unsatisfying, you may regard this as a good reason why I didn't answer the question. – Glen_b Dec 11 '17 at 07:51
-
The question may get closed if you do not expand on this one sentence. How did you come up with it? Is there any specific setting in which this is relevant? Which examples have you found? &tc., &tc... – Xi'an Dec 11 '17 at 10:51
2 Answers
The likelihood is a function of two variables, $L(\theta,x)$.
For fixed $\theta$, it can be seen as a function of $x$, and it is then a distribution: the distribution of $x$ for this fixed $\theta$.
For fixed $x$, it can be seen as a function of $\theta$, and it should not be thought of as a distribution. Most often it is not formally a distribution, since it does not integrate (or sum) to 1. Occasionally it may formally be a distribution and integrate to 1 (see Xi'an's answer), but this can be regarded as an "accident".
In fact, when we use the word "likelihood" we most often implicitly mean that we look at it as a function of $\theta$ for fixed $x$, so it makes sense to say that the likelihood is not a distribution.
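As a minimal numerical sketch of this point (the single Bernoulli observation $x=1$ is an illustrative assumption, not taken from the answer): the likelihood is $L(\theta, x{=}1)=\theta$ on $[0,1]$, which integrates to $1/2$ rather than $1$.

```python
# Sketch: likelihood of one Bernoulli observation x = 1 is L(theta) = theta.
from scipy.integrate import quad

area, _ = quad(lambda theta: theta, 0.0, 1.0)
print(area)  # 0.5 -- the likelihood does not integrate to 1 over theta
```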
Bayesian inference, however, is a useful way to understand that, while not a distribution itself, the likelihood is closely related to one. In Bayesian inference the distribution of $\theta$ for fixed $x$, known as the posterior, is:
$$p(\theta|x)=p(\theta)L(\theta,x)c(x)$$
where:
- $p(\theta)$ is the prior
- $L(\theta,x)$ is the likelihood
- $c(x)$ is a normalization constant (it does not depend on $\theta$, so it does not affect the shape of the posterior)
The likelihood can be thought of as proportional to the distribution of $\theta$ for fixed $x$ in the special case where the prior is uniform (constant). In general, the likelihood is not the posterior, but the function by which you multiply the prior to get the posterior. It plays a key role in defining a distribution (the posterior) while not being one itself.
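A short numerical sketch of this relationship (the $N(\theta,1)$ model, the data value, and the grid are all illustrative assumptions): with a flat prior, normalising the likelihood over $\theta$ reproduces the posterior exactly, the grid normalisation playing the role of $c(x)$.

```python
import numpy as np
from scipy.stats import norm

x = 1.3                                  # one observed data point (assumed)
theta = np.linspace(-6.0, 8.0, 2001)     # grid over the parameter
dtheta = theta[1] - theta[0]

lik = norm.pdf(x, loc=theta, scale=1.0)  # L(theta, x) for a N(theta, 1) model
prior = np.ones_like(theta)              # flat prior p(theta) = const
post = prior * lik
post /= post.sum() * dtheta              # c(x) via numerical normalisation

print(np.allclose(post, lik / (lik.sum() * dtheta)))  # True: posterior ∝ likelihood
```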

On the generic and general distinction between likelihood and probability density, check this question on CV, which has fairly detailed and useful answers, plus this other question on MathOverflow.
For the likelihood $\ell(\theta|x)$ to be a probability distribution, or more precisely the density of a probability distribution, it needs to satisfy
$$\int_\Theta \ell(\theta|x)\,\text{d}\theta=1\quad\forall x\in\mathcal{X}\qquad\qquad(1)$$
on top of satisfying
$$\int_{\mathcal{X}} \ell(\theta|x)\,\text{d}x=1\quad\forall\theta\in\Theta\qquad\qquad(2)$$
This is a situation that occurs when $x$ and $\theta$ are interchangeable, as in location families
$$\ell(\theta|x)=f(x-\theta)\qquad x,\theta\in\mathcal{X}=\Theta$$
but not in scale families
$$\ell(\theta|x)=g(x/\theta)/\theta\qquad x,\theta>0$$
due to the normalisation factor $1/\theta$. However, and this is central to my point that the question does not have a generic meaning, a change of variables from $(x,\theta)$ to
$$(y,\xi)=(\log x,\log \theta)$$
turns the scale family into a location family, for which the property holds.
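A numerical check of the above (a sketch under assumed densities: $f$ a standard normal for the location family and $g$ a Gamma(3) density for the scale family; neither choice comes from the answer itself):

```python
import numpy as np
from scipy.integrate import quad

x = 2.0  # an arbitrary observation

# Location family: l(theta | x) = f(x - theta) with f standard normal.
f = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
loc_int, _ = quad(lambda th: f(x - th), -np.inf, np.inf)
print(loc_int)  # 1.0: property (1) holds

# Scale family: l(theta | x) = g(x/theta)/theta with g a Gamma(3) density.
g = lambda u: u**2 * np.exp(-u) / 2.0
scale_int, _ = quad(lambda th: g(x / th) / th, 0.0, np.inf)
print(scale_int)  # 0.5: property (1) fails

# After (y, xi) = (log x, log theta), the scale family becomes a location
# family f0(y - xi) with f0(v) = g(exp(v)) * exp(v), and (1) holds again.
y = np.log(x)
f0 = lambda v: g(np.exp(v)) * np.exp(v)
log_int, _ = quad(lambda xi: f0(y - xi), -np.inf, np.inf)
print(log_int)  # 1.0
```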
This highlights the major issue: property (1) heavily depends on the choice of the parameterisation $\theta$ of the distribution of $X$. Likelihoods are invariant under reparameterisation and thus do not include a Jacobian for the change of variables, as probability densities do. Hence, if (1) holds for a parameterisation $\theta$, it will almost surely not hold for another parameterisation $\xi=h(\theta)$.
A further question is thus whether there could exist a parameterisation for which (1) holds, but this is unlikely, especially when considering the dependence of $\ell(\theta|x)$ on the size of the sample $x$. For exponential families, this would essentially correspond to the existence of a function equal to its own Laplace transform. This brings up another difficulty with the phrasing of the question: it does not specify the dominating measures on $\mathcal{X}$ and $\Theta$. If those are free (and they should be in the case of $\Theta$, outside of a Bayesian setting), then there are more chances of a positive answer.
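The Jacobian point can also be checked numerically (a sketch with a purely illustrative reparameterisation $\xi=\theta^3$ of a Gaussian location family; this example is an assumption, not from the answer):

```python
import numpy as np
from scipy.integrate import quad

x = 2.0
f = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density

# In theta, the Gaussian location likelihood satisfies (1).
in_theta, _ = quad(lambda th: f(x - th), -np.inf, np.inf)
print(in_theta)  # 1.0

# Reparameterise xi = theta**3: the likelihood value is unchanged pointwise
# (no Jacobian), l(xi | x) = f(x - xi**(1/3)), and (1) now fails.
xi = np.linspace(-2000.0, 2000.0, 400001)
vals = f(x - np.cbrt(xi))
print(vals.sum() * (xi[1] - xi[0]))  # ~15.0 = 3 * (1 + x**2), not 1
```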
