First some notation and definitions: In the Bayes formula as written for machine learning applications, $$ p(\theta|D) = \frac{ p(D|\theta)\, p(\theta) }{ p(D) } $$ $p(\theta)$ is commonly labeled the prior, $p(D|\theta)$ the likelihood, and $p(D)$ the evidence (also called the marginal likelihood).
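To keep things concrete for myself, here is a toy discretized check I put together (the Bernoulli coin model, the uniform prior, and the grid are my own illustrative choices, not from any particular book):

```python
import numpy as np

# Toy setup (my own choice): coin with unknown bias theta;
# the observed data D is k = 7 heads in n = 10 flips.
n, k = 10, 7
theta = np.linspace(0.0005, 0.9995, 1000)      # discretized grid over theta
prior = np.full_like(theta, 1.0 / len(theta))  # discrete uniform prior p(theta)
likelihood = theta**k * (1 - theta)**(n - k)   # p(D|theta), up to a constant factor

evidence = np.sum(likelihood * prior)          # p(D) = sum_theta p(D|theta) p(theta)
posterior = likelihood * prior / evidence      # Bayes formula

print(likelihood.sum())  # not 1: the likelihood is not a distribution over theta
print(posterior.sum())   # 1.0:   the posterior IS a distribution over theta
```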
Elsewhere (in various books), the likelihood is defined as $$ p(D|\theta) $$ viewed as a function of $\theta$, not of $D$. But $\int p(D|\theta)\, d\theta$ does not necessarily equal one, so the likelihood is not a probability density over $\theta$. In support of this, Bishop's Pattern Recognition and Machine Learning (p. 22) says: "Note that the likelihood is not a probability distribution over w, and its integral with respect to w does not (necessarily) equal one."
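Here is a quick numerical check of that remark, using a binomial model as my own illustrative choice:

```python
from scipy.integrate import quad
from scipy.stats import binom

n, k = 10, 7  # toy data D (my own numbers): k heads in n flips

# The likelihood: probability of the FIXED data D, viewed as a function of theta
def likelihood(theta):
    return binom.pmf(k, n, theta)

integral, _ = quad(likelihood, 0.0, 1.0)  # integrate over theta in [0, 1]
print(integral)  # ~0.0909 (exactly 1/(n+1)), not 1 -- matching Bishop's remark
```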
On the other hand, I believe $\int p(D|\theta)\, dD$ does equal one, since for fixed $\theta$, $p(D|\theta)$ is a probability density over $D$. But that is the integral of a conditional probability of $D$, not of a likelihood of $\theta$.
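And the complementary check in this direction, with the same toy binomial model (my own choice of $\theta$):

```python
from scipy.stats import binom

n, theta = 10, 0.7  # fix theta; D ranges over all possible outcomes k = 0..n

total = sum(binom.pmf(k, n, theta) for k in range(n + 1))
print(total)  # 1.0: for fixed theta, p(D|theta) is a probability distribution over D
```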
Finally, the question: is the "likelihood" in the Bayes formula really a likelihood (given that a likelihood is not a probability)? If not, why is it called a likelihood? If so, how can a non-probability appear in a formula that computes the posterior probability?
How should I think about this?
(In fact I have a theory about what the answer is, but since I am self-learning I am hoping for a better and more complete explanation rather than just "yes, that's right", from which I would learn nothing.)