First some notation and definitions: In the Bayes formula as written for machine learning applications, $$ p(\theta|D) = \frac{ p(D|\theta)\, p(\theta) }{ p(D) } $$ $p(\theta)$ is commonly labeled the prior, $p(D|\theta)$ the likelihood, and $p(D)$ the evidence (also called the marginal likelihood).
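To keep things concrete for myself, here is a toy discretized check I put together (the Bernoulli coin model, the uniform prior, and the grid are my own illustrative choices, not from any particular book):

```python
import numpy as np

# Toy setup (my own choice): coin with unknown bias theta;
# the observed data D is k = 7 heads in n = 10 flips.
n, k = 10, 7
theta = np.linspace(0.0005, 0.9995, 1000)      # discretized grid over theta
prior = np.full_like(theta, 1.0 / len(theta))  # discrete uniform prior p(theta)
likelihood = theta**k * (1 - theta)**(n - k)   # p(D|theta), up to a constant factor

evidence = np.sum(likelihood * prior)          # p(D) = sum_theta p(D|theta) p(theta)
posterior = likelihood * prior / evidence      # Bayes formula

print(likelihood.sum())  # not 1: the likelihood is not a distribution over theta
print(posterior.sum())   # 1.0:   the posterior IS a distribution over theta
```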
Elsewhere (in various books), the likelihood is defined as $$ p(D|\theta) $$ viewed as a function of $\theta$, not of $D$. But $\int p(D|\theta)\, d\theta$ does not necessarily equal one, so the likelihood is not a probability density over $\theta$. In support of this, Bishop's Pattern Recognition and Machine Learning (p. 22) says: "Note that the likelihood is not a probability distribution over w, and its integral with respect to w does not (necessarily) equal one."
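Here is a quick numerical check of that remark, using a binomial model as my own illustrative choice:

```python
from scipy.integrate import quad
from scipy.stats import binom

n, k = 10, 7  # toy data D (my own numbers): k heads in n flips

# The likelihood: probability of the FIXED data D, viewed as a function of theta
def likelihood(theta):
    return binom.pmf(k, n, theta)

integral, _ = quad(likelihood, 0.0, 1.0)  # integrate over theta in [0, 1]
print(integral)  # ~0.0909 (exactly 1/(n+1)), not 1 -- matching Bishop's remark
```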
On the other hand, I believe $\int p(D|\theta)\, dD$ does equal one, since for fixed $\theta$, $p(D|\theta)$ is a probability density over $D$. But that is the integral of a conditional probability of $D$, not of a likelihood of $\theta$.
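And the complementary check in this direction, with the same toy binomial model (my own choice of $\theta$):

```python
from scipy.stats import binom

n, theta = 10, 0.7  # fix theta; D ranges over all possible outcomes k = 0..n

total = sum(binom.pmf(k, n, theta) for k in range(n + 1))
print(total)  # 1.0: for fixed theta, p(D|theta) is a probability distribution over D
```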
Finally, the question: is the "likelihood" in the Bayes formula really a likelihood (given that a likelihood is not a probability)? If not, why is it called a likelihood? If so, how can a non-probability appear in a formula that computes the posterior probability?
How should I think about this?
(In fact I have a theory about what the answer is, but since I am self-learning I am hoping for a better and more complete explanation rather than just "yes, that's right", from which I would learn nothing.)