
Suppose $L(X \mid \theta)$ is a likelihood function, i.e., a probability distribution over $X \in \mathcal{X}$ indexed by a parameter $\theta \in \Theta$. Suppose further we have a prior $\pi(\theta)$, with $\int_{\Theta} \pi(\theta) \, d\theta = 1$, such that we can compute a posterior $p(\theta \mid X) \propto L(X \mid \theta)\pi(\theta)$. Is it always true that $ -\infty < E_p[\log(L(X \mid \theta))] < \infty$?

As the comments indicate, people seem to be skeptical of the claim in general (so am I). To get the ball rolling, let us establish some bounds. First, recall the usual bounds on $\log(y)$ for $y > 0$: $$ \left(1 - \frac{1}{y}\right) \leq \log(y) \leq (y-1). $$ Let us first study the upper bound. Let $Y = L( X \mid \theta)$. By Jensen's inequality, we have $$ E_p[\log(Y)] \leq \log(E_p[Y]) = \log\left( \frac{1}{Z} \int_{\Theta} L(X \mid \theta)^2 \pi(\theta) \, d\theta \right) < \infty,$$ following this answer on CV, which is mine, so I hope it's correct. Now, for the lower bound, it seems we need $E_p[1/Y] < \infty$, which holds since it is just $\frac{1}{Z}\int_\Theta \pi(\theta) \, d\theta = 1/Z$, where $Z$ is the normalising constant of the posterior. So I guess that if my answer on that other thread is correct, and a tempered likelihood $L(X \mid \theta)^\tau$ with a finite tempering $\tau > 0$ leads to a proper (pseudo) posterior, then we're done.
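As a quick numerical sanity check of these bounds (a toy example I'm adding here, not part of the original argument), here is a small Monte Carlo sketch for a conjugate Normal-Normal model with a single observation, where the posterior and the normalising constant $Z$ are available in closed form. All variable names and numbers below are assumptions made purely for illustration.

```python
# Toy sanity check of the bounds 1 - E_p[1/Y] <= E_p[log Y] <= log E_p[Y]
# for an ASSUMED conjugate model: L(x | theta) = N(x; theta, sigma^2),
# proper prior theta ~ N(mu0, tau0^2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

x, sigma = 1.3, 1.0        # observed datum and known likelihood s.d.
mu0, tau0 = 0.0, 2.0       # prior mean and s.d.

# Closed-form posterior N(mu_n, tau_n^2) and evidence Z (normalising constant)
tau_n2 = 1.0 / (1.0 / tau0**2 + 1.0 / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + x / sigma**2)
Z = stats.norm.pdf(x, loc=mu0, scale=np.sqrt(sigma**2 + tau0**2))

# Monte Carlo draws from the posterior; Y = L(x | theta)
theta = rng.normal(mu_n, np.sqrt(tau_n2), size=200_000)
Y = stats.norm.pdf(x, loc=theta, scale=sigma)

print("lower bound 1 - 1/Z     :", 1.0 - 1.0 / Z)
print("1 - E_p[1/Y]  (MC)      :", 1.0 - np.mean(1.0 / Y))
print("E_p[log Y]    (MC)      :", np.mean(np.log(Y)))
print("Jensen upper log E_p[Y] :", np.log(np.mean(Y)))
```

In this toy example all printed quantities should come out finite and ordered exactly as the bounds predict.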

  • I'm not sure that's right. Log likelihoods often go to minus infinity as $\theta$ tends to plus or minus infinity. I guess it depends on what happens to $\pi(\theta)$ as $\theta$ changes? For an improper uniform prior that won't work; maybe for a prior with finite support the answer is always yes, and for others it depends on how it approaches 0 as $\theta$ changes? I really don't have the background to say more – ASeaton Jul 29 '19 at 16:16
  • Thanks for your comment. I'll edit to add that I'm interested in **proper** priors. – Luiz Max Carvalho Jul 29 '19 at 18:10
  • I am pretty convinced that there are cases in which this expression is not even well defined, because the thing we want to integrate is not in $L^1$... this is related: https://stats.stackexchange.com/questions/275753/em-and-gaussian-mixture-models – Fabian Werner Jul 29 '19 at 21:52
  • According to [this answer of mine](https://stats.stackexchange.com/questions/188903/intuition-on-the-kullback-leibler-kl-divergence/189758#189758), the expectation of the log likelihood ratio under the alternative hypothesis is the Kullback-Leibler divergence, which indeed can be infinite! So that should answer the question, in the case of a **point posterior**. In other cases you ask about a **mixture of KL divergences**. – kjetil b halvorsen Jul 30 '19 at 11:24

2 Answers


In the case of a point posterior, you ask about a KL divergence!

According to [this answer of mine](https://stats.stackexchange.com/questions/188903/intuition-on-the-kullback-leibler-kl-divergence/189758#189758), the expectation of the log likelihood ratio under the alternative hypothesis is the Kullback-Leibler divergence, which indeed can be infinite! So that should answer the question in the case of a point posterior. In other cases you ask about a mixture of KL divergences.

Since the mixture is over terms that can be infinite, the same can apply to the mixture.
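To make the first claim concrete (an illustrative example, not from the linked answer): take $p$ to be the standard Cauchy density and $q$ the standard normal density. Then $$\operatorname{KL}(p \,\|\, q) = E_p[\log p(X)] - E_p[\log q(X)], \qquad E_p[\log q(X)] = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2} E_p[X^2] = -\infty,$$ because the Cauchy distribution has no second moment, while $E_p[\log p(X)]$ is finite; hence $\operatorname{KL}(p \,\|\, q) = +\infty$ even though both densities are proper.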

kjetil b halvorsen

I am going to add this, answering in the affirmative, and would be grateful if someone pointed out if and where it is wrong.

First, let us establish which probability distribution we'll be taking expectations with respect to: $$ p(\theta \mid X) = \frac{L(X \mid \theta)\pi(\theta)}{Z_\pi}, $$ where we drop the dependency of $Z_\pi$ on $X$ for ease of notation. We then say that $$E_p[f] = \int_{\Theta} p(\theta \mid X) f(\theta)\, d\theta,$$ cf. LOTUS.

The claim is $ -\infty < E_p[\log(L(X \mid \theta))] < \infty$. Letting $ Y = L(X \mid \theta)$ and using the bounds on $\log(y)$ for $y > 0$, $$ \left(1 -\frac{1}{y}\right) \leq \log(y) \leq y-1, $$ plus the fact that the expectation operator is linear, we get $$ 1 - E_p[1/Y]\leq E_p[\log(Y)] \leq E_p[Y] - 1. $$ Hence we have to show that (i) $ -\infty < E_p[1/Y] < \infty$; and (ii) $E_p[Y] < \infty$.

For (i), since the prior is proper, we have $$ E_p[1/Y] = \int_{\Theta} \frac{L(X \mid \theta)\pi(\theta)}{Z_\pi} \frac{1}{L(X \mid \theta)} \, d\theta = \frac{1}{Z_\pi}\int_{\Theta} \pi(\theta)\, d\theta = \frac{1}{Z_\pi}.$$ For (ii), I'll reproduce a simplified version of the argument from a previous thread, which I also linked above. First, notice that we could denote the posterior by $p(\theta)$ and treat it as a proper prior for $\theta$. We then know that $$ Z_p = \int_{\Theta} L(X\mid \theta) p(\theta)\, d\theta < \infty, $$ which is exactly $E_p[Y]$.
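To complement the argument, here is a rough numerical sketch of (i) and (ii) by direct quadrature, again for an illustrative conjugate Normal-Normal model; the model, the truncation range, and all names are assumptions made for the example, not anything from the thread.

```python
# Hedged quadrature check of (i) and (ii) for an ASSUMED conjugate model:
# L(x | theta) = N(x; theta, sigma^2), proper prior theta ~ N(mu0, tau0^2).
from scipy import stats
from scipy.integrate import quad

x, sigma, mu0, tau0 = 1.3, 1.0, 0.0, 2.0
lo, hi = -10.0, 10.0  # truncation range covering essentially all prior mass


def L(th):
    """Likelihood as a function of theta, with the datum x held fixed."""
    return stats.norm.pdf(x, loc=th, scale=sigma)


def prior(th):
    """Proper prior density pi(theta)."""
    return stats.norm.pdf(th, loc=mu0, scale=tau0)


# Normalising constant Z_pi, and posterior density p(theta | x) = L * pi / Z_pi
Z_pi, _ = quad(lambda th: L(th) * prior(th), lo, hi)


def post(th):
    return L(th) * prior(th) / Z_pi


# (i)  E_p[1/Y], which should come out equal to 1/Z_pi
E_inv_Y, _ = quad(lambda th: post(th) / L(th), lo, hi)

# (ii) E_p[Y] = Z_p, the evidence obtained when the posterior is reused as a prior
Z_p, _ = quad(lambda th: L(th) * post(th), lo, hi)

print("Z_pi, 1/Z_pi         :", Z_pi, 1.0 / Z_pi)
print("(i)  E_p[1/Y]        :", E_inv_Y)
print("(ii) E_p[Y] = Z_p    :", Z_p)
print("bounds on E_p[log Y] :", 1.0 - E_inv_Y, "<= E_p[log Y] <=", Z_p - 1.0)
```

Quadrature is used here rather than Monte Carlo so that $Z_\pi$, $E_p[1/Y]$ and $Z_p$ can be checked directly against one another in this toy case.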