
Suppose $L(X \mid \theta)$ is a likelihood function, i.e., a probability distribution over $X \in \mathcal{X}$ indexed by a parameter $\theta \in \Theta$. Suppose further we have a prior $\pi(\theta)$, with $\int_{\Theta} \pi(\theta) \, d\theta = 1$, such that we can compute a posterior $p(\theta \mid X) \propto L(X \mid \theta)\pi(\theta)$. Is it always true that $ -\infty < E_p[\log(L(X \mid \theta))] < \infty$?

As the comments indicate, people seem to be skeptical of the claim in general (so am I). To get the ball rolling, let us establish some bounds. First, recall the usual bounds on $\log(y)$ for $y > 0$: $$ \left(1 - \frac{1}{y}\right) \leq \log(y) \leq (y-1). $$ Let us first study the upper bound. Let $Y = L( X \mid \theta)$. By Jensen's inequality, we have $$ E_p[\log(Y)] \leq \log(E_p[Y]) = \log\left( \frac{1}{Z} \int_{\Theta} L(X \mid \theta)^2 \pi(\theta) \, d\theta \right) < \infty,$$ following this answer on CV, which is mine, so I hope it's correct. Now, for the lower bound, it seems we need $E_p[1/Y] < \infty$, which holds since it is just $\frac{1}{Z}\int_\Theta \pi(\theta) \, d\theta = 1/Z$, where $Z$ is the normalising constant of the posterior. So I guess that if my answer on that other thread is correct, and a tempered likelihood $L(X \mid \theta)^\tau$ with a finite tempering $\tau > 0$ leads to a proper (pseudo) posterior, then we're done.
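As a quick numerical sanity check of these bounds (a toy example I'm adding here, not part of the original argument), here is a small Monte Carlo sketch for a conjugate Normal-Normal model with a single observation, where the posterior and the normalising constant $Z$ are available in closed form. All variable names and numbers below are assumptions made purely for illustration.

```python
# Toy sanity check of the bounds 1 - E_p[1/Y] <= E_p[log Y] <= log E_p[Y]
# for an ASSUMED conjugate model: L(x | theta) = N(x; theta, sigma^2),
# proper prior theta ~ N(mu0, tau0^2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

x, sigma = 1.3, 1.0        # observed datum and known likelihood s.d.
mu0, tau0 = 0.0, 2.0       # prior mean and s.d.

# Closed-form posterior N(mu_n, tau_n^2) and evidence Z (normalising constant)
tau_n2 = 1.0 / (1.0 / tau0**2 + 1.0 / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + x / sigma**2)
Z = stats.norm.pdf(x, loc=mu0, scale=np.sqrt(sigma**2 + tau0**2))

# Monte Carlo draws from the posterior; Y = L(x | theta)
theta = rng.normal(mu_n, np.sqrt(tau_n2), size=200_000)
Y = stats.norm.pdf(x, loc=theta, scale=sigma)

print("lower bound 1 - 1/Z     :", 1.0 - 1.0 / Z)
print("1 - E_p[1/Y]  (MC)      :", 1.0 - np.mean(1.0 / Y))
print("E_p[log Y]    (MC)      :", np.mean(np.log(Y)))
print("Jensen upper log E_p[Y] :", np.log(np.mean(Y)))
```

In this toy example all printed quantities should come out finite and ordered exactly as the bounds predict.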

  • I'm not sure that's right. Log likelihoods often go to minus infinity as $\theta$ tends to plus or minus infinity. I guess it depends on what happens to $\pi(\theta)$ as $\theta$ changes? For an improper uniform prior that won't work; maybe for a prior with finite support the answer is always yes, and for others it depends on how it approaches 0 as $\theta$ changes? I really don't have the background to say more – ASeaton Jul 29 '19 at 16:16
  • Thanks for your comment. I'll edit to add that I'm interested in **proper** priors. – Luiz Max Carvalho Jul 29 '19 at 18:10
  • I am pretty convinced that there are cases in which this expression is not even well defined, because the thing we want to integrate is not in $L^1$... this is related: https://stats.stackexchange.com/questions/275753/em-and-gaussian-mixture-models – Fabian Werner Jul 29 '19 at 21:52
  • According to [this answer of mine](https://stats.stackexchange.com/questions/188903/intuition-on-the-kullback-leibler-kl-divergence/189758#189758), the expectation of the log likelihood ratio under the alternative hypothesis is the Kullback-Leibler divergence, which indeed can be infinite! So that should answer the question, in the case of a **point posterior**. In other cases you ask about a **mixture of KL divergences**. – kjetil b halvorsen Jul 30 '19 at 11:24

2 Answers


In the case of a point posterior, you ask about a KL divergence!

According to [this answer of mine](https://stats.stackexchange.com/questions/188903/intuition-on-the-kullback-leibler-kl-divergence/189758#189758), the expectation of the log likelihood ratio under the alternative hypothesis is the Kullback-Leibler divergence, which indeed can be infinite! So that should answer the question in the case of a point posterior. In other cases you ask about a mixture of KL divergences.

Since the mixture is over terms that can be infinite, the same can apply to the mixture.
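To make the first claim concrete (an illustrative example, not from the linked answer): take $p$ to be the standard Cauchy density and $q$ the standard normal density. Then $$\operatorname{KL}(p \,\|\, q) = E_p[\log p(X)] - E_p[\log q(X)], \qquad E_p[\log q(X)] = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2} E_p[X^2] = -\infty,$$ because the Cauchy distribution has no second moment, while $E_p[\log p(X)]$ is finite; hence $\operatorname{KL}(p \,\|\, q) = +\infty$ even though both densities are proper.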

kjetil b halvorsen

I am going to add this, answering in the affirmative, and would be grateful if someone pointed out if and where it is wrong.

First, let us establish which probability distribution we'll be taking expectations with respect to: $$ p(\theta \mid X) = \frac{L(X \mid \theta)\pi(\theta)}{Z_\pi}, $$ where we drop the dependency of $Z_\pi$ on $X$ for ease of notation. We then say that $$E_p[f] = \int_{\Theta} p(\theta \mid X) f(\theta)\, d\theta,$$ cf. LOTUS.

The claim is $ -\infty < E_p[\log(L(X \mid \theta))] < \infty$. Letting $ Y = L(X \mid \theta)$ and using the bounds on $\log(y)$ for $y > 0$, $$ \left(1 -\frac{1}{y}\right) \leq \log(y) \leq y-1, $$ plus the fact that the expectation operator is linear, we get $$ 1 - E_p[1/Y]\leq E_p[\log(Y)] \leq E_p[Y] - 1. $$ Hence we have to show that (i) $ -\infty < E_p[1/Y] < \infty$; and (ii) $E_p[Y] < \infty$.

For (i), since the prior is proper, we have $$ E_p[1/Y] = \int_{\Theta} \frac{L(X \mid \theta)\pi(\theta)}{Z_\pi} \frac{1}{L(X \mid \theta)} \, d\theta = \frac{1}{Z_\pi}\int_{\Theta} \pi(\theta)\, d\theta = \frac{1}{Z_\pi}.$$ For (ii), I'll reproduce a simplified version of the argument from a previous thread, which I also linked above. First, notice that we could denote the posterior by $p(\theta)$ and treat it as a proper prior for $\theta$. We then know that $$ Z_p = \int_{\Theta} L(X\mid \theta) p(\theta)\, d\theta < \infty, $$ which is exactly $E_p[Y]$.
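To complement the argument, here is a rough numerical sketch of (i) and (ii) by direct quadrature, again for an illustrative conjugate Normal-Normal model; the model, the truncation range, and all names are assumptions made for the example, not anything from the thread.

```python
# Hedged quadrature check of (i) and (ii) for an ASSUMED conjugate model:
# L(x | theta) = N(x; theta, sigma^2), proper prior theta ~ N(mu0, tau0^2).
from scipy import stats
from scipy.integrate import quad

x, sigma, mu0, tau0 = 1.3, 1.0, 0.0, 2.0
lo, hi = -10.0, 10.0  # truncation range covering essentially all prior mass


def L(th):
    """Likelihood as a function of theta, with the datum x held fixed."""
    return stats.norm.pdf(x, loc=th, scale=sigma)


def prior(th):
    """Proper prior density pi(theta)."""
    return stats.norm.pdf(th, loc=mu0, scale=tau0)


# Normalising constant Z_pi, and posterior density p(theta | x) = L * pi / Z_pi
Z_pi, _ = quad(lambda th: L(th) * prior(th), lo, hi)


def post(th):
    return L(th) * prior(th) / Z_pi


# (i)  E_p[1/Y], which should come out equal to 1/Z_pi
E_inv_Y, _ = quad(lambda th: post(th) / L(th), lo, hi)

# (ii) E_p[Y] = Z_p, the evidence obtained when the posterior is reused as a prior
Z_p, _ = quad(lambda th: L(th) * post(th), lo, hi)

print("Z_pi, 1/Z_pi         :", Z_pi, 1.0 / Z_pi)
print("(i)  E_p[1/Y]        :", E_inv_Y)
print("(ii) E_p[Y] = Z_p    :", Z_p)
print("bounds on E_p[log Y] :", 1.0 - E_inv_Y, "<= E_p[log Y] <=", Z_p - 1.0)
```

Quadrature is used here rather than Monte Carlo so that $Z_\pi$, $E_p[1/Y]$ and $Z_p$ can be checked directly against one another in this toy case.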