Both in his book and on his blog, Larry Wasserman has discussed an example in which naive application of Bayesian methods gives nonsensical answers.
Intro
The problem is to estimate the normalizing constant $c$ of an un-normalized probability distribution $g(x)$. The target value of $c$ is given by:
$$ c = \int g(x) \, dx $$
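For concreteness (my own illustration, not one from Wasserman's post): if $g(x) = e^{-x^2/2}$, then $c = \int e^{-x^2/2}\,dx = \sqrt{2\pi}$, i.e. the normalizing constant of the standard Gaussian. We can evaluate $g$ pointwise, but the integral is assumed to be intractable in general.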
Prof. Wasserman shows that naive Bayesian estimation of $c$ gives stupid results; I'll let you check his blog for the details. He offers as an open question the construction of a Bayesian estimator of $c$.
This was discussed before on SE, and the answer was mostly that this is a silly example and that there is no reason to be doing Bayesian inference on this problem. But let me try anyway:
The sampling solution
To build a Bayesian estimator, let's first check what conventional methods are available. One such method to estimate $c$ is importance sampling: we generate samples from a second, simple probability distribution $q(x)$. Denoting those samples $x_k$, we then compute the empirical mean of the ratios $\frac{g(x_k)}{q(x_k)}$ and obtain an unbiased estimator of $c$:
$$ \hat{c} = \frac{1}{n} \sum_{k=1}^n \frac{g(x_k)}{q(x_k)} $$
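As a minimal numerical sketch of that estimator, here is what it looks like in Python, with an un-normalized Gaussian for $g$ and a standard Cauchy proposal for $q$ (both are my illustrative choices, not part of the original problem):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices: an un-normalized Gaussian g(x) = exp(-x^2 / 2),
# whose true normalizer is sqrt(2*pi), and a standard Cauchy proposal q(x).
def g(x):
    return np.exp(-0.5 * x**2)

def q_pdf(x):
    return 1.0 / (np.pi * (1.0 + x**2))

n = 10_000
x = rng.standard_cauchy(n)        # samples x_k ~ q
w = g(x) / q_pdf(x)               # ratios g(x_k) / q(x_k)
c_hat = w.mean()                  # unbiased estimate of c

print(c_hat, np.sqrt(2 * np.pi))  # estimate vs. true value ~2.5066
```

With this particular pair, the Cauchy proposal has heavier tails than $g$, so the ratios are bounded and the estimator has finite variance; a thinner-tailed proposal would still run but could give a much noisier estimate.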
Bayesian importance sampling?
Now, it might be stupid, but why couldn't we, at least in theory, construct a Bayesian estimator that takes the sequence $ \frac{g(x_k)}{q(x_k)} $ as data and builds a posterior for $c$ given those observations?
For example, if we happen to be in a case where we know that the ratio $ \frac{g(x)}{q(x)} $ is bounded, we could model the observations as coming from a (scaled) beta distribution and do conjugate inference. If we do not have an upper bound, we might model the observations as Gamma-distributed instead. We might even use likelihoods with non-conjugate priors and/or more complicated priors.
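To make the conjugate route concrete, here is a sketch under one specific (and debatable) modelling assumption of mine: treat the ratios as exponentially distributed, i.e. a Gamma likelihood with shape 1 and unknown rate $\lambda = 1/c$, so that a Gamma prior on $\lambda$ is conjugate and the posterior over $c$ comes out in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same setup as above: ratios w_k = g(x_k) / q(x_k) from importance sampling.
def g(x):
    return np.exp(-0.5 * x**2)

def q_pdf(x):
    return 1.0 / (np.pi * (1.0 + x**2))

n = 10_000
x = rng.standard_cauchy(n)
w = g(x) / q_pdf(x)

# Illustrative likelihood (my assumption): w_k ~ Exponential(rate lam),
# with lam = 1/c so that E[w_k] = c.  A Gamma(a0, b0) prior on lam
# (rate parameterization) is conjugate: the posterior is
# Gamma(a0 + n, b0 + sum(w)).
a0, b0 = 1.0, 1.0                    # weakly informative prior (assumption)
a_post = a0 + n
b_post = b0 + w.sum()

# c = 1 / lam, so the posterior for c is inverse-gamma(a_post, b_post),
# whose mean is b_post / (a_post - 1).
c_post_mean = b_post / (a_post - 1)

# Posterior draws of c, e.g. for a credible interval.
lam_draws = rng.gamma(shape=a_post, scale=1.0 / b_post, size=5_000)
c_draws = 1.0 / lam_draws

print(c_post_mean, np.quantile(c_draws, [0.025, 0.975]))
```

Whether an exponential (or any other) likelihood is a faithful model for the ratios is of course part of what the question is asking; the sketch only shows that the conjugate mechanics go through once such a model is chosen.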
My questions are:

1. Is this idea of doing Bayesian importance sampling something that has already been analyzed?
2. Does it, for some reason or another, fail to solve this problem of inferring the normalizing constant?