Importance sampling is a method used to approximate the expectation of a test function $\phi$ with respect to $p$ by sampling from a proposal distribution $q$ instead: $$ \mathbb{E}_{p}[\phi(x)] = \int \phi(x) p(x) dx = \frac{Z_q}{Z_p}\int\phi(x) \frac{\tilde{p}(x)}{\tilde{q}(x)} q(x)dx = \frac{\displaystyle \int \phi(x) w(x) q(x)dx}{\displaystyle \int w(x) q(x) dx} \approx \sum_{i=1}^n \phi(x^{(i)})W(x^{(i)}) \qquad x^{(i)}\sim q $$ where $w(x) = \tilde{p}(x)/\tilde{q}(x)$ denotes the unnormalized importance weight.

The normalizing constant is approximated with the following unbiased estimator (taking the proposal to be normalized, i.e. $Z_q = 1$) $$ Z_p = \int \tilde{p}(x) dx \approx \frac{1}{n}\sum_{i=1}^n w(x^{(i)}) $$

Is it possible to use $q(x) = p(x)$? I would like to do so in order to approximate the normalizing constant.

This doesn't seem to work because then $w(x^{(i)}) = 1$ and we would have $$ Z_p \approx 1 $$

Update

Sorry, I forgot to mention this. I defined $$ W(x^{(i)}) = \frac{w(x^{(i)})}{\sum_{j=1}^n w(x^{(j)})} $$ so that $$ \mathbb{E}_p[\phi(x)] \approx \frac{\frac{1}{n}\sum_{i=1}^n \phi(x^{(i)}) w(x^{(i)})}{\frac{1}{n}\sum_{i=1}^n w(x^{(i)})} = \sum_{i=1}^n \phi(x^{(i)})\frac{w(x^{(i)})}{\sum_{j=1}^n w(x^{(j)})} = \sum_{i=1}^n \phi(x^{(i)}) W(x^{(i)}) $$
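For concreteness, here is a minimal sketch of the two estimators above. The toy target $\tilde p(x) = e^{-x^2/2}$ (so that $Z_p = \sqrt{2\pi}$), the normalized $N(0, 2^2)$ proposal, and the function names are all assumptions chosen for illustration, not part of the original setup.

```python
import numpy as np

def p_tilde(x):
    # Assumed toy target: unnormalized standard normal, true Z_p = sqrt(2*pi)
    return np.exp(-0.5 * x**2)

def q_pdf(x, scale=2.0):
    # Assumed proposal: normalized N(0, scale^2) density, so Z_q = 1
    return np.exp(-0.5 * (x / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))

def importance_sampling(phi, n, rng, scale=2.0):
    x = rng.normal(0.0, scale, size=n)   # x^(i) ~ q
    w = p_tilde(x) / q_pdf(x, scale)     # unnormalized weights w(x^(i))
    z_hat = w.mean()                     # estimate of Z_p (valid because Z_q = 1)
    W = w / w.sum()                      # self-normalized weights W(x^(i))
    phi_hat = np.sum(phi(x) * W)         # estimate of E_p[phi(x)]
    return phi_hat, z_hat

rng = np.random.default_rng(0)
phi_hat, z_hat = importance_sampling(lambda x: x**2, 200_000, rng)
print("E_p[x^2] estimate:", phi_hat)     # exact value is 1 for this toy target
print("Z_p estimate:", z_hat, "exact:", np.sqrt(2.0 * np.pi))
```

Note that `z_hat` only targets $Z_p$ because the proposal density is evaluated with its own normalizing constant; with an unnormalized $\tilde q$ the same average would target $Z_p/Z_q$.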

1 Answer

If $p(x)=\tilde p(x)/Z_p$, computing $Z_p$ using only a sample from $p$ is not feasible by importance sampling: $$\int \tilde p(x) \,\text dx=\int \frac{\tilde p(x)}{p(x)} p(x)\,\text dx=Z_p\int p(x)\,\text dx= Z_p$$ does not help since the importance weight ${\tilde p(x)}\big/{p(x)}$ is equal to $Z_p$, which is unknown!
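A small numerical sketch of this point, under the assumption of a toy Gaussian target for which $Z_p=\sqrt{2\pi}$ happens to be available (the names `p_tilde` and `p_pdf` are illustrative only): with $q=p$ every weight equals $Z_p$, so evaluating the weights already requires the very constant one is after, and the self-normalized weights collapse to $1/n$.

```python
import numpy as np

def p_tilde(x):
    # Assumed toy target: unnormalized standard normal density
    return np.exp(-0.5 * x**2)

# Known here only because the toy target is Gaussian
Z_p = np.sqrt(2.0 * np.pi)

def p_pdf(x):
    # Evaluating the normalized density p already requires Z_p
    return p_tilde(x) / Z_p

rng = np.random.default_rng(1)
x = rng.normal(size=1000)        # x^(i) ~ p
w = p_tilde(x) / p_pdf(x)        # importance weights with q = p
W = w / w.sum()                  # self-normalized weights

print(np.allclose(w, Z_p))           # every weight is the constant Z_p
print(np.allclose(W, 1.0 / len(x)))  # self-normalized weights are uniform
```

In other words, the self-normalized estimator reduces to the plain Monte Carlo average of $\phi(x^{(i)})$, which carries no information about $Z_p$.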

When looking at either $Z_p$ or the inverse of $Z_p$, the impossibility of finding unbiased estimators is clear: if $$\mathbb E_{p}[\ell(X)]={Z_p}\quad \text{and}\quad \mathbb E_{p}[h(X)]=\frac{1}{Z_p}$$ then \begin{align}\int \ell(x) p(x)\,\text dx&=\int\ell(x) \frac{\tilde p(x)}{Z_p}\,\text dx= {Z_p}\\ \int h(x) p(x)\,\text dx&=\int h(x) \frac{\tilde p(x)}{Z_p}\,\text dx= \frac{1}{Z_p}\end{align} implies that $$\int \ell(x) \tilde p(x)\,\text dx=Z_p^2\quad \text{and}\quad\int h(x) \tilde p(x)\,\text dx=1$$ neither of which can hold, since $\tilde p(\cdot)$ is only defined up to a multiplicative constant.

This difficulty is discussed for MCMC samples in a 2008/9 BA paper I wrote with Jean-Michel Marin, and solutions are compared in this 2009/10 survey of ours. All are based on additional samples$^1$ from different distributions (except for the infamous harmonic mean estimator!). See also the solutions based on reverse logistic regression (Geyer, 1991/4), noise contrastive estimation (Gutmann and Hyvärinen, 2010/2), sequential Monte Carlo, path sampling, etc. (Also discussed in this X validated question.)


$^1$The closest to an exception may be Chib's method (or the candidate formula, cf. Besag) since the representation of the marginal is based on a single sample from the posterior, but this is an augmented posterior based on latent variables.

Xi'an