Using all Metropolis-Hastings proposals to estimate an integral

Question

Suppose we run the Metropolis-Hastings with target distribution $\mu$ to compute the integral $\int f\:{\rm d}\mu$. Usually, we use the estimator $$A_n:=\frac1n\sum_{i=0}^{n-1}f(X_i).$$ However, instead of adding up the $f(X_i)$, I've seen the following modification: Let $(Y_n)_n$ denote the sequence of proposals. By definition, $Y_i$ is accepted and $X_i$ is set to $Y_i$ with probability $\alpha(X_{i-1},X_i)$. Instead of adding $f(X_i)$ to the sum, we could add $(1-\alpha(X_{i-1},Y_i))f(X_{i-1})+\alpha(X_{i-1},Y_i)f(Y_i)$.

I couldn't find any lecture book or paper considering this modification or an estimator modified in a similar way (except this paper). Does anybody have a reference at hand?

Xi'an · Accepted Answer · 2019-10-15T18:42:58.813

6

Recycling proposed values in a Metropolis-Hastings algorithm goes under the name of Rao-Blackwellisation. For instance, we made such a proposal in

Casella and Robert (1996) Rao-Blackwellisation of sampling schemes.
Douc and Robert (2013) A vanilla Rao-Blackwellisation of Metropolis-Hastings algorithms

Note that the weighting you propose in the question: $$[1-\alpha(x_i,y_{i+1})]h(x_i)+\alpha(x_i,y_{i+1})h(y_i)$$ is not optimal in that it is not guaranteed to bring the variance down. Even the one I produce [above] in one of my MCMC course slides does not always reduce the variance because of the correlation between the different terms. (Only the data augmentation case leads to a sure reduction in the variance, cf this fantastic paper by Liu, Wong and Long, 1994. And vanilla Rao-Blackwellisation.)

edited Oct 15 '19 at 18:42

answered Oct 15 '19 at 05:23

Xi'an

90,397
9
157
575

1

Thank you for your answer. I'll check the papers and accept your answer soon. Just one quick question: Is the process $(X_{n-1},Y_n)_{n\in\mathbb N}$ again a Markov process? If I'm not missing anything, this should be the case. – 0xbadf00d Oct 15 '19 at 18:21
Yes indeed, the joint chain is a Markov chain, either $(_{−1},_)_{∈ℕ}$ or $(_{},_)_{∈ℕ}$. – Xi'an Oct 15 '19 at 18:44
Can we give an expression for the transition kernel? Assuming that $(X_{n-1},Y_n)\sim\mathcal L(X_{n-1})\otimes Q$, where $Q$ is the proposal kernel? – 0xbadf00d Oct 15 '19 at 18:46
The usual transition on $X_{n-1}$ given $Y_{n-1}$ and $X_{n-2}$ times the proposal. – Xi'an Oct 15 '19 at 19:25
Took me a while to determine the transition kernel, but now I've obtained $$\operatorname P\left[(X_n,Y_{n+1})\in A_n\times B_{n+1}\mid(X_{n-1},Y_n)\right]=\delta_{Y_n}\alpha(X_{n-1},Y_n)Q(Y_n,B_{n+1})+\delta_{X_{n-1}}(A_n)(1-\alpha(X_{n-1},Y_n))Q(X_{n-1},B_{n+1}).$$ Is that what you've got in mind? – 0xbadf00d Oct 19 '19 at 16:10

score 4 · Answer 2 · answered Oct 15 '19 at 04:24

This looks similar to "local" importance sampling. In the literature, constructing estimators of this sort seem to termed as "waste-recyling", and quick search yields a few papers:

This thesis by Murray
PNAS paper by Freknel here
Does waste-recyling help by Delmas and Jourdain here
Their use in parallel MCMC here

Further search using the keyword "waste-recyling" and look within the above references might give you some more papers.

Using all Metropolis-Hastings proposals to estimate an integral

2 Answers2