7

Suppose we run the Metropolis-Hastings with target distribution $\mu$ to compute the integral $\int f\:{\rm d}\mu$. Usually, we use the estimator $$A_n:=\frac1n\sum_{i=0}^{n-1}f(X_i).$$ However, instead of adding up the $f(X_i)$, I've seen the following modification: Let $(Y_n)_n$ denote the sequence of proposals. By definition, $Y_i$ is accepted and $X_i$ is set to $Y_i$ with probability $\alpha(X_{i-1},X_i)$. Instead of adding $f(X_i)$ to the sum, we could add $(1-\alpha(X_{i-1},Y_i))f(X_{i-1})+\alpha(X_{i-1},Y_i)f(Y_i)$.

I couldn't find any lecture book or paper considering this modification or an estimator modified in a similar way (except this paper). Does anybody have a reference at hand?

Shayan Shafiq
  • 633
  • 6
  • 17
0xbadf00d
  • 539
  • 3
  • 8

2 Answers2

6

enter image description hereRecycling proposed values in a Metropolis-Hastings algorithm goes under the name of Rao-Blackwellisation. For instance, we made such a proposal in

Note that the weighting you propose in the question: $$[1-\alpha(x_i,y_{i+1})]h(x_i)+\alpha(x_i,y_{i+1})h(y_i)$$ is not optimal in that it is not guaranteed to bring the variance down. Even the one I produce [above] in one of my MCMC course slides does not always reduce the variance because of the correlation between the different terms. (Only the data augmentation case leads to a sure reduction in the variance, cf this fantastic paper by Liu, Wong and Long, 1994. And vanilla Rao-Blackwellisation.)

Xi'an
  • 90,397
  • 9
  • 157
  • 575
  • 1
    Thank you for your answer. I'll check the papers and accept your answer soon. Just one quick question: Is the process $(X_{n-1},Y_n)_{n\in\mathbb N}$ again a Markov process? If I'm not missing anything, this should be the case. – 0xbadf00d Oct 15 '19 at 18:21
  • Yes indeed, the joint chain is a Markov chain, either $(_{−1},_)_{∈ℕ}$ or $(_{},_)_{∈ℕ}$. – Xi'an Oct 15 '19 at 18:44
  • Can we give an expression for the transition kernel? Assuming that $(X_{n-1},Y_n)\sim\mathcal L(X_{n-1})\otimes Q$, where $Q$ is the proposal kernel? – 0xbadf00d Oct 15 '19 at 18:46
  • The usual transition on $X_{n-1}$ given $Y_{n-1}$ and $X_{n-2}$ times the proposal. – Xi'an Oct 15 '19 at 19:25
  • Took me a while to determine the transition kernel, but now I've obtained $$\operatorname P\left[(X_n,Y_{n+1})\in A_n\times B_{n+1}\mid(X_{n-1},Y_n)\right]=\delta_{Y_n}\alpha(X_{n-1},Y_n)Q(Y_n,B_{n+1})+\delta_{X_{n-1}}(A_n)(1-\alpha(X_{n-1},Y_n))Q(X_{n-1},B_{n+1}).$$ Is that what you've got in mind? – 0xbadf00d Oct 19 '19 at 16:10
4

This looks similar to "local" importance sampling. In the literature, constructing estimators of this sort seem to termed as "waste-recyling", and quick search yields a few papers:

Further search using the keyword "waste-recyling" and look within the above references might give you some more papers.

Greenparker
  • 14,131
  • 3
  • 36
  • 80