How can we verify the intuition that in the RW-Metropolis-Hastings algorithm with Gaussian proposal too small and too large variances are bad choices

Question

Let $d\in\mathbb N$ and consider the Random Walk Metropolis-Hastings algorithm with a Gaussian proposal kernel $Q$ such that $Q(x,\;\cdot\;)=\mathcal N_d(x,\sigma^2_dI_d)$ for all $x\in\mathbb R^d$.

Intuitively, if $\sigma$ is too small, nearly all proposals will be accepted and the chain moves extremely slowly. On the other hand, if $\sigma$ is too large, the proposed moves will usually be far away from the current state and hence most proposals will be rejected.

With this in mind, it makes sense to model $\sigma_d$ as a decreasing function of $d$. We may set $\sigma_d=\ell/d^\alpha$ for some $\alpha\in[0,1]$. On page 6 (in the numbering of the paper) of his work, Roberts mentioned that the choice $\alpha=1/2$ is "optimal" (in which sense?).

How can we prove this rigorously?

Slide 18 of a presentation that I found, seems to be related, but I don't get how they conclude

score 2 · Answer 1 · answered Mar 13 '19 at 13:51

The original approach by Gareth Roberts et al. is to investigate the limiting distribution of the first coordinate process $X^{(1)}_n$, accelerated by a factor $d$. This leads to the limiting process $Z_t = X^{(1)}_{\lfloor t d \rfloor}$.

If you put $\alpha < 1/2$ (large steps), it can be shown that asymptotically none of the proposed moves are going to be accepted, so that the process $(Z_t)$ is constant almost surely.
Similarly, if $\alpha > 1/2$ (small steps), asymptotically all moves are going to be accepted but the moves are so small, so that again the limiting process $(Z_t)$ is constant almost surely.
Finally, only for $\alpha = 1/2$, we get a non-trivial limiting process (which happens to be a Langevin diffusion).

This is the way in which $\alpha = 1/2$ can be considered an optimal choice. For a more precise statement and proof of this result, consult the original research paper which is very readable.

Roberts, G. O., Gelman, A., & Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7(1), 110–120. https://doi.org/10.1214/aoap/1034625254

The result is later extended in various ways: e.g. looking at different functions of the high-dimensional process (instead of the first coordinate), and more general distributional assumptions. Also different Metropolis-Hastings algorithms have been studied, for example the MALA algorithm, which was shown to require a time speed up of only $d^{1/3}$ instead of $d$ in order to converge. This is also discussed in the survey paper you are reading.

Thank you for your answer. I'm aware of the original paper. Unfortunately, there is no prove for your claims about $\alpha<1/2$ and $\alpha>1/2$. I've read multiple other papers of Roberts too. The claims are always mentioned, but there is never a proof or a reference for it. So, how can we prove that rigorously? (Please take note of this related question https://math.stackexchange.com/q/3140066/47771.) — 0xbadf00d, Mar 15 '19 at 07:24
I agree it is not stated explicitly in the paper. It should be possible to show that taking $\alpha \neq 1/2$ yields a trivial limiting process $Z_t = \mathrm{const}$ almost surely. The key result in the mentioned paper is in Lemma 2.6 which needs to be adapted. Note that the computations are simplified since we may restrict $x^d$ to sets $F_d$, and also we may restrict attention to functions in the domain of the generator having compact support. I would recommend basing your efforts on a research article rather than presentation slides which necessarily leave out some of the detail. — Joris Bierkens, Mar 15 '19 at 21:10
I've consulted plenty research papers. The claim can be found in many of them, but none of them contain even a sketch of the proof. The slides were the first and only source where the claim is considered in more detail. If you know a reference for a proof, please let me know. — 0xbadf00d, Mar 15 '19 at 22:24

How can we verify the intuition that in the RW-Metropolis-Hastings algorithm with Gaussian proposal too small and too large variances are bad choices

1 Answers1

Linked