7

Let $d\in\mathbb N$ and consider the Random Walk Metropolis-Hastings algorithm with a Gaussian proposal kernel $Q$ such that $Q(x,\;\cdot\;)=\mathcal N_d(x,\sigma^2_dI_d)$ for all $x\in\mathbb R^d$.

Intuitively, if $\sigma$ is too small, nearly all proposals will be accepted and the chain moves extremely slowly. On the other hand, if $\sigma$ is too large, the proposed moves will usually be far away from the current state and hence most proposals will be rejected.

With this in mind, it makes sense to model $\sigma_d$ as a decreasing function of $d$. We may set $\sigma_d=\ell/d^\alpha$ for some $\alpha\in[0,1]$. On page 6 (in the numbering of the paper) of his work, Roberts mentioned that the choice $\alpha=1/2$ is "optimal" (in which sense?).

How can we prove this rigorously?

Slide 18 of a presentation that I found, seems to be related, but I don't get how they conclude presentation

0xbadf00d
  • 539
  • 3
  • 8

1 Answers1

2

The original approach by Gareth Roberts et al. is to investigate the limiting distribution of the first coordinate process $X^{(1)}_n$, accelerated by a factor $d$. This leads to the limiting process $Z_t = X^{(1)}_{\lfloor t d \rfloor}$.

  • If you put $\alpha < 1/2$ (large steps), it can be shown that asymptotically none of the proposed moves are going to be accepted, so that the process $(Z_t)$ is constant almost surely.
  • Similarly, if $\alpha > 1/2$ (small steps), asymptotically all moves are going to be accepted but the moves are so small, so that again the limiting process $(Z_t)$ is constant almost surely.
  • Finally, only for $\alpha = 1/2$, we get a non-trivial limiting process (which happens to be a Langevin diffusion).

This is the way in which $\alpha = 1/2$ can be considered an optimal choice. For a more precise statement and proof of this result, consult the original research paper which is very readable.

Roberts, G. O., Gelman, A., & Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7(1), 110–120. https://doi.org/10.1214/aoap/1034625254

The result is later extended in various ways: e.g. looking at different functions of the high-dimensional process (instead of the first coordinate), and more general distributional assumptions. Also different Metropolis-Hastings algorithms have been studied, for example the MALA algorithm, which was shown to require a time speed up of only $d^{1/3}$ instead of $d$ in order to converge. This is also discussed in the survey paper you are reading.

Joris Bierkens
  • 675
  • 4
  • 18
  • 1
    Thank you for your answer. I'm aware of the original paper. Unfortunately, there is no prove for your claims about $\alpha<1/2$ and $\alpha>1/2$. I've read multiple other papers of Roberts too. The claims are always mentioned, but there is never a proof or a reference for it. So, how can we prove that rigorously? (Please take note of this related question https://math.stackexchange.com/q/3140066/47771.) – 0xbadf00d Mar 15 '19 at 07:24
  • I agree it is not stated explicitly in the paper. It should be possible to show that taking $\alpha \neq 1/2$ yields a trivial limiting process $Z_t = \mathrm{const}$ almost surely. The key result in the mentioned paper is in Lemma 2.6 which needs to be adapted. Note that the computations are simplified since we may restrict $x^d$ to sets $F_d$, and also we may restrict attention to functions in the domain of the generator having compact support. I would recommend basing your efforts on a research article rather than presentation slides which necessarily leave out some of the detail. – Joris Bierkens Mar 15 '19 at 21:10
  • I've consulted plenty research papers. The claim can be found in many of them, but none of them contain even a sketch of the proof. The slides were the first and only source where the claim is considered in more detail. If you know a reference for a proof, please let me know. – 0xbadf00d Mar 15 '19 at 22:24