Many MCMC papers present a single new transition operator (or a family thereof), such as a different proposal for Metropolis-Hastings, a new form of slice sampling, etc. I am interested in derivative-free methods; no Hamiltonian Monte Carlo or variants thereof.
A rule of thumb I have heard is that you often improve the mixing of the chains by combining several transition operators. Choosing a good set of operators is a problem of discrete and continuous hyper-parameter tuning, which is often done offline (e.g., at the end of burn-in, or from preliminary runs). You could also use online adaptive methods, if you take great care (see this post), but that's a separate point.
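To be concrete about what I mean by "combining", here is a toy sketch (my own example, with a standard normal target) of the two standard ways to combine operators that each leave the target invariant: picking one at random each iteration (a mixture of kernels) or applying them in a fixed sequence (a cycle):

```python
import math
import random

def log_target(x):
    # Example target: standard normal, up to an additive constant.
    return -0.5 * x * x

def rw_metropolis_step(x, scale, rng):
    # One random-walk Metropolis step with a Gaussian proposal of the given scale.
    proposal = x + rng.gauss(0.0, scale)
    if math.log(rng.random()) < log_target(proposal) - log_target(x):
        return proposal
    return x

def mixture_kernel(x, rng, scales=(0.1, 2.0), weights=(0.5, 0.5)):
    # Mixture: pick one operator at random each iteration. Each component
    # leaves the target invariant, so the mixture does too.
    scale = rng.choices(scales, weights=weights)[0]
    return rw_metropolis_step(x, scale, rng)

def cycle_kernel(x, rng, scales=(0.1, 2.0)):
    # Cycle: apply the operators in a fixed deterministic sequence.
    for scale in scales:
        x = rw_metropolis_step(x, scale, rng)
    return x

rng = random.Random(0)
x = 0.0
samples = []
for _ in range(5000):
    x = mixture_kernel(x, rng)
    samples.append(x)
mean = sum(samples) / len(samples)  # should be near 0 for this target
```

The question is about principled ways to choose the set of scales/operators and the combination scheme, rather than hand-picking them as above.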
Is there any work (article, thesis, book) that specifically analyses the performance of MCMC with combinations of transition operators, and/or discusses how to pick a good set (beyond the fact that the combination needs to yield an ergodic operator)? I am interested in both the single-state and multi-state(*) cases.
(*) I am using the terminology of Section 30.6 of MacKay's book:
> In a multi-state method, multiple parameter vectors $\textbf{x}$ are maintained; they evolve individually under moves such as Metropolis and Gibbs; there are also interactions among the vectors.
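As one familiar instance of such a multi-state method (my example, not MacKay's), parallel tempering maintains several chains at different inverse temperatures, each updated by its own Metropolis move, with an interaction move that proposes swapping states between adjacent chains:

```python
import math
import random

def log_target(x):
    return -0.5 * x * x  # standard normal, up to a constant

def mh_step(x, beta, scale, rng):
    # Random-walk Metropolis step on the tempered target pi(x)^beta.
    y = x + rng.gauss(0.0, scale)
    if math.log(rng.random()) < beta * (log_target(y) - log_target(x)):
        return y
    return x

def swap_move(xs, betas, rng):
    # Interaction among the vectors: propose exchanging two adjacent states,
    # accepted with the usual parallel-tempering ratio.
    i = rng.randrange(len(xs) - 1)
    log_ratio = (betas[i] - betas[i + 1]) * (log_target(xs[i + 1]) - log_target(xs[i]))
    if math.log(rng.random()) < log_ratio:
        xs[i], xs[i + 1] = xs[i + 1], xs[i]

rng = random.Random(1)
betas = [1.0, 0.5, 0.25]   # inverse temperatures; beta = 1 is the target of interest
xs = [0.0, 0.0, 0.0]
cold = []
for _ in range(5000):
    xs = [mh_step(x, b, 1.0, rng) for x, b in zip(xs, betas)]
    swap_move(xs, betas, rng)
    cold.append(xs[0])  # samples from the beta = 1 chain
```

Here too, the number of chains, the temperature ladder, and the per-chain moves form exactly the kind of operator set whose selection I am asking about.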