
I'm looking at Wikipedia and there are like a thousand different MCMC versions. Why is this so? Why is there a different one for each application?

Are there different priors encoded into the MCMC?

1 Answer


It is sometimes useful to simulate various kinds of probability models, especially when direct mathematical analysis is difficult or computationally burdensome.

Example 1. Suppose a manufacturing process requires two steps in sequence: the first takes a length of time $X$ that is normally distributed with mean 20 and SD 2 hours; independently, the second takes a time $Y$ that is exponentially distributed with mean 5 hours.

It is easy to see that the total time $T = X+Y$ has $E(T) = 25$ and $SD(T) = \sqrt{4 + 25} \approx 5.39.$ But it is not quite so easy to find $P(T \le 30).$ A simple Monte Carlo (simulation) approach approximates this probability as $0.85314 \pm 0.00071.$

set.seed(509)
x = rnorm(10^6, 20, 2);  y = rexp(10^6, 1/5)
t = x+y;  mean(t);  sd(t)
[1] 25.00234     # aprx E(T) = 25
[1] 5.396157     # aprx SD(T) = 5.39
mean(t <= 30);  2*sd(t <= 30)/1000   # margin is 2*SE; sqrt(n) = 1000 for n = 10^6
[1] 0.85314      # aprx P(T <= 30) = 0.853
[1] 0.000707933  # aprx 95% margin of sim error


[Without simulation, some textbooks might suggest using a normal approximation, but $T$ is hardly normal, and a normal approximation gives $P(T \le 30) \approx 0.8232.$]
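That bracketed figure is easy to check in one line; this quick sketch (not part of the original code) treats $T$ as if it were normal with mean 25 and the rounded SD 5.39.

pnorm(30, 25, 5.39)   # normal approximation, roughly the 0.823 quoted above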

Example 2: If the process is an M/M/1 queue with arrivals at rate $\lambda = 3$ and service at rate $\mu = 4$ per hour, then there are standard formulas for the average number of people in the system $[\frac{\lambda}{\mu-\lambda} = 3],$ the proportion of time the server is busy $[\lambda/\mu = 3/4],$ and so on. But if you are 4th in line to be served, simulation may be the easiest way to find the probability that you will be out of the system in less than an hour. That simulation is a Monte Carlo investigation of a Markov process, but it would not ordinarily be called 'MCMC'.
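Here is a quick simulation sketch under one reading of "4th in line to be served" (three customers waiting ahead of you plus one in service; that interpretation is mine, not stated above). By memorylessness the residual service time of the customer in service is again Exp(4), so your time to leave is the sum of five Exp(4) service times, and new arrivals behind you do not matter.

set.seed(2020)
mu <- 4                                         # service rate per hour
t.out <- rowSums(matrix(rexp(5*10^6, mu), ncol = 5))  # a million simulated exit times
mean(t.out < 1)                                 # aprx P(out of system within an hour)
pgamma(1, shape = 5, rate = mu)                 # sanity check: sum of 5 Exp(4) is Gamma(5, 4)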

Often the terminology MCMC is used for simulations to solve problems in Bayesian inference. A convergent Markov process is contrived so that its limiting distribution gives you information about the posterior distribution to answer the problem at hand. It is often not possible to give a simple 'formula' for the Markov process, even if it is possible to see how to move from one step to the next as the process evolves. By simulating the Markov chain through many steps, you can approximate its limiting distribution and thus the desired posterior distribution.
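To make that concrete, here is a minimal random-walk Metropolis sketch (the data and prior are assumptions of mine, purely for illustration): sample the posterior of a binomial success probability $p$ with a flat Beta(1,1) prior after observing 7 successes in 20 trials. The exact posterior is Beta(8, 14), so the chain can be checked against a known answer.

set.seed(2020)
x <- 7;  n <- 20                          # assumed data
log.post <- function(p) {                 # log posterior, up to a constant
  if (p <= 0 || p >= 1) return(-Inf)
  dbinom(x, n, p, log = TRUE) + dbeta(p, 1, 1, log = TRUE)
}
m <- 10^5                                 # number of MCMC steps
p <- numeric(m);  p[1] <- 0.5             # starting value
for (i in 2:m) {
  prop  <- p[i-1] + runif(1, -0.1, 0.1)   # symmetric random-walk proposal
  ratio <- exp(log.post(prop) - log.post(p[i-1]))
  p[i]  <- if (runif(1) < ratio) prop else p[i-1]  # accept or keep current value
}
p.keep <- p[(m/2 + 1):m]                  # discard first half as burn-in
quantile(p.keep, c(.025, .975));  qbeta(c(.025, .975), 8, 14)  # simulated vs exact interval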

Example 3: The elementary Gibbs sampler of this Q&A shows how disease prevalence can be approximated from screening test data. Other approaches to solving such problems in Bayesian inference might simulate the posterior distribution using the Metropolis–Hastings algorithm or some other MCMC method.
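A minimal Gibbs-sampler sketch in the spirit of Example 3 (the counts, the prior, and the assumption of known test properties are mine, not taken from the linked Q&A): estimate prevalence from screening counts when sensitivity eta and specificity theta are treated as known, by alternating between the latent number of truly diseased subjects and the prevalence itself.

set.seed(2020)
n <- 1000;  A <- 120           # assumed data: 120 positive tests out of 1000
eta <- 0.95;  theta <- 0.97    # assumed known sensitivity and specificity
a0 <- 1;  b0 <- 1              # Beta(1,1) prior on prevalence
m <- 10^5
pi.s <- numeric(m);  pi.s[1] <- 0.5
for (i in 2:m) {
  p  <- pi.s[i-1]
  # latent counts of truly diseased among those testing positive and negative
  y1 <- rbinom(1, A,     p*eta     / (p*eta     + (1-p)*(1-theta)))
  y2 <- rbinom(1, n - A, p*(1-eta) / (p*(1-eta) + (1-p)*theta))
  # full conditional of prevalence given the latent counts is Beta
  pi.s[i] <- rbeta(1, a0 + y1 + y2, b0 + n - y1 - y2)
}
pi.keep <- pi.s[(m/2 + 1):m]   # discard first half as burn-in
mean(pi.keep);  quantile(pi.keep, c(.025, .975))  # aprx posterior mean and 95% interval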

The nature of the univariate or multivariate distribution being simulated may dictate the best choice of MCMC method. Availability of suitable software and personal experience with various methods and programs may also play a role.

BruceET
  • Very interesting comment, Bruce. So you're saying that personal experience in choosing which MCMC method to use is important. So you need to know the nature of the probability distribution to select an MCMC method, is that right? – asdfklmadsklmas May 10 '20 at 04:48
  • What other algorithm families have a similar setup where personal experience plays a role in selecting the particular algorithm instance? I'm personally very interested in this topic. – asdfklmadsklmas May 10 '20 at 04:51