Questions tagged [markov-chain-montecarlo]

Markov Chain Monte Carlo (MCMC) refers to a class of simulation methods for generating samples from a complex target distribution by simulating a Markov chain whose stationary distribution is the target distribution. MCMC methods are typically used when more direct methods of random number generation (e.g., the inversion method) are infeasible. The very first MCMC method was the Metropolis algorithm (Metropolis et al., 1953), later generalised by Hastings (1970).

The principle behind Markov Chain Monte Carlo (MCMC) methods is that by definition an ergodic recurrent Markov chain $(x_t)_t$ converges in distribution to its stationary distribution no matter what the starting value $x_0$ is. If one can build an ergodic recurrent Markov chain with a given stationary distribution, $\pi$ say, this property can be turned into a simulation principle.
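As a toy illustration of this principle (the AR(1) chain below is an assumption made only for this example, not part of the tag description), consider $x_{t+1} = a\,x_t + \varepsilon_t$ with $\varepsilon_t \sim \mathcal{N}(0,1)$ and $|a|<1$, whose stationary distribution is $\mathcal{N}(0, 1/(1-a^2))$ regardless of $x_0$:

```python
import numpy as np

# Toy ergodic chain: x_{t+1} = a * x_t + eps_t, eps_t ~ N(0, 1).
# Its stationary distribution is N(0, 1 / (1 - a^2)) for |a| < 1.
rng = np.random.default_rng(0)
a, n_steps = 0.9, 50_000

for x0 in (-100.0, 0.0, 100.0):           # wildly different starting values
    x, chain = x0, np.empty(n_steps)
    for t in range(n_steps):
        x = a * x + rng.standard_normal()
        chain[t] = x
    # After burn-in the empirical std approaches sqrt(1/(1-a^2)) ~ 2.29,
    # whatever the starting value.
    print(x0, chain[5000:].std())
```

All three chains forget their starting value and produce draws from the same stationary distribution, which is exactly the property MCMC methods exploit.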

As it happens, there exists a universal algorithm that achieves this goal: it is the Metropolis-Hastings algorithm. Given a target distribution with density $\pi$ (known up to a normalisation constant) and an arbitrary conditional distribution with density $q(y|x)$ on the same space, the Metropolis-Hastings algorithm builds a Markov chain by starting at an arbitrary value $x_0$ (this is the appeal of ergodicity) and by moving from $x_t$ to $x_{t+1}$ according to the rule:

  1. generate a value $y_{t+1}\sim q(y|x_t)$
  2. compute $$ \rho(x_t,y_{t+1})= \min\left(1,\ \frac{\pi(y_{t+1})\,q(x_t|y_{t+1})}{\pi(x_t)\,q(y_{t+1}|x_t)}\right) $$
  3. set $$ x_{t+1} = \begin{cases} y_{t+1} &\text{with probability }\rho(x_t,y_{t+1})\cr x_t &\text{with probability }1-\rho(x_t,y_{t+1})\cr \end{cases} $$

The validity of this algorithm follows from $\pi$ being stationary for this chain by construction, and from the chain being irreducible as soon as $q$ is everywhere positive.
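For concreteness, here is a minimal Python sketch of a random-walk Metropolis-Hastings sampler; since the Gaussian proposal is symmetric, the $q$ ratio cancels in $\rho$. The target, the scale `sigma`, and the chain length are illustrative choices, not prescriptions.

```python
import numpy as np

def metropolis_hastings(log_target, x0, sigma=1.0, n_steps=10_000, rng=None):
    """Random-walk Metropolis-Hastings: symmetric Gaussian proposal, so the
    acceptance ratio reduces to pi(y) / pi(x_t)."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        y = x + sigma * rng.standard_normal()              # step 1: y ~ q(.|x_t)
        log_rho = min(0.0, log_target(y) - log_target(x))  # step 2: acceptance probability
        if np.log(rng.uniform()) < log_rho:                # step 3: accept with probability rho
            x = y
        chain[t] = x
    return chain

# Example: standard normal target, pi(x) proportional to exp(-x^2 / 2)
samples = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, sigma=2.4)
print(samples[1000:].mean(), samples[1000:].std())  # roughly 0 and 1 after burn-in
```

Note that `log_target` only needs to be known up to an additive constant, which is precisely the setting where MCMC is useful.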

Another example of an MCMC algorithm is the Gibbs sampler. Given a joint distribution $\pi(\theta_1,\ldots,\theta_p)$, moving from $\theta^t$ to $\theta^{t+1}$ proceeds by simulating

  1. $\theta_1^{t+1} \sim \pi(\theta_1|\theta_2^t,\ldots, \theta_p^t)$
  2. ...
  3. $\theta_p^{t+1} \sim \pi(\theta_p|\theta_1^{t+1},\ldots, \theta_{p-1}^{t+1})$

Once again, $\pi$ is stationary for this transition.
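As an illustration, here is a minimal sketch of a Gibbs sampler for a bivariate standard normal target with correlation `rho`, a case where both full conditionals are Gaussians in closed form; the target and the value of `rho` are assumptions made only for this example.

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_steps=10_000, rng=None):
    """Gibbs sampler for (theta1, theta2) ~ N(0, [[1, rho], [rho, 1]]).
    Each full conditional is N(rho * other, 1 - rho^2)."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.sqrt(1.0 - rho**2)
    theta1, theta2 = 0.0, 0.0                  # arbitrary starting point
    chain = np.empty((n_steps, 2))
    for t in range(n_steps):
        theta1 = rho * theta2 + s * rng.standard_normal()  # theta1 | theta2
        theta2 = rho * theta1 + s * rng.standard_normal()  # theta2 | theta1 (just updated)
        chain[t] = theta1, theta2
    return chain

chain = gibbs_bivariate_normal()
print(np.corrcoef(chain[1000:].T)[0, 1])  # roughly 0.8 after burn-in
```

Note that each component is simulated conditional on the most recent values of the others, which is what makes $\pi$ stationary for the composite transition.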

Two books on the topic by Robert and Casella are Introducing Monte Carlo Methods with R and Monte Carlo Statistical Methods.

More involved versions of MCMC algorithms include the Metropolis-adjusted Langevin algorithm (MALA), sketched below, and the Hamiltonian Monte Carlo algorithm (HMC), which underlies the Stan software.
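As a rough sketch (the target, its gradient, and the step size `eps` below are illustrative assumptions), MALA shifts the random-walk proposal along the gradient of the log target and then applies the usual Metropolis-Hastings correction:

```python
import numpy as np

def mala(log_target, grad_log_target, x0, eps=0.5, n_steps=10_000, rng=None):
    """Metropolis-adjusted Langevin algorithm: Gaussian proposal whose mean is
    shifted by (eps^2 / 2) * grad log pi, followed by an MH accept/reject."""
    rng = np.random.default_rng() if rng is None else rng

    def log_q(y, x):
        # Log proposal density up to an additive constant (it cancels in the ratio):
        # y ~ N(x + eps^2/2 * grad log pi(x), eps^2)
        mean = x + 0.5 * eps**2 * grad_log_target(x)
        return -0.5 * ((y - mean) / eps) ** 2

    x = x0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        y = x + 0.5 * eps**2 * grad_log_target(x) + eps * rng.standard_normal()
        log_rho = (log_target(y) + log_q(x, y)) - (log_target(x) + log_q(y, x))
        if np.log(rng.uniform()) < min(0.0, log_rho):
            x = y
        chain[t] = x
    return chain

# Example: standard normal target, log pi(x) = -x^2/2, gradient -x
samples = mala(lambda x: -0.5 * x**2, lambda x: -x, x0=0.0)
print(samples[1000:].mean(), samples[1000:].std())
```

The asymmetric, gradient-informed proposal is what distinguishes MALA from plain random-walk Metropolis, hence the explicit $q$ ratio in the acceptance probability.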

1439 questions
261 votes · 11 answers

How would you explain Markov Chain Monte Carlo (MCMC) to a layperson?

Maybe the concept, why it's used, and an example.
Neil McGuigan
46 votes · 1 answer

Variational inference versus MCMC: when to choose one over the other?

I think I get the general idea of both VI and MCMC, including the various flavors of MCMC like Gibbs sampling, Metropolis-Hastings, etc. This paper provides a wonderful exposition of both methods. I have the following questions: If I wish to do…
46 votes · 1 answer

What is the difference between Metropolis-Hastings, Gibbs, Importance, and Rejection sampling?

I have been trying to learn MCMC methods and have come across Metropolis-Hastings, Gibbs, Importance, and Rejection sampling. While some of these differences are obvious, i.e., how Gibbs is a special case of Metropolis-Hastings when we have the full…
34 votes · 7 answers

Good sources for learning Markov chain Monte Carlo (MCMC)

Any suggestions for a good source to learn MCMC methods?
dram
31 votes · 4 answers

What is the best method for checking convergence in MCMC?

What is your preferred method of checking for convergence when using Markov chain Monte Carlo for Bayesian inference, and why?
Graham Cookson
30 votes · 1 answer

Computation of the marginal likelihood from MCMC samples

This is a recurring question (see this post, this post and this post), but I have a different spin. Suppose I have a bunch of samples from a generic MCMC sampler. For each sample $\theta$, I know the value of the log likelihood $\log f(\textbf{x} |…
29 votes · 3 answers

Examples of errors in MCMC algorithms

I'm investigating a method for automatic checking of Markov chain Monte Carlo methods, and I would like some examples of mistakes that can occur when constructing or implementing such algorithms. Bonus points if the incorrect method was used in a…
Simon Byrne
27 votes · 2 answers

Why is it necessary to sample from the posterior distribution if we already KNOW the posterior distribution?

My understanding is that when using a Bayesian approach to estimate parameter values: The posterior distribution is the combination of the prior distribution and the likelihood distribution. We simulate this by generating a sample from the…
Dave
26 votes · 1 answer

When would one use Gibbs sampling instead of Metropolis-Hastings?

There are different kinds of MCMC algorithms: Metropolis-Hastings, Gibbs, importance/rejection sampling (related). Why would one use Gibbs sampling instead of Metropolis-Hastings? I suspect there are cases when inference is more tractable with…
25 votes · 1 answer

Hamiltonian Monte Carlo vs. Sequential Monte Carlo

I am trying to get a feel for the relative merits and drawbacks, as well as different application domains of these two MCMC schemes. When would you use which and why? When might one fail but the other not (e.g. where is HMC applicable but SMC…
24 votes · 2 answers

Gibbs sampling versus general MH-MCMC

I have just been doing some reading on Gibbs sampling and the Metropolis-Hastings algorithm and have a couple of questions. As I understand it, in the case of Gibbs sampling, if we have a large multivariate problem, we sample from the conditional…
Luca
23 votes · 4 answers

C++ libraries for statistical computing

I've got a particular MCMC algorithm which I would like to port to C/C++. Much of the expensive computation is in C already via Cython, but I want to have the whole sampler written in a compiled language so that I can just write wrappers for…
JMS
23 votes · 3 answers

Why should we care about rapid mixing in MCMC chains?

When working with Markov chain Monte Carlo to draw inference, we need a chain that mixes rapidly, i.e. moves through the support of the posterior distribution rapidly. But I don't understand why we need this property, because from what I…
qkhhly
22 votes · 4 answers

Can Machine Learning or Deep Learning algorithms be utilised to "improve" the sampling process of an MCMC technique?

Based on the little knowledge that I have on MCMC (Markov chain Monte Carlo) methods, I understand that sampling is a crucial part of the aforementioned technique. The most commonly used sampling methods are Hamiltonian and Metropolis. Is there a…
21 votes · 2 answers

When did MCMC become commonplace?

Does anyone know around what year MCMC became commonplace (i.e., a popular method for Bayesian inference)? A link to the number of published MCMC (journal) articles over time would be especially helpful.
Ethan Arenson