
Here is what I have learned about MCMC recently:

1) We first propose a likelihood function that describes our problem (Binomial)

2) We define a conjugate prior (Beta) and posterior distribution (Beta-Binomial)

3) We define a proposal distribution (Normal) that generates random samples (the Monte Carlo part)

4) We feed the parameter value generated by the proposal distribution into Bayes' formula and calculate the posterior

5) We calculate the ratio of the posterior probabilities of the proposed and current parameter values. We then choose to either accept or ignore this step (Metropolis-Hastings)

6) We iterate this process thousands of times until the posterior distribution converges
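The six steps above can be sketched as a minimal Metropolis sampler for the Beta-Binomial setting; the data, prior parameters, and proposal scale below are illustrative assumptions, not values from the question:

```python
import numpy as np

# Hypothetical toy data: x successes out of n Binomial trials.
n, x = 20, 14
a, b = 1.0, 1.0          # Beta(a, b) prior on p

def log_posterior(p):
    """Unnormalised log posterior: Binomial likelihood times Beta prior."""
    if p <= 0.0 or p >= 1.0:
        return -np.inf    # zero posterior density outside (0, 1)
    return (x + a - 1) * np.log(p) + (n - x + b - 1) * np.log(1 - p)

rng = np.random.default_rng(0)
p_current, chain = 0.5, []
for _ in range(20000):
    # Step 3: propose a new parameter value (random-walk Normal proposal).
    p_prop = p_current + rng.normal(scale=0.1)
    # Steps 4-5: the posterior ratio decides acceptance (Metropolis rule).
    if np.log(rng.uniform()) < log_posterior(p_prop) - log_posterior(p_current):
        p_current = p_prop
    chain.append(p_current)   # a rejected proposal repeats the current value

samples = np.array(chain[5000:])   # discard burn-in
print(samples.mean())              # close to the exact posterior mean (x+a)/(n+a+b)
```

Since the prior here is conjugate, the exact answer is known (the posterior is Beta(a+x, b+n-x)), which makes this a toy check rather than a case where MCMC is actually needed.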

This is my rough understanding of the MCMC procedure. My main questions are:

1) What exactly are we sampling using the proposal distribution? Is it the parameter we aim to find through MCMC?

2) Why does MCMC eventually converge? More specifically, how can randomly chosen parameters drawn from the proposal distribution and fed into Bayes' formula give us the posterior density of those parameters? (This is the step that, through iterations, makes the posterior distribution converge, right?)

unicorn
  • Does this answer your question? [MCMC with Metropolis-Hastings algorithm: Choosing proposal](https://stats.stackexchange.com/questions/100121/mcmc-with-metropolis-hastings-algorithm-choosing-proposal) – Xi'an Apr 07 '20 at 08:51
  • Hi Xi'an, thanks for the advice. I realized that M-H samples randomly through the parameter space. One last question: what exactly makes the posterior distribution converge? I know that we are running a Markov process iteratively by sampling in parameter space, but how does this lead to convergence? I can't get an intuitive understanding of this – unicorn Apr 07 '20 at 09:14
  • There is no intuition, only maths: as explained in any textbook or the highly detailed [Wikipedia page on Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm), the target distribution is the _stationary_ distribution associated with the Markov kernel and the ergodic theorem implies the chain converges if the kernel is furthermore irreducible. – Xi'an Apr 07 '20 at 09:30
  • Thanks. I will try to read in more detail. I am not from a math / statistics background, so it is a bit hard for me to go into depth in these mathematical details. But I will try... – unicorn Apr 07 '20 at 09:36
  • The best thing to do is perhaps to start with some basic theory on Markov chains. There's some information at the Wikipedia page https://en.wikipedia.org/wiki/Markov_chain but you'll probably want a book that gives more depth / detail – Glen_b Apr 08 '20 at 05:16

1 Answer


Most statements in the question are somewhat incorrect:

first propose a likelihood function that describes our problem (Binomial)

The definition of the sampling model (or likelihood) is not part of the MCMC method; it is a given. For instance, $$L(p|x)={n \choose x} p^x (1-p)^{n-x}$$ assumes that the data $x$ are Binomial.
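As a quick illustration (the data values below are made up), the likelihood above is simply a function of $p$ with the data held fixed:

```python
from math import comb

def binom_likelihood(p, n, x):
    """L(p | x) = C(n, x) * p**x * (1 - p)**(n - x), with the data (n, x) fixed."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Evaluated at a few candidate values of p, for illustrative data n=10, x=7:
for p in (0.3, 0.5, 0.7):
    print(p, binom_likelihood(p, n=10, x=7))
```

Note that it is the sum over the data $x$ (not over $p$) that equals one: the likelihood is a density in $x$, not in $p$.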

define a conjugate prior (Beta) and posterior distribution (Beta-Binomial)

Similarly, the prior distribution on the parameter is a given from the MCMC perspective, not something that can be calibrated. Furthermore, if the prior is conjugate, then MCMC is usually not necessary. When $p\sim \mathcal{Be}(a,b)$, the posterior is also a Beta distribution, not a Beta-Binomial distribution (which is the marginal distribution of $x$): $$p|x\sim \mathcal{Be}(a+x,b+n-x)$$ which can be used either analytically or numerically to compute posterior quantities. Exact and direct simulation from this posterior is manageable, hence does not require MCMC except in a toy experiment.
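To make the conjugacy concrete, here is a sketch (with made-up data and prior hyperparameters) showing that the posterior is available in closed form and can be sampled directly, with no Markov chain involved:

```python
import numpy as np

# Illustrative data and Beta(a, b) prior (assumptions, not from the answer).
n, x = 20, 14
a, b = 2.0, 2.0

# Conjugacy: p | x ~ Beta(a + x, b + n - x), in closed form.
a_post, b_post = a + x, b + n - x
post_mean = a_post / (a_post + b_post)        # analytic posterior mean

# Exact, direct simulation from the posterior -- no MCMC needed.
rng = np.random.default_rng(1)
draws = rng.beta(a_post, b_post, size=100_000)
print(post_mean, draws.mean())                # the two agree closely
```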

define a proposal distribution (Normal) that makes random sampling (Monte Carlo part)

When the posterior distribution is defined on $(0,1)$, a Normal proposal that takes values over all of $\mathbb R$ is not the best possible choice, even though it is not formally incorrect.
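A small sketch of the issue (the current value and proposal scales below are arbitrary choices): near the boundary, a Normal random walk sends many proposals outside $(0,1)$, where the posterior density is zero, so they are all rejected. One common alternative, mentioned here as an assumption rather than something from the answer, is a random walk on the logit scale, which keeps every proposal inside $(0,1)$ at the cost of a Jacobian term in the acceptance ratio:

```python
import numpy as np

rng = np.random.default_rng(2)

# Current value near the boundary of (0, 1); Normal random-walk proposals.
p = 0.05
props = p + rng.normal(scale=0.2, size=100_000)
print((props <= 0).mean())        # a large fraction is automatically rejected

# Alternative: a random walk on the logit scale maps all of R back into (0, 1).
# (A full M-H implementation would also need the change-of-variable Jacobian.)
def logit(q):
    return np.log(q / (1 - q))

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

z_props = logit(p) + rng.normal(scale=0.5, size=100_000)
q_props = inv_logit(z_props)
print(q_props.min() > 0.0, q_props.max() < 1.0)   # always strictly inside (0, 1)
```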

choose to either accept or ignore this step (Metropolis-Hastings)

The term _ignore_ is potentially harmful in that the proposed value is rejected but the step is not ignored: the previous value is reproduced one more time in the chain.

iterate this process for thousands of time until posterior distribution converge

Actually, it is the Markov chain that converges, not the posterior distribution: the posterior does not depend on the MCMC algorithm or on the simulation step. The Markov chain converges in distribution to the posterior distribution, so a finite sample of values of this Markov chain behaves in the limit like a sample from the posterior distribution.

How can randomly chosen parameters from the proposal distribution, fed into Bayes' formula, give the posterior density

A Markov chain either converges to a limiting distribution (positive recurrence) or does not (null recurrence, transience). If the posterior distribution is stationary with respect to the Markov kernel, and the kernel is furthermore irreducible, then the Markov chain converges to this distribution and no other. It thus suffices to establish stationarity. The acceptance step in the Metropolis-Hastings algorithm is constructed precisely so that the posterior distribution satisfies the detailed balance identity, and is therefore stationary.
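The detailed balance construction can be checked numerically on a toy discrete target (the three-state distribution below is an arbitrary example, not from the answer):

```python
import numpy as np

# Target distribution pi on 3 states; symmetric uniform proposal over the
# other states, accepted with probability min(1, pi_j / pi_i) (Metropolis).
pi = np.array([0.2, 0.3, 0.5])
m = len(pi)

P = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i != j:
            P[i, j] = (1 / (m - 1)) * min(1.0, pi[j] / pi[i])
    P[i, i] = 1.0 - P[i].sum()   # rejection mass stays on the diagonal

# Detailed balance: pi_i P_ij == pi_j P_ji for all pairs (i, j) ...
flow = pi[:, None] * P
assert np.allclose(flow, flow.T)

# ... which implies pi is stationary for the kernel: pi P == pi.
print(pi @ P)   # equals pi
```

Summing the detailed balance identity over $i$ gives $\sum_i \pi_i P_{ij} = \pi_j \sum_i P_{ji} = \pi_j$, which is exactly stationarity.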

Xi'an