Gibbs sampling convergence

Question

In an astronomical context, the authors of a paper want to use a Gibbs algorithm. Please note: I am inexperienced with MCMC algorithms, and with Gibbs sampling in particular.

What we want, in essence, is the full posterior distribution, given some data: $P(X,Y|data)$. To achieve this, we sample alternately from the conditional distributions $P(X|Y,data)$ and $P(Y|X,data)$. Once the chain is stationary, sampling from these would be statistically identical to sampling from the full joint probability distribution, as I understand it.

Now, the authors note that in this particular case,

"(...) unfortunately the two marginal distributions $P(X|Y,data)$ and $P(Y|X,data)$ are in general both much narrower than the marginalized distribution $P(X|data)$: given a particular $Y$ [in this case, lensing potential], $X$ [in this case, the delensed CMB] is given essentially by a delta function. This means that naive Gibbs iterations will not converge within a reasonable time."

Now these statements are completely lost on me. So my question is twofold:

The statement about the delta function is meant as an explanatory supplement, I think, but it doesn't clarify anything for me. Why is this 'essentially a delta function'?
Second, and more importantly: OK, suppose the conditionals are 'much narrower'. So what? Why is $P(X|data)$ even relevant? Aren't we only interested in $P(X|Y,data)$, $P(Y|X,data)$ and ultimately $P(X,Y|data)$ in Gibbs sampling? Why would narrow conditionals mean slow convergence?

The paper in question is a review paper by Challinor and Lewis, 2006. The arxiv print can be found here:

http://arxiv.org/pdf/astro-ph/0601594v4.pdf

And the text I'm referring to is at the end of section 8, delensing the sky.

I added the reference at the end of my original post: it is the review paper by Challinor & Lewis, http://arxiv.org/pdf/astro-ph/0601594v4.pdf. — user1991, Sep 07 '15 at 14:32

Xi'an · Accepted Answer · 2015-09-07T15:40:10.307

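Almost by definition, the conditionals $P(X|Y,data)$ and $P(Y|X,data)$ have thinner tails (or are "narrower") than the marginals $$P(X|data)=\int P(X|y,data)\,P(y|data)\,\text{d}y\ \text{ and }\ P(Y|data)=\int P(Y|x,data)\,P(x|data)\,\text{d}x$$ since each marginal is a mixture of the corresponding conditionals. Hence, it is not surprising that the Gibbs sampler, which only ever moves by drawing from the conditionals, is slow in exploring the tails of the marginals.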

For instance, if $$(X,Y)|data\sim\mathcal{N}_2((0,0),\Sigma)$$ with $$\Sigma=\left[\begin{matrix}1 &\varrho\cr\varrho &1\cr\end{matrix}\right]$$ we have $$X|Y,data\sim\mathcal{N}(\varrho Y,1-\varrho^2)\ \text{ while }\ X|data\sim\mathcal{N}(0,1)$$ Hence a thicker tail for the marginals, especially if $\varrho$ is close to one, since the conditional variance $1-\varrho^2$ then goes to zero.
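
To see why a narrow conditional slows the sampler down, it may help to write out one full Gibbs sweep for this example. This is a sketch that assumes the conditionals $X|Y\sim\mathcal{N}(\varrho Y,1-\varrho^2)$ and, by symmetry, $Y|X\sim\mathcal{N}(\varrho X,1-\varrho^2)$; the independent standard normal variables $\eta_t$ and $\varepsilon_t$ are notation introduced here for illustration:

$$Y_t=\varrho X_{t-1}+\sqrt{1-\varrho^2}\,\eta_t,\qquad X_t=\varrho Y_t+\sqrt{1-\varrho^2}\,\varepsilon_t$$

$$\Longrightarrow\quad X_t=\varrho^2X_{t-1}+\sqrt{1-\varrho^2}\,(\varrho\,\eta_t+\varepsilon_t),\qquad \text{var}\left(\sqrt{1-\varrho^2}\,(\varrho\,\eta_t+\varepsilon_t)\right)=1-\varrho^4$$

Under these assumptions the $X$-chain is an AR(1) process with coefficient $\varrho^2$: its stationary distribution is the correct $\mathcal{N}(0,1)$ marginal, but the lag-$k$ autocorrelation is $\varrho^{2k}$ and the integrated autocorrelation time is $(1+\varrho^2)/(1-\varrho^2)$. For $\varrho=0.995$ this gives $\varrho^2\approx0.990$, a typical step of only $\sqrt{1-\varrho^4}\approx0.14$ marginal standard deviations, and roughly 200 iterations per effectively independent draw.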

Here is an illustration where the chain $(X_t)$ is generated when $\varrho=0.995$: the values are highly correlated and take many iterations to move from one end of the support to the other. And, worse, to compensate for the rare visits to the tails, the chain has difficulties leaving a tail when it meets one, as shown in the picture by the rare excursion into the interval between -4 and -2, around iteration 2000.

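rho = 0.995              # correlation between X and Y
niter = 1e4              # number of Gibbs iterations (avoids masking T, R's alias for TRUE)
s = sqrt(1 - rho^2)      # standard deviation of both conditionals
x = rep(0, niter)        # chain for X, started at 0
for (t in 2:niter) {
  y = rnorm(1, rho * x[t-1], sd = s)   # draw Y given the current X
  x[t] = rnorm(1, rho * y, sd = s)     # draw X given the new Y
}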
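
The slow mixing of this chain can also be checked numerically. Substituting the draw of `y` into the draw of `x[t]` shows that `x[t]` equals `rho^2 * x[t-1]` plus noise, so the chain should behave like an AR(1) process with coefficient `rho^2`. The sketch below compares the empirical lag-1 autocorrelation with that value and converts it into an effective sample size. The helper `gibbs_chain`, the seed, and the use of the AR(1) formula for the integrated autocorrelation time are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): two-step Gibbs sampler
# for a standard bivariate normal with correlation rho, returning the X-chain.
gibbs_chain <- function(rho, n_iter, x0 = 0) {
  s <- sqrt(1 - rho^2)                  # conditional standard deviation
  x <- numeric(n_iter)
  x[1] <- x0
  for (t in 2:n_iter) {
    y    <- rnorm(1, rho * x[t - 1], sd = s)  # draw Y | X
    x[t] <- rnorm(1, rho * y,        sd = s)  # draw X | Y
  }
  x
}

set.seed(1)                             # arbitrary seed, for reproducibility only
rho <- 0.995
x   <- gibbs_chain(rho, 1e4)

# Empirical lag-1 autocorrelation; element 1 of $acf is lag 0, element 2 is lag 1
lag1_hat <- acf(x, lag.max = 1, plot = FALSE)$acf[2]

# Integrated autocorrelation time and effective sample size, assuming AR(1)
tau <- (1 + rho^2) / (1 - rho^2)
ess <- length(x) / tau

print(c(lag1_hat = lag1_hat, lag1_theory = rho^2, tau = tau, ess = ess))
```

Under the AR(1) assumption, `tau` is about 199.5, so the 10,000 iterations carry roughly as much information about the marginal of $X$ as 50 independent draws, which is why the tails are visited so rarely.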

Thank you! It is clear to me then why the conditionals are more narrow. However, why would this imply slower convergence for the Gibbs sampling? I don't quite get what you mean by 'exploring the tails of the marginals' - perhaps due to my limited understanding of Gibbs sampling itself? — user1991, Sep 07 '15 at 14:50
