Gibbs sampler on the precision (with a gamma prior) in a hierarchical Bayesian model doesn't converge

Question

I am deriving a Gibbs sampler for a model similar to the one in this paper (the graphical model is shown on page 4). To keep it simple, my question only concerns $w_i$ (a $K$-dimensional vector drawn from a normal distribution) and its precision $\gamma_w$, so I will treat only these two variables as unknown and the rest as known (hopefully without losing mathematical rigor):

$w_i \sim \mathcal{N}(0, \sigma_w^2I_K)$

$\gamma_w = \frac{1}{\sigma_w^2} \sim \Gamma(\alpha_0, \beta_0)$, here $\alpha_0, \beta_0$ are hyperparameters.

The conditional distributions are:

$p(w_{ik}|-) = \mathcal{N}(\mu_{w_{ik}}, \sigma_{w_{ik}}^2)$, where $\sigma_{w_{ik}}^2 = (\gamma_w + T)^{-1}$. $T$ is actually a pretty complicated term, but it is nonnegative.

$p(\gamma_w|-) = \Gamma(\alpha_0 + \frac{1}{2}KN, \beta_0 + \frac{1}{2}\sum_{i=1}^N w_i^Tw_i)$
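
A minimal sketch of the two updates above, in Python with NumPy. Since $T$ and $\mu_{w_{ik}}$ are not spelled out here, the sketch assumes a toy likelihood $y_{ik} \mid w_{ik} \sim \mathcal{N}(w_{ik}, 1/T)$ with a known scalar $T$, which gives $\mu_{w_{ik}} = T y_{ik} / (\gamma_w + T)$. The names `gibbs_precision`, `y`, and `trace` are hypothetical, and this is not the full model from the paper.

```python
import numpy as np


def gibbs_precision(y, T, alpha0, beta0, n_iter=1000, seed=0):
    """Toy Gibbs sampler for w (N x K) and its shared precision gamma_w.

    ASSUMPTION (not from the original model): y_ik | w_ik ~ N(w_ik, 1/T)
    with a known scalar noise precision T.
    """
    rng = np.random.default_rng(seed)
    N, K = y.shape
    gamma_w = 1.0  # arbitrary starting value
    trace = np.empty(n_iter)
    for it in range(n_iter):
        # w_ik | gamma_w, y: precision gamma_w + T, mean T * y_ik / (gamma_w + T)
        post_prec = gamma_w + T
        w = rng.normal(T * y / post_prec, 1.0 / np.sqrt(post_prec))
        # gamma_w | w ~ Gamma(shape, rate); NumPy expects scale = 1 / rate
        shape = alpha0 + 0.5 * N * K
        rate = beta0 + 0.5 * np.sum(w ** 2)
        gamma_w = rng.gamma(shape, 1.0 / rate)
        trace[it] = gamma_w
    return trace


# Simulated data under the assumed toy likelihood
data_rng = np.random.default_rng(1)
N, K, T, true_gamma = 200, 5, 10.0, 4.0
w_true = data_rng.normal(0.0, 1.0 / np.sqrt(true_gamma), size=(N, K))
y = w_true + data_rng.normal(0.0, 1.0 / np.sqrt(T), size=(N, K))

trace = gibbs_precision(y, T, alpha0=0.001, beta0=0.001)
print("posterior mean of gamma_w after burn-in:", trace[200:].mean())
```

Note that NumPy's `Generator.gamma` is parameterised by shape and scale, so the rate $\beta_0 + \frac{1}{2}\sum_{i=1}^N w_i^Tw_i$ has to be inverted before it is passed in; passing the rate directly is an easy mistake to make.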

I coded it up and generated some simulated data according to the graphical model. I set all the variables other than $w_i$ and $\gamma_w$ to their true values and keep them fixed. The hyperparameters $\alpha_0$ and $\beta_0$ are both set to 0.001, which makes the prior on $\gamma_w$ very vague (I am actually not certain if this is a good choice). When the sampler runs, $\gamma_w$ goes to infinity, and the log-likelihood of the samples becomes much larger than the log-likelihood of the true data.

From the conditional distributions, this non-convergence is actually kind of "observable": each $w_{ik}$ is drawn with a precision larger than the current sample from $p(\gamma_w|-)$ (because of the nonnegative term $T$). Then, when sampling $\gamma_w$, we are using a smaller $\sum_{i=1}^N w_i^Tw_i$, so the samples of $\gamma_w$ get larger.
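
To make this drift concrete, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that $\mu_{w_{ik}} = 0$ for every entry (nothing pulls $w_{ik}$ away from zero), that $\alpha_0$ and $\beta_0$ are negligible, and that $KN$ is large enough to replace the sum by its expectation. None of these is guaranteed in the full model. Writing $\gamma_w^{(t)}$ for the current draw:

$$
E\left[\sum_{i=1}^N w_i^Tw_i \,\Big|\, \gamma_w^{(t)}\right] = \frac{KN}{\gamma_w^{(t)} + T},
\qquad
E\left[\gamma_w^{(t+1)} \,\Big|\, w\right] = \frac{\alpha_0 + \frac{1}{2}KN}{\beta_0 + \frac{1}{2}\sum_{i=1}^N w_i^Tw_i} \approx \frac{KN}{\sum_{i=1}^N w_i^Tw_i}.
$$

Plugging the first expression into the second gives $\gamma_w^{(t+1)} \approx \gamma_w^{(t)} + T$, so under these assumptions $\gamma_w$ grows by roughly $T$ per sweep. If $\mu_{w_{ik}} \neq 0$, the first expectation gains an extra $\sum_{i,k} \mu_{w_{ik}}^2$ term, which is what could stop the drift.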

This seems wrong, but I cannot figure out where the problem is. Any comments will be highly appreciated. Thanks!

Update: I think this may be generally true for a hierarchical normal model with unknown variance. I then found this paper, which seems to partially agree with my observation, but it is a little beyond my crappy math background...

If you use hyperparameters essentially equal to zero in the gamma prior, you have an improper prior. This is definitely not "a good choice" as your posterior is most likely improper as well (I did not check). In which case the Gibbs sampler cannot converge... This is a well-known issue with the Gibbs sampler, see e.g. Section 10.4.3 of our Robert-Casella (2004) book... — Xi'an, Nov 14 '12 at 10:01
A good intro to this issue is called "Gibbs for kids" by Casella and George. Ref. Casella, G., George, E.I. (1992). Explaining the Gibbs Sampler. The American Statistician, 46, 167–174. — Xi'an, Nov 14 '12 at 10:07
Note: In the [paper](http://probability.ca/jeff/ftpdir/james.pdf) by Jeff Rosenthal that you mention, the impropriety of the posterior stems from using a flat prior on the mean hyperparameter rather than on the variance hyperparameter. — Xi'an, Nov 14 '12 at 10:57
@Xi'an Thanks for pointing me to the paper and book chapter! (though Amazon doesn't want me to read Section 10.4.3 as preview...) — Dawen Liang, Nov 15 '12 at 06:08
