
I have a question regarding two MCMC algorithms, Gibbs sampling and Hamiltonian Monte Carlo (HMC), for performing Bayesian analysis.

If using Gibbs sampling, my understanding is that we need to derive exact formulas for the conditional distributions of the latent variables of interest, since the Gibbs sampling scheme relies on these formulas. For complex models in practice, deriving these exact formulas might be infeasible.

If using HMC, we do not need to derive these exact formulas; instead we rely on gradient information (as in probabilistic programming) to fit the model. This makes fitting a complex Bayesian model feasible in practice. Is this understanding correct?

user3269
  • There are more than two MCMC algorithms. – Xi'an Sep 23 '21 at 12:10
  • If anything, HMC constrains you a little, but in my opinion it's worth it. Gradient info cannot be obtained from discrete priors: the Bernoulli PMF, for example, doesn't have a gradient, so your choices are limited. In practice, you'd just use a Beta PDF instead. https://proceedings.neurips.cc/paper/2020/file/c6a01432c8138d46ba39957a8250e027-Paper.pdf – jbuddy_13 Sep 23 '21 at 14:46

1 Answer

It is incorrect to state that a Gibbs sampler requires the exact densities of the full conditionals. A Gibbs sampling algorithm requires a collection of conditional distributions that

  1. correspond to a partition of the vector to be simulated (or a completion of said vector by auxiliary variables) into blocks
  2. and such that the conditional distributions of these blocks are generative models, i.e., are associated with simulation algorithms of moderate enough complexity (a minimal sketch follows this list).
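
As a concrete illustration, here is a minimal two-block Gibbs sampler for a normal model with conjugate priors, where both full conditionals are standard distributions that can be simulated directly. The model, data, and hyperparameter values below are hypothetical, chosen only to make the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data and prior hyperparameters for the sketch:
# x_i ~ N(mu, sig_sq), mu ~ N(mu0, tau0_sq), sig_sq ~ InvGamma(a0, b0).
x = rng.normal(2.0, 1.5, size=100)
n, xbar = x.size, x.mean()
mu0, tau0_sq = 0.0, 10.0
a0, b0 = 2.0, 2.0

mu, sig_sq = 0.0, 1.0  # initial values
draws = []
for _ in range(5000):
    # Block 1: mu | sig_sq, x is Gaussian (conjugacy gives the exact form).
    prec = 1.0 / tau0_sq + n / sig_sq
    mean = (mu0 / tau0_sq + n * xbar / sig_sq) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))
    # Block 2: sig_sq | mu, x is inverse-gamma (simulated as 1/Gamma).
    shape = a0 + 0.5 * n
    rate = b0 + 0.5 * np.sum((x - mu) ** 2)
    sig_sq = 1.0 / rng.gamma(shape, 1.0 / rate)
    draws.append((mu, sig_sq))
```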

In the event that simulation from the conditionals is not readily available, it can be replaced with a Metropolis-within-Gibbs version (sketched below), which further requires that the joint density associated with these conditionals be available up to a normalising constant. The pseudo-marginal generalisation demonstrates that using an unbiased estimate of the density is also valid.
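
A minimal sketch of a Metropolis-within-Gibbs pass, assuming only an unnormalised log joint density; the specific form of `log_joint` below is a hypothetical toy whose full conditionals are not standard distributions. Each coordinate is updated by one random-walk Metropolis step, so only evaluations of the unnormalised joint are needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(t1, t2):
    # Hypothetical unnormalised log joint; the quartic terms make the
    # full conditionals non-standard, so direct simulation is awkward.
    return -0.25 * (t1 ** 4 + t2 ** 4) - 0.5 * (t1 - t2) ** 2

t1, t2 = 0.0, 0.0
for _ in range(5000):
    # Update t1 | t2 with one random-walk Metropolis step.
    prop = t1 + rng.normal(0.0, 0.5)
    if np.log(rng.uniform()) < log_joint(prop, t2) - log_joint(t1, t2):
        t1 = prop
    # Update t2 | t1 likewise; normalising constants cancel in the ratio.
    prop = t2 + rng.normal(0.0, 0.5)
    if np.log(rng.uniform()) < log_joint(t1, prop) - log_joint(t1, t2):
        t2 = prop
```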

An HMC algorithm similarly requires the (joint) target density to be available up to a multiplicative constant so that the gradient can be computed (or, again, estimated in an unbiased manner). There is therefore no complexity gain in using HMC.
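
To make this concrete, here is a minimal sketch of one HMC transition, assuming only an unnormalised log target and its gradient (a standard normal serves as a stand-in target; the step size and path length are arbitrary choices, not tuned values). As with Metropolis-Hastings, the normalising constant cancels in the accept/reject ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(q):
    # Stand-in unnormalised log density (standard normal).
    return -0.5 * np.sum(q ** 2)

def grad_log_target(q):
    return -q

def hmc_step(q, eps=0.1, n_leapfrog=20):
    p = rng.standard_normal(q.shape)  # resample momentum
    q_new, p_new = q.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    p_new = p_new + 0.5 * eps * grad_log_target(q_new)
    for _ in range(n_leapfrog - 1):
        q_new = q_new + eps * p_new
        p_new = p_new + eps * grad_log_target(q_new)
    q_new = q_new + eps * p_new
    p_new = p_new + 0.5 * eps * grad_log_target(q_new)
    # Accept/reject corrects the discretisation error; any multiplicative
    # constant in the target cancels in this log ratio.
    log_accept = (log_target(q_new) - 0.5 * np.sum(p_new ** 2)
                  - log_target(q) + 0.5 * np.sum(p ** 2))
    return q_new if np.log(rng.uniform()) < log_accept else q
```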

Xi'an
  • Thanks for your reply. When using a modern probabilistic programming language like Stan or PyMC3, it seems that we only need to set up the prior and the model specification (likelihood function), and it is not necessary to derive/write down the target posterior density, which classical Bayesian analysis using Gibbs/MH requires. They all seem to use HMC to fit the distribution, which confuses me as to how they achieve it. – user3269 Sep 23 '21 at 20:14
  • If you input a prior $\pi(\theta)$ and a likelihood function $\ell(\theta|x)$, you equivalently input the posterior $$\pi(\theta|x)\propto\pi(\theta)\ell(\theta|x).$$ Check this [earlier question](https://stats.stackexchange.com/q/307882/7224). – Xi'an Sep 24 '21 at 05:24
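
In code, the point of the exchange above is simply that a probabilistic-programming backend assembles the unnormalised log posterior by adding the log prior to the log likelihood; the user never writes the posterior down. A minimal sketch (the Gaussian prior and likelihood are hypothetical choices):

```python
import numpy as np

def log_posterior(theta, x):
    # log pi(theta | x) = log pi(theta) + log l(theta | x) + const.
    log_prior = -0.5 * theta ** 2              # e.g. theta ~ N(0, 1)
    log_lik = -0.5 * np.sum((x - theta) ** 2)  # e.g. x_i ~ N(theta, 1)
    return log_prior + log_lik
```

HMC (or its NUTS variant) then only needs this function and its gradient, which Stan and PyMC3 obtain by automatic differentiation.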