
I think I get the general idea of both VI and MCMC, including the various flavors of MCMC like Gibbs sampling, Metropolis–Hastings, etc. This paper provides a wonderful exposition of both methods.

I have the following questions:

  • If I wish to do Bayesian inference, why would I choose one method over the other?
  • What are the pros and cons of each of the methods?

I understand that this is a pretty broad question, but any insights would be highly appreciated.

asked by kedarps, edited by Sean Easter

1 Answer


For a long answer, see Blei, Kucukelbir, and McAuliffe, "Variational Inference: A Review for Statisticians" (https://arxiv.org/abs/1601.00670). This short answer draws heavily therefrom.

  • MCMC is asymptotically exact; VI is not. In the limit, MCMC produces exact samples from the target distribution. VI comes without warranty.
  • MCMC is computationally expensive. In general, VI is faster.

Meaning, when we have computational time to kill and value precision of our estimates, MCMC wins. If we can tolerate sacrificing that for expediency—or we're working with data so large we have to make the tradeoff—VI is a natural choice.

Or, as more eloquently and thoroughly described by the authors mentioned above:

Thus, variational inference is suited to large data sets and scenarios where we want to quickly explore many models; MCMC is suited to smaller data sets and scenarios where we happily pay a heavier computational cost for more precise samples. For example, we might use MCMC in a setting where we spent 20 years collecting a small but expensive data set, where we are confident that our model is appropriate, and where we require precise inferences. We might use variational inference when fitting a probabilistic model of text to one billion text documents and where the inferences will be used to serve search results to a large population of users. In this scenario, we can use distributed computation and stochastic optimization to scale and speed up inference, and we can easily explore many different models of the data.
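
To make that tradeoff concrete, here is a minimal, self-contained sketch (my own toy illustration, not from the paper) that fits the same one-parameter model three ways: in closed form, with random-walk Metropolis–Hastings, and with a Gaussian variational approximation trained by stochastic gradient ascent on the ELBO via the reparameterization trick. All settings (step sizes, iteration counts, batch size) are arbitrary choices for the example:

```python
# Toy comparison of the two approaches on a conjugate model where the
# exact posterior is known in closed form: x_i ~ N(mu, sigma^2) with a
# prior mu ~ N(0, tau^2). Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)
sigma, tau = 1.0, 2.0                    # known likelihood / prior scales
x = rng.normal(0.7, sigma, size=50)      # simulated data

# Exact posterior (conjugate Gaussian): N(post_mean, post_var)
post_var = 1.0 / (1.0 / tau**2 + x.size / sigma**2)
post_mean = post_var * x.sum() / sigma**2

def log_joint(mu):
    """log p(x, mu) up to an additive constant."""
    return -0.5 * mu**2 / tau**2 - 0.5 * np.sum((x - mu) ** 2) / sigma**2

def dlog_joint(mu):
    """d/dmu log p(x, mu); works elementwise on an array of mu values."""
    return -mu / tau**2 + (x.sum() - x.size * mu) / sigma**2

# --- MCMC: random-walk Metropolis-Hastings ---------------------------
chain, mu = [], 0.0
for _ in range(20_000):
    prop = mu + 0.3 * rng.normal()       # symmetric proposal
    if np.log(rng.uniform()) < log_joint(prop) - log_joint(mu):
        mu = prop
    chain.append(mu)
chain = np.array(chain[5_000:])          # discard burn-in

# --- VI: fit q(mu) = N(m, s^2) by stochastic gradient ascent on the
# ELBO, using the reparameterization mu = m + s * eps, eps ~ N(0, 1).
# The entropy of q contributes a constant gradient of 1 w.r.t. log s.
m, log_s, lr = 0.0, 0.0, 0.01
for _ in range(2_000):
    eps = rng.normal(size=32)            # small batch to tame gradient noise
    g = dlog_joint(m + np.exp(log_s) * eps)
    m += lr * g.mean()                                      # dELBO/dm
    log_s += lr * ((g * np.exp(log_s) * eps).mean() + 1.0)  # dELBO/dlog s

print(f"exact: mean={post_mean:.3f} sd={np.sqrt(post_var):.3f}")
print(f"MCMC:  mean={chain.mean():.3f} sd={chain.std():.3f}")
print(f"VI:    mean={m:.3f} sd={np.exp(log_s):.3f}")
```

Here both methods land on the exact answer because the true posterior really is Gaussian; the difference shows up in runtime (the VI loop needs far fewer evaluations of the joint) and, in non-conjugate models, in the bias VI inherits from its chosen approximating family, which MCMC run long enough does not share.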

– Sean Easter
  • I think Stan is the fastest software for doing MCMC (NUTS). What's the fastest (or most powerful) for doing variational inference? – skan Nov 22 '18 at 20:45
  • @skan Wonderful question! The closest I've seen to general-purpose VI software is [edward](https://github.com/blei-lab/edward), though I haven't used it myself. (Many applications of VI are custom, in that they derive an algorithm to fit the specific model of interest.) – Sean Easter Nov 24 '18 at 13:02
  • Stan also supports VI. The only limitation of Stan is that it can't sample discrete variables. – RJTK Feb 28 '19 at 19:41
  • Also, I don't believe Stan runs ADVI on the GPU... yet, anyway. The fastest software for variational inference is likely TensorFlow Probability (TFP) or Pyro, both built on highly optimized deep learning frameworks (i.e., CUDA). TFP grew out of early work on Edward by Dustin Tran, who now leads TFP at Google, I believe. – Adam Erickson Sep 05 '19 at 15:29
  • @AdamErickson FYI: Stan is gradually starting to use GPUs: https://arxiv.org/abs/1907.01063 – Tim Nov 25 '19 at 14:08
  • @Tim Yes, I noticed that as well. – Adam Erickson Nov 26 '19 at 15:04
  • Just happened to stumble across this discussion, and I'd like to make a small correction for posterity: I started TFP in 2016 with Ian Langmore and Eugene Brevdo, approximately a year before Dustin's involvement. Dustin's main contribution was to help write the paper (https://arxiv.org/abs/1711.10604) and to promote TFP via Edward building on top of it. Both Dustin and I now work on different projects at Google. – jvdillon Dec 10 '21 at 21:55