
From what I've read, the main advantage of the EM algorithm is that the E-step can often be computed in closed form, giving a deterministic answer and hence zero variance.

What, then, is the rationale behind Monte Carlo EM (MCEM) methods [1], which use sampling to compute the E-step? Specifically, is there theoretical or empirical evidence that MCEM gives lower variance than sampling the full likelihood directly, or are there other advantages of the EM algorithm that come into play here?

[1] http://www.biostat.jhsph.edu/~rpeng/biostat778/papers/wei-tanner-1990.pdf

Edit: To clarify, suppose you have the log-likelihood $\log p(y \mid x) = \log \sum_z p(y, z \mid x)$, with $z$ latent. One option is to use EM (or MCEM if the E-step expectation cannot be computed in closed form). The other option I can see is to estimate the sum directly via sampling. So my question is: if you're using sampling anyway, why use MCEM over directly estimating the marginal likelihood?

Edit 2: Replaced MCMC with sampling, which is what I had in mind -- got the names confused, sorry.
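For concreteness, here is a minimal sketch of the MCEM option on a hypothetical toy model (everything below is illustrative, not from the paper): $z_i \sim N(\theta, 1)$ latent, $y_i \mid z_i \sim N(z_i, 1)$, so marginally $y_i \sim N(\theta, 2)$ and the exact MLE is $\bar y$. The E-step posterior is actually closed-form here; it is sampled only to mimic what MCEM does when it is not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (illustrative only): z_i ~ N(theta, 1) latent, y_i | z_i ~ N(z_i, 1).
# Marginally y_i ~ N(theta, 2), so the exact MLE of theta is y.mean().
theta_true = 2.0
y = rng.normal(theta_true, np.sqrt(2.0), size=500)

theta, M = 0.0, 100
for _ in range(50):
    # Monte Carlo E-step: draw M samples of each latent z_i from its
    # posterior z_i | y_i, theta ~ N((theta + y_i) / 2, 1/2).
    z = rng.normal((theta + y) / 2.0, np.sqrt(0.5), size=(M, y.size))
    # M-step: Q(theta) = -sum_{i,m} (z_im - theta)^2 / (2M) + const
    # is maximized by the overall sample mean of the draws.
    theta = z.mean()

print(theta, y.mean())  # MCEM settles into a noisy neighborhood of the MLE
```

The M-step stays a simple closed-form maximization; only the E-step is stochastic, and its noise can be driven down by increasing $M$ as the iterations proceed.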

Opt

1 Answer


To answer your question very literally, MCEM and MCMC are algorithms used to solve different problems. MCEM maximizes a likelihood function (or posterior probability in the case of MAP estimates), while MCMC integrates over a posterior distribution.

When you talk about variance, you are talking about the estimators, not the algorithms. So your question should be rephrased as "which has lower variance: posterior means or MLEs?". That, of course, is a very vague question.
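For contrast, here is what the question's direct alternative looks like on a hypothetical toy model ($z \sim N(\theta, 1)$, $y \mid z \sim N(z, 1)$; illustrative only, not from the paper): estimate each $p(y_i \mid \theta) = \int p(y_i \mid z)\,p(z \mid \theta)\,dz$ by averaging over prior draws of $z$. The resulting log-likelihood estimate is noisy (and slightly biased downward by Jensen's inequality), which is one practical obstacle to maximizing it directly.

```python
import numpy as np

rng = np.random.default_rng(1)

def norm_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Toy model (illustrative only): z ~ N(theta, 1), y | z ~ N(z, 1),
# so the exact marginal is y ~ N(theta, 2) and the estimate can be checked.
y = rng.normal(2.0, np.sqrt(2.0), size=200)

def mc_loglik(theta, M=200):
    """Noisy estimate of sum_i log p(y_i | theta) via prior draws of z."""
    z = rng.normal(theta, 1.0, size=(M, 1))            # z_m ~ N(theta, 1)
    p_hat = norm_pdf(y[None, :], z, 1.0).mean(axis=0)  # (1/M) sum_m p(y_i|z_m)
    return np.log(p_hat).sum()

exact = np.log(norm_pdf(y, y.mean(), np.sqrt(2.0))).sum()
reps = [mc_loglik(y.mean()) for _ in range(20)]
# Re-evaluating at the same theta gives different answers: the objective
# itself is random, so direct maximization needs large M or stochastic search.
print(np.mean(reps), np.std(reps), exact)
```

MCEM instead confines the sampling to the E-step, where averaging the complete-data log-likelihood (rather than exponentiating and re-logging) is a better-behaved Monte Carlo problem.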

Cliff AB
  • Hmm, I don't think I follow: if you have the likelihood $\log p(y \mid x) = \log \sum_z p(y, z \mid x)$, $z$ being latent, then one option is to use EM (or MCEM if your approximating distribution cannot give you a closed form). The other way I can see is to estimate the sum directly via MCMC. So my question is if you're using sampling anyway, why use MCEM over MCMC. – Opt Jul 17 '15 at 05:10
  • perhaps sampling is what I meant instead of MCMC – Opt Jul 17 '15 at 05:19
  • I think the best way to phrase this is: if you are going to use sampling (required for MCEM), is there an advantage to maximizing (MLE or MAP) rather than integrating (posterior mean)? – Cliff AB Jul 17 '15 at 14:44
  • This question is very broad and hard to answer in general. But I think an answer for why one might prefer the MLE even though sampling is needed for the MCEM algorithm is that you are still integrating over a *subset* of the parameters, and maximizing over the rest. Given that it is often (but not always) faster to maximize than to integrate, you may choose the MCEM algorithm for speed if it is expected that the posterior means will be close to the MLE, as is typically the case. – Cliff AB Jul 17 '15 at 16:52
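The maximize-vs-integrate trade-off in the last comment can be sketched on a hypothetical example (flat prior, $y_i \sim N(\theta, 1)$; everything here is illustrative): maximizing is a one-line computation ($\hat\theta = \bar y$), while integrating requires a sampler, and for this well-behaved posterior the posterior mean essentially coincides with the MLE.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative model: y_i ~ N(theta, 1) with a flat prior on theta.
# The MLE is ybar; the posterior is N(ybar, 1/n), so its mean equals the MLE.
y = rng.normal(1.5, 1.0, size=100)
mle = y.mean()  # maximizing: one line

def log_post(theta):
    return -0.5 * np.sum((y - theta) ** 2)  # flat prior: log-likelihood only

# Integrating: random-walk Metropolis to approximate the posterior mean.
theta, draws = 0.0, []
for t in range(20000):
    prop = theta + rng.normal(0.0, 0.3)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    if t >= 5000:  # discard burn-in
        draws.append(theta)

post_mean = np.mean(draws)
print(mle, post_mean)  # the two estimates should be nearly identical
```

When the posterior is approximately Gaussian like this, the cheaper maximization buys you essentially the same point estimate, which is the speed argument for MCEM.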