
In Monte Carlo EM (MCEM), we use a Monte Carlo sampler in the E-step to approximate the expectation under the posterior distribution of the latent variables.

The algorithm iterates between the following two steps (a minimal sketch follows):

    1. E-step: draw $Z^{(1)}, \dots, Z^{(M)} \sim p(Z \mid X, \theta)$
    2. M-step: $\theta \leftarrow \operatorname{argmax}_{\theta}\; \mathbb{E}_{p(Z \mid X, \theta)}\left[\ln p(X, Z \mid \theta)\right] \approx \operatorname{argmax}_{\theta}\; \frac{1}{M}\sum_{m=1}^{M} \ln p(X, Z^{(m)} \mid \theta)$
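To make the setup concrete, here is a minimal MCEM sketch in Python for a toy model (my own choice, purely for illustration): $Z_i \sim N(\theta, 1)$ and $X_i \mid Z_i \sim N(Z_i, 1)$. The posterior is Gaussian here, so the E-step can sample directly; in general an MCMC kernel would replace that sampling line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (illustrative only): Z_i ~ N(theta, 1), X_i | Z_i ~ N(Z_i, 1).
# Here p(Z | X, theta) is N((X + theta)/2, 1/2), so the E-step can sample
# directly; with an intractable posterior, an MCMC kernel would go there.

def mcem(X, theta=0.0, M=500, n_iter=50):
    for _ in range(n_iter):
        # E-step: draw Z^(1), ..., Z^(M) from p(Z | X, theta), one row per draw.
        Z = rng.normal((X + theta) / 2.0, np.sqrt(0.5), size=(M, X.size))
        # M-step: argmax_theta (1/M) sum_m ln p(X, Z^(m) | theta).
        # Only the term ln N(Z; theta, 1) involves theta, so the maximizer
        # is the grand mean of all sampled Z's.
        theta = Z.mean()
    return theta

X = rng.normal(1.5, np.sqrt(2.0), size=200)  # marginally X ~ N(theta_true, 2)
print(mcem(X))  # converges to the MLE, the sample mean of X
```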

If we use an MCMC method for the E-step, then we need a burn-in phase every time we run the E-step. That means a lot of burn-in sequences (one for each update of the parameter $\theta$ in the M-step).

Hence my question: is there some case (or some method) where we can avoid so many burn-ins?

I would be tempted to update the parameter $\theta$ after each MCMC sample, as if it were another random variable that I maximize over instead of sampling. But I am aware that updating $\theta$ changes the target distribution. However, maybe once the updates of $\theta$ are small enough, I can skip or shorten the burn-in, since the distribution barely changes.

Is there a reference that can shed some light on this?

alberto

1 Answer


Say you have a sample, $(Z^{(m)})_{m=1, \dots, M}$, from $f_{\theta}$. To get a sample from $f_{\theta+\varepsilon}$, you can run an MCMC chain starting at $Z^{(M)}$. If $\varepsilon$ is small, there will hardly be any burn-in period, because you start the chain very close to the stationary distribution (first sketch below the list). However, you still need to gather enough samples to get a decent stationary sample. To avoid this sampling, I suggest:

  1. Your idea of updating the parameter after each MCMC sample sounds like adaptive MCMC. One normally uses adaptive MCMC to tune a step size in the proposal, but from my limited knowledge of the theory, I don't see why it could not be extended to your situation.
  2. You can use importance sampling on your sampled $Z^{(m)}$'s: you simulate a driver set $Z^{(m)} \sim f_{\theta_0}$, and to get a sample from $f_{\theta}$ you use the weighted sample $(Z^{(m)}, \frac{f_{\theta}(Z^{(m)})}{f_{\theta_0}(Z^{(m)})})$. This method can fail when $\theta$ and $\theta_0$ are too far apart, so remember to keep an eye on an estimate of the effective sample size (second sketch below).
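To illustrate the warm-start point above, here is a minimal sketch: a random-walk Metropolis on a Gaussian stand-in for $f_\theta$. The target and all names are illustrative, not your actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

def rw_metropolis(log_f, z0, n_samples, step=0.5):
    """Random-walk Metropolis targeting exp(log_f); returns the chain."""
    z, lf = z0, log_f(z0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = z + step * rng.standard_normal()
        lf_prop = log_f(prop)
        if np.log(rng.random()) < lf_prop - lf:  # accept/reject step
            z, lf = prop, lf_prop
        samples[i] = z
    return samples

# Gaussian stand-in for f_theta = p(Z | X, theta); purely illustrative.
def make_log_f(theta):
    return lambda z: -0.5 * (z - theta) ** 2

# First E-step: cold start far from the mode, discard a burn-in.
chain = rw_metropolis(make_log_f(0.0), z0=10.0, n_samples=2000)[500:]

# After a small M-step move theta -> theta + eps: warm-start at the last
# state, so the chain begins near stationarity and little burn-in is needed.
chain_next = rw_metropolis(make_log_f(0.1), z0=chain[-1], n_samples=1500)
print(chain_next.mean())  # close to 0.1
```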
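And a minimal sketch of suggestion 2, again with a Gaussian stand-in for $f_\theta$ so the weights have a closed form (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Reweight a driver set Z^(m) ~ f_{theta0} to target f_theta via
# self-normalized importance sampling; f is a Gaussian stand-in.
theta0, theta = 0.0, 0.4
Z = rng.normal(theta0, 1.0, size=5000)  # driver set from f_{theta0}

# ln f_theta(Z) - ln f_{theta0}(Z); normalizing constants cancel.
log_w = -0.5 * (Z - theta) ** 2 + 0.5 * (Z - theta0) ** 2
w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
w /= w.sum()

# Effective sample size: small ESS means theta drifted too far from theta0
# and the driver set should be refreshed.
ess = 1.0 / np.sum(w ** 2)
print(f"ESS = {ess:.0f} out of {Z.size}")

# The weighted Monte Carlo E-step objective would then be
# sum_m w_m * ln p(X, Z^(m) | theta).
```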
svendvn
  • Thanks! I'll definitely need to try Importance Sampling. I've been pointed to some recent references about IS, and I hope it will do a good job. – alberto Jan 23 '18 at 16:08