
In Monte Carlo EM (MCEM), we use a Monte Carlo sampler in the E-step to approximate the expectation under the posterior distribution of the latent variables.

The algorithm iterates between the following two steps (a minimal sketch follows):

    1. E-step: draw $Z^{(1)}, \dots, Z^{(M)} \sim p(Z \mid X, \theta)$
    2. M-step: $\theta \leftarrow \operatorname{argmax}_{\theta}\; \mathbb{E}_{p(Z \mid X, \theta)}\left[\ln p(X, Z \mid \theta)\right] \approx \operatorname{argmax}_{\theta}\; \frac{1}{M}\sum_{m=1}^{M} \ln p(X, Z^{(m)} \mid \theta)$
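To make the setup concrete, here is a minimal MCEM sketch in Python for a toy model (my own choice, purely for illustration): $Z_i \sim N(\theta, 1)$ and $X_i \mid Z_i \sim N(Z_i, 1)$. The posterior is Gaussian here, so the E-step can sample directly; in general an MCMC kernel would replace that sampling line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (illustrative only): Z_i ~ N(theta, 1), X_i | Z_i ~ N(Z_i, 1).
# Here p(Z | X, theta) is N((X + theta)/2, 1/2), so the E-step can sample
# directly; with an intractable posterior, an MCMC kernel would go there.

def mcem(X, theta=0.0, M=500, n_iter=50):
    for _ in range(n_iter):
        # E-step: draw Z^(1), ..., Z^(M) from p(Z | X, theta), one row per draw.
        Z = rng.normal((X + theta) / 2.0, np.sqrt(0.5), size=(M, X.size))
        # M-step: argmax_theta (1/M) sum_m ln p(X, Z^(m) | theta).
        # Only the term ln N(Z; theta, 1) involves theta, so the maximizer
        # is the grand mean of all sampled Z's.
        theta = Z.mean()
    return theta

X = rng.normal(1.5, np.sqrt(2.0), size=200)  # marginally X ~ N(theta_true, 2)
print(mcem(X))  # converges to the MLE, the sample mean of X
```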

If we use an MCMC method for the E-step, then we need a burn-in phase every time we run the E-step. That means a lot of burn-in sequences (one for each update of the parameter $\theta$ in the M-step).

Hence my question: is there some case (or some method) where we can avoid so many burn-ins?

I would be tempted to update the parameter $\theta$ after each MCMC sample, as if it were another random variable that I maximize over instead of sampling. But I am aware that updating $\theta$ changes the target distribution. However, maybe once the updates of $\theta$ are small enough, I can skip or shorten the burn-in, since the distribution barely changes.

Is there a reference that can shed some light on this?

alberto

1 Answer


Say you have a sample, $(Z^{(m)})_{m=1, \dots, M}$, from $f_{\theta}$. To get a sample from $f_{\theta+\varepsilon}$, you can run an MCMC chain starting at $Z^{(M)}$. If $\varepsilon$ is small, there will hardly be any burn-in period, because you start the chain very close to the stationary distribution (first sketch below the list). However, you still need to gather enough samples to get a decent stationary sample. To avoid this sampling, I suggest:

  1. Your idea of updating the parameter after each MCMC sample sounds like adaptive MCMC. One normally uses adaptive MCMC to tune a step size in the proposal, but from my limited knowledge of the theory, I don't see why it could not be extended to your situation.
  2. You can use importance sampling on your sampled $Z^{(m)}$'s: you simulate a driver set $Z^{(m)} \sim f_{\theta_0}$, and to get a sample from $f_{\theta}$ you use the weighted sample $(Z^{(m)}, \frac{f_{\theta}(Z^{(m)})}{f_{\theta_0}(Z^{(m)})})$. This method can fail when $\theta$ and $\theta_0$ are too far apart, so remember to keep an eye on an estimate of the effective sample size (second sketch below).
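To illustrate the warm-start point above, here is a minimal sketch: a random-walk Metropolis on a Gaussian stand-in for $f_\theta$. The target and all names are illustrative, not your actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

def rw_metropolis(log_f, z0, n_samples, step=0.5):
    """Random-walk Metropolis targeting exp(log_f); returns the chain."""
    z, lf = z0, log_f(z0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = z + step * rng.standard_normal()
        lf_prop = log_f(prop)
        if np.log(rng.random()) < lf_prop - lf:  # accept/reject step
            z, lf = prop, lf_prop
        samples[i] = z
    return samples

# Gaussian stand-in for f_theta = p(Z | X, theta); purely illustrative.
def make_log_f(theta):
    return lambda z: -0.5 * (z - theta) ** 2

# First E-step: cold start far from the mode, discard a burn-in.
chain = rw_metropolis(make_log_f(0.0), z0=10.0, n_samples=2000)[500:]

# After a small M-step move theta -> theta + eps: warm-start at the last
# state, so the chain begins near stationarity and little burn-in is needed.
chain_next = rw_metropolis(make_log_f(0.1), z0=chain[-1], n_samples=1500)
print(chain_next.mean())  # close to 0.1
```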
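And a minimal sketch of suggestion 2, again with a Gaussian stand-in for $f_\theta$ so the weights have a closed form (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Reweight a driver set Z^(m) ~ f_{theta0} to target f_theta via
# self-normalized importance sampling; f is a Gaussian stand-in.
theta0, theta = 0.0, 0.4
Z = rng.normal(theta0, 1.0, size=5000)  # driver set from f_{theta0}

# ln f_theta(Z) - ln f_{theta0}(Z); normalizing constants cancel.
log_w = -0.5 * (Z - theta) ** 2 + 0.5 * (Z - theta0) ** 2
w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
w /= w.sum()

# Effective sample size: small ESS means theta drifted too far from theta0
# and the driver set should be refreshed.
ess = 1.0 / np.sum(w ** 2)
print(f"ESS = {ess:.0f} out of {Z.size}")

# The weighted Monte Carlo E-step objective would then be
# sum_m w_m * ln p(X, Z^(m) | theta).
```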
svendvn
  • Thanks! I'll definitely need to try Importance Sampling. I've been pointed to some recent references about IS, and I hope it will do a good job. – alberto Jan 23 '18 at 16:08