I am trying to wrap my head around the Population Monte Carlo (PMC) algorithm. I want to implement it for a mixture model, but I am uncertain how to proceed, so I am mostly looking for references or, ideally, code.
I have already implemented a naive Gibbs sampler, but it does not seem to converge: the chains suffer from label switching, i.e. an identifiability issue with respect to permutations of the cluster labels.
EDIT: When I say PMC, I am referring to the Population Monte Carlo paper by Cappé et al. (2004).
I understand the basic Gibbs sampler, and my intuition for PMC is that in each iteration I sample a population of parameters instead of a single one. I then compute an importance weight for each parameter in the population and create my new sample by resampling from this discrete population with probabilities proportional to the weights.
The weights are calculated based on my last sample, meaning that proposals similar to my last sample are favored. My hope is therefore that, when sampling for a mixture model, I no longer get permutations of the clusters, because favoring similar samples should nail down the cluster labels.
Does this make sense?
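For concreteness, here is a minimal sketch of the scheme I have in mind (the toy target, the random-walk proposal kernel, and all names are my own illustration, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def log_target(x):
    # Toy target: equal-weight mixture of two unit-variance Gaussians.
    return np.logaddexp(norm.logpdf(x, -2, 1), norm.logpdf(x, 2, 1)) - np.log(2)

def pmc_step(pop, scale=1.0):
    """One PMC iteration as I understand it: propose a new population
    around the current one, importance-weight it, then resample."""
    # 1. Propose: random-walk kernel centred at each current particle.
    prop = pop + scale * rng.standard_normal(pop.shape)
    # 2. Weight: target density over the (conditional) proposal density.
    log_w = log_target(prop) - norm.logpdf(prop, loc=pop, scale=scale)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # 3. Resample with probabilities proportional to the weights.
    return prop[rng.choice(len(prop), size=len(prop), p=w)]

pop = rng.standard_normal(1000)  # arbitrary initial population
for _ in range(20):
    pop = pmc_step(pop)
print(pop.mean(), pop.std())     # should roughly match the target
```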
EDIT2: I am now reading this paper pointed out by Xi'an. I am trying to understand the parameter updates for the Gaussian mixture model: given some data, I want to estimate the mixture proportions and component parameters. I was looking at the updates in equation (10): $$ \alpha_d^{t+1,N} = \sum_{i=1}^{N}\overline{\omega}_{i,t}\,\rho_d(X_{i,t};\alpha_d^{t,N},\theta_d^{t,N}) $$ $$ \theta_d^{t+1,N} = \arg\max_{\theta_d}\sum_{i=1}^{N} \overline{\omega}_{i,t}\,\rho_d(X_{i,t};\alpha_d^{t,N},\theta_d^{t,N})\log q_d(X_{i,t};\theta_d) $$ The updates specific to the Gaussian mixture are shown in equations (11) and (12). What I do not understand is where my original data comes into this. It seems that I am sampling the $X_{i,t}$ anew in each iteration, and I do not understand how that helps, or why I should be doing that. Note that I am only interested in estimating the mixture parameters given the data.
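To make my reading explicit, here is how I would compute these updates in code, assuming $\rho_d$ is the usual posterior responsibility $\rho_d(x) = \alpha_d q_d(x;\theta_d)/\sum_j \alpha_j q_j(x;\theta_j)$ and that the $q_d$ are univariate Gaussians (all names are mine, and the closed-form M-step is my guess at what equations (11) and (12) reduce to):

```python
import numpy as np
from scipy.stats import norm

def responsibilities(X, alpha, mu, sigma):
    """rho_d(x): posterior probability that x came from component d."""
    dens = alpha * norm.pdf(X[:, None], loc=mu, scale=sigma)  # (N, D)
    return dens / dens.sum(axis=1, keepdims=True)

def update(X, w_bar, alpha, mu, sigma):
    """One parameter update as I read equation (10), with univariate
    Gaussian components; w_bar are the normalised importance weights."""
    rho = responsibilities(X, alpha, mu, sigma)   # (N, D)
    wr = w_bar[:, None] * rho                     # omega_bar * rho
    alpha_new = wr.sum(axis=0)                    # eq. (10), alpha update
    # The argmax over theta_d is a weighted MLE, i.e. an EM-style M-step:
    mu_new = (wr * X[:, None]).sum(axis=0) / alpha_new
    var_new = (wr * (X[:, None] - mu_new) ** 2).sum(axis=0) / alpha_new
    return alpha_new, mu_new, np.sqrt(var_new)
```

Note that `X` here is the sample $X_{i,t}$ drawn in iteration $t$, not my observed dataset, which is exactly where I get lost.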
At the beginning of section 2.3 the authors describe how to start the sampling procedure by arbitrarily fixing the mixture parameters and then drawing a sample $(X_{i,0})_{1\leq i\leq N}$ together with the associated latent variables $(Z_{i,0})_{1\leq i\leq N}$. I do not understand exactly how the latent variables come into play there, and I do not see any connection to a dataset whose mixture parameters I want to estimate.
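As far as I can tell, that initial draw would look something like this (a minimal sketch with arbitrarily chosen parameters, as the text says; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
alpha0 = np.array([0.5, 0.5])                # arbitrary initial mixture weights
mu0 = np.array([-1.0, 1.0])                  # arbitrary component means
sigma0 = np.array([1.0, 1.0])                # arbitrary component scales

# Latent variables: Z_i picks the component each X_i is drawn from.
Z0 = rng.choice(len(alpha0), size=N, p=alpha0)
X0 = rng.normal(loc=mu0[Z0], scale=sigma0[Z0])
```

But this draws $(X_{i,0})$ from the (arbitrary) mixture itself, and my observed dataset never enters, which is the same confusion as above.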