I am trying to wrap my head around the Population Monte Carlo (PMC) algorithm. I want to implement it for a mixture model, but I am uncertain how to proceed, so I am mostly looking for references or, ideally, code.
I have already implemented a naive Gibbs sampler, but it does not seem to converge: the chains suffer from label switching, i.e. an identifiability issue with respect to permutations of the cluster labels.
EDIT: When I say PMC, I am referring to the Population Monte Carlo paper by Cappé et al. (2004).
I understand the basic Gibbs sampler, and my intuition for PMC is that in each iteration I sample a population of parameters instead of a single one. I then compute an importance weight for each parameter in the population and create my new sample by resampling from this discrete population with probabilities proportional to the weights.
The weights are calculated based on my last sample, meaning that proposals similar to my last sample are favored. My hope is therefore that, when sampling for a mixture model, I no longer get permutations of the clusters, because favoring similar samples should nail down the cluster labels.
Does this make sense?
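For concreteness, here is a minimal sketch of the scheme I have in mind (the toy target, the random-walk proposal kernel, and all names are my own illustration, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def log_target(x):
    # Toy target: equal-weight mixture of two unit-variance Gaussians.
    return np.logaddexp(norm.logpdf(x, -2, 1), norm.logpdf(x, 2, 1)) - np.log(2)

def pmc_step(pop, scale=1.0):
    """One PMC iteration as I understand it: propose a new population
    around the current one, importance-weight it, then resample."""
    # 1. Propose: random-walk kernel centred at each current particle.
    prop = pop + scale * rng.standard_normal(pop.shape)
    # 2. Weight: target density over the (conditional) proposal density.
    log_w = log_target(prop) - norm.logpdf(prop, loc=pop, scale=scale)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # 3. Resample with probabilities proportional to the weights.
    return prop[rng.choice(len(prop), size=len(prop), p=w)]

pop = rng.standard_normal(1000)  # arbitrary initial population
for _ in range(20):
    pop = pmc_step(pop)
print(pop.mean(), pop.std())     # should roughly match the target
```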
EDIT2: I am now reading this paper pointed out by Xi'an. I am trying to understand the parameter updates for the Gaussian mixture model: given some data, I want to estimate the mixture proportions and component parameters. I was looking at the updates in equation (10): $$ \alpha_d^{t+1,N} = \sum_{i=1}^{N}\overline{\omega}_{i,t}\,\rho_d(X_{i,t};\alpha_d^{t,N},\theta_d^{t,N}) $$ $$ \theta_d^{t+1,N} = \arg\max_{\theta_d}\sum_{i=1}^{N} \overline{\omega}_{i,t}\,\rho_d(X_{i,t};\alpha_d^{t,N},\theta_d^{t,N})\log q_d(X_{i,t};\theta_d) $$ The updates specific to the Gaussian mixture are shown in equations (11) and (12). What I do not understand is where my original data comes into this. It seems that I am sampling the $X_{i,t}$ anew in each iteration, and I do not understand how that helps, or why I should be doing that. Note that I am only interested in estimating the mixture parameters given the data.
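To make my reading explicit, here is how I would compute these updates in code, assuming $\rho_d$ is the usual posterior responsibility $\rho_d(x) = \alpha_d q_d(x;\theta_d)/\sum_j \alpha_j q_j(x;\theta_j)$ and that the $q_d$ are univariate Gaussians (all names are mine, and the closed-form M-step is my guess at what equations (11) and (12) reduce to):

```python
import numpy as np
from scipy.stats import norm

def responsibilities(X, alpha, mu, sigma):
    """rho_d(x): posterior probability that x came from component d."""
    dens = alpha * norm.pdf(X[:, None], loc=mu, scale=sigma)  # (N, D)
    return dens / dens.sum(axis=1, keepdims=True)

def update(X, w_bar, alpha, mu, sigma):
    """One parameter update as I read equation (10), with univariate
    Gaussian components; w_bar are the normalised importance weights."""
    rho = responsibilities(X, alpha, mu, sigma)   # (N, D)
    wr = w_bar[:, None] * rho                     # omega_bar * rho
    alpha_new = wr.sum(axis=0)                    # eq. (10), alpha update
    # The argmax over theta_d is a weighted MLE, i.e. an EM-style M-step:
    mu_new = (wr * X[:, None]).sum(axis=0) / alpha_new
    var_new = (wr * (X[:, None] - mu_new) ** 2).sum(axis=0) / alpha_new
    return alpha_new, mu_new, np.sqrt(var_new)
```

Note that `X` here is the sample $X_{i,t}$ drawn in iteration $t$, not my observed dataset, which is exactly where I get lost.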
At the beginning of section 2.3 the authors describe how to start the sampling procedure by arbitrarily fixing the mixture parameters and then drawing a sample $(X_{i,0})_{1\leq i\leq N}$ together with the associated latent variables $(Z_{i,0})_{1\leq i\leq N}$. I do not understand exactly how the latent variables come into play there, and I do not see any connection to a dataset whose mixture parameters I want to estimate.
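As far as I can tell, that initial draw would look something like this (a minimal sketch with arbitrarily chosen parameters, as the text says; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
alpha0 = np.array([0.5, 0.5])                # arbitrary initial mixture weights
mu0 = np.array([-1.0, 1.0])                  # arbitrary component means
sigma0 = np.array([1.0, 1.0])                # arbitrary component scales

# Latent variables: Z_i picks the component each X_i is drawn from.
Z0 = rng.choice(len(alpha0), size=N, p=alpha0)
X0 = rng.normal(loc=mu0[Z0], scale=sigma0[Z0])
```

But this draws $(X_{i,0})$ from the (arbitrary) mixture itself, and my observed dataset never enters, which is the same confusion as above.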