PyMC for Categorical Latent Model

Question

I'm learning PyMC and am trying to fit a simple categorical mixture model but the sampling estimates don't converge to the true values. I'm wondering if I've specified the model incorrectly or am using the sampler incorrectly. The true means are $\mu$ = [-10,0,10,20]. The code specifying the model is below. Any insights much appreciated!

Updated info:

I also coded this problem in Stan and encountered the same issue. It appears to be an "identifiability", "aliasing", or "label switching" issue. For more detail, see section 19.2 of the current Stan manual. I'm not yet sure how to fix the problem.

import matplotlib.pyplot as plt
import numpy as np
import pymc as mc
from pymc.Matplot import plot

# generate simulated data
ndata = 1000
numCat = 4
c = np.random.randint(0,numCat,ndata)
mu = [-10,0,10,20]
sigma = .25
sample = np.zeros(ndata)
for i in range(ndata):
    sample[i] = np.random.normal(mu[c[i]], sigma)  # scalar draw; size=1 would return a length-1 array

# define the model in PyMC
labels = mc.Categorical('labels', p = np.array([.25,.25,.25,.25]),size = ndata)  
means = mc.Uniform('means', lower=-30., upper=30., size=numCat)

@mc.deterministic
def mean(labels=labels, means=means):
    return means[labels]
obs = mc.Normal('obs', mean, 1/(sigma**2), value=sample, observed = True)
model = mc.Model({'labels': labels,'means': means, 'obs': obs})

# fit the model
mcmc = mc.MCMC( model )
mcmc.sample( 50000,0 )
plot(mcmc)
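
As a small illustration of the label-switching issue mentioned in the update: the mixture likelihood does not change when the component means are permuted, so the data alone cannot say which component is "component 0". The sketch below is not part of the PyMC model above; the helper `mixture_loglik` and the simulated data are assumptions made purely for illustration, and it uses only the standard library.

```python
import math
import random

def mixture_loglik(data, means, sigma, weights):
    """Log-likelihood of data under a Gaussian mixture with a shared sigma."""
    norm = sigma * math.sqrt(2 * math.pi)
    total = 0.0
    for x in data:
        density = sum(
            w * math.exp(-0.5 * ((x - m) / sigma) ** 2) / norm
            for w, m in zip(weights, means)
        )
        total += math.log(density)
    return total

random.seed(0)
true_means = [-10, 0, 10, 20]
sigma = 0.25
weights = [0.25] * 4

# Simulated data, analogous to the question's setup (hypothetical sample size).
data = [random.gauss(random.choice(true_means), sigma) for _ in range(200)]

ll_original = mixture_loglik(data, [-10, 0, 10, 20], sigma, weights)
ll_permuted = mixture_loglik(data, [20, 10, 0, -10], sigma, weights)

# The two values agree up to floating-point rounding.
print(ll_original, ll_permuted)
```

Because every permutation of the means is equally well supported, a sampler can settle on any ordering, which is why the estimated means may not line up with the order of the true values.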

score 2 · Answer 1 · edited Apr 13 '17 at 12:44

It has been 9 months since you asked this question, so I suspect that you might not still be searching for a simple solution to the "label-switching" problem (or might have yourself discovered one). However, for the sake of others who might stumble upon this thread, here are my 2 cents.

One easy hack that can potentially get you around the label-switching problem is to enforce identifiability constraints, similar to the one mentioned here (along with a lucid explanation). In your case, this amounts to enforcing

$\mathrm{mean}_0 < \mathrm{mean}_1 < \mathrm{mean}_2 < \mathrm{mean}_3$

This can be done by post-processing the four traces for the means stochastic so that they conform to the above-mentioned constraint. One should note that this is not guaranteed to solve the problem, as is evident from the example shown in this paper (sections 2.2 and 2.3). The paper goes on to propose a more complex, but effective, post-processing method. This thread has quite a few references to papers/discussions aimed at this problem.
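
A minimal sketch of the simple ordering-based post-processing described above, assuming the draws for means have been extracted into rows with one column per component (in PyMC 2 this would typically come from something like `mcmc.trace('means')[:]`). The trace below is simulated and hypothetical: halfway through, components 0 and 3 swap labels to mimic label switching. This is the simple ordering hack only, not the more elaborate method from the paper.

```python
import random
import statistics

random.seed(1)
true_means = [-10, 0, 10, 20]

# Hypothetical trace: one row per draw, one column per component.
# From draw 1000 onward, components 0 and 3 swap labels.
trace = []
for t in range(2000):
    row = [m + random.gauss(0, 0.05) for m in true_means]
    if t >= 1000:
        row[0], row[3] = row[3], row[0]
    trace.append(row)

def column_means(rows):
    """Average each component's column over all draws."""
    return [statistics.mean(col) for col in zip(*rows)]

# Without relabelling, columns 0 and 3 average over two different modes.
raw = column_means(trace)

# Enforce mean_0 < mean_1 < mean_2 < mean_3 by sorting within each draw.
relabelled = [sorted(row) for row in trace]
fixed = column_means(relabelled)

print("raw:  ", raw)
print("fixed:", fixed)
```

With a NumPy array of shape (draws, components), the same relabelling is `np.sort(trace, axis=1)`. If the per-observation labels are also of interest, they would need to be remapped with the same per-draw permutation.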

As an aside, the autocorrelation plot seems to strongly suggest that the MCMC chain has not converged (or maybe it has; high autocorrelation indicates slow mixing, but does not by itself prove that the chain has failed to converge). Poor convergence might also be an artefact of the way you have declared the stochastics as blocked arrays. Quoting from this answer:

The problem is caused by the way that PyMC draws samples for this model. As explained in section 5.8.1 of the PyMC documentation, all elements of an array variable are updated together. For small arrays ... this is not a problem, but for a large array ... it leads to a low acceptance rate.

I suggest changing

labels = mc.Categorical('labels', p = np.array([.25,.25,.25,.25]),size = ndata)

to

labels = mc.Container([mc.Categorical('label_%d' % i, p = np.array([.25,.25,.25,.25])) for i in range(ndata)])

and reading the answer referenced above for a few more suggestions.
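
To get a rough feel for why updating a large array in one block hurts the acceptance rate, here is a back-of-the-envelope sketch. It rests on a crude assumption that is not taken from the PyMC documentation: each element's proposed value would be acceptable independently with the same probability, so the whole block is accepted only if every element is. The 0.99 figure and the function name `joint_acceptance` are hypothetical.

```python
def joint_acceptance(per_element_rate, n_elements):
    """Heuristic: a block of independent elements is accepted only if all are."""
    return per_element_rate ** n_elements

# Even a very favourable per-element rate collapses for a large block.
for n in (4, 100, 1000):
    print(n, joint_acceptance(0.99, n))
```

Under this heuristic a 4-element array such as means is barely affected, while a 1000-element labels array is almost never accepted as a whole, which is the motivation for giving each label its own stochastic.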
