
Given two classes of training data (A and B), I want to fit each class's distribution using a GMM with k components, and then classify with the Bayes decision rule.

The first step was to run PCA on data set A, project the data into a lower-dimensional subspace, and then do the actual fitting for each class.

In MATLAB this would be implemented as:

>> Aobj = gmdistribution.fit(A, k, 'Regularize', 1e-5);
>> Bobj = gmdistribution.fit(B, k, 'Regularize', 1e-5);

However, when I evaluate the pdf for the training data:

>> pdf(Aobj, A)

one or two of the data points are assigned huge values (3.200989873206918e+241).

What am I missing here?

Data set: collection of 50x25 grey-scale images with A and B indicating the presence or absence of pedestrians.

StasK
kyrre
  • Have a look at the variance of each component. My guess is that one component is just covering one point and thus has very low variance (and very high density). – Nick Oct 19 '11 at 00:45

1 Answer


When you get a result like that, it means a component has been assigned to a single point. The fit is using a very narrow Gaussian to approximate a Dirac delta function centered on that point. Reduce the number of components by the number of "crazy" values and try again.
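One way to check for such a collapsed component is to look at each component's covariance and at how many training points it claims. The following is only a sketch: the data are a synthetic stand-in (random points plus one outlier), not the pedestrian images from the question, and the names `idx`, `ev` and `logPeak` are made up for illustration. It assumes the default full, unshared covariances, so that `Aobj.Sigma` is d-by-d-by-k.

```matlab
% Synthetic stand-in for the question's data (an assumption): 200 points in
% 5 dimensions plus one far-away outlier that a component may collapse onto.
rng(0);
A = [randn(200, 5); 50 * ones(1, 5)];
k = 3;
Aobj = gmdistribution.fit(A, k, 'Regularize', 1e-5);

d   = size(Aobj.mu, 2);     % dimension of the data (mu is k-by-d)
idx = cluster(Aobj, A);     % hard assignment of each point to a component
logPeak = zeros(k, 1);
for j = 1:k
    S  = Aobj.Sigma(:, :, j);     % full covariance of component j
    ev = eig((S + S') / 2);       % symmetrize before taking eigenvalues
    % Log-density of a Gaussian at its own mean:
    %   -(d/2)*log(2*pi) - (1/2)*log(det(S))
    logPeak(j) = -0.5 * (d * log(2*pi) + sum(log(ev)));
    fprintf('component %d: %d points, smallest eigenvalue %.3g, log peak %.1f\n', ...
            j, sum(idx == j), min(ev), logPeak(j));
end
```

A component that holds one or two points and whose eigenvalues sit near the `'Regularize'` value is a likely culprit. Its peak density grows as the covariance determinant shrinks, and the effect compounds with the number of dimensions, which is how a value like 1e+241 can arise.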

It is not a bad idea to run the fit a few times (10-30) and then pick out the median fit, or the most "well behaved" fit.
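A minimal sketch of the repeated-fit idea, again on synthetic stand-in data rather than the question's images. "Median fit" is interpreted here as the run with the median negative log-likelihood (`NlogL`), which is one possible reading; `nRuns`, `fits`, `nll` and `medianFit` are illustrative names.

```matlab
% Synthetic stand-in for the question's data (an assumption).
rng(1);
A = [randn(200, 5); 50 * ones(1, 5)];
k = 3;
nRuns = 15;                    % somewhere in the suggested 10-30 range

fits = cell(nRuns, 1);
nll  = zeros(nRuns, 1);
for r = 1:nRuns
    % The default start draws random initial means, so each run can differ.
    fits{r} = gmdistribution.fit(A, k, 'Regularize', 1e-5);
    nll(r)  = fits{r}.NlogL;   % negative log-likelihood of this fit
end

% Keep the run whose negative log-likelihood is the median of all runs.
[~, order] = sort(nll);
medianFit  = fits{order(ceil(nRuns / 2))};
```

The built-in `'Replicates'` option also repeats the fit, but it keeps the run with the largest likelihood. A component that has collapsed onto a single point tends to inflate the likelihood, so that option may favor exactly the fit one is trying to avoid.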

EngrStudent