I am classifying this UCI dataset in MATLAB. I represent the dataset as a matrix (instances x dimensions) and the labels as a second matrix (instances x 1).

With Naive Bayes I get a multiclass classification accuracy of 0.65. But when I use the dataset transformed with PCA, I get an accuracy of only 0.15, even if I keep all dimensions. I guess I am doing something wrong. This is my MATLAB code:

%x_tr, y_tr =training set, labels of training set
%x_tst,y_tst=testing set , labels of testing set
model = fitNaiveBayes(x_tr,y_tr);
Y=predict(model,x_tst);
acc=accuracyMC(Y); %0.65

%PCA usage
[COEFF,SCORE] = princomp(x_tr);
model_pca = fitNaiveBayes(SCORE,y_tr);
[COEFF,SCORE] = princomp(x_tst);
Y=predict(model_pca,SCORE);
acc_pca=accuracyMC(Y); %0.15

I also tried normalizing the data with z-scores.

Sycorax
SpeedEX505
  • Possible duplicate of [What can cause PCA to worsen results of a classifier?](http://stats.stackexchange.com/questions/52773/what-can-cause-pca-to-worsen-results-of-a-classifier) – Sycorax Jun 10 '15 at 13:42
  • This thread illustrates a potential pitfall - transforming the sets separately - that the other thread doesn't. So I'm not convinced it is an exact duplicate although the link is certainly useful and relevant to the OP. – Silverfish Jun 10 '15 at 14:42

1 Answer

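First, you are transforming your training set and test set independently. Each call to princomp computes its own means and principal axes, so the test scores end up in a different coordinate system from the one the model was trained in.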

What you want to do instead is perform PCA on your training set only, keep its coefficients (and the training column means used for centering), train on the transformed training data, and then transform your test data using the training means and coefficients before prediction.

See this MATLAB thread for instructions.
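
As a rough illustration of the procedure described above, here is a minimal sketch. It makes several assumptions that are not in the question: it uses the newer `pca` and `fitcnb` functions instead of `princomp` and `fitNaiveBayes`, it computes accuracy directly rather than with the asker's `accuracyMC` helper, and it runs on synthetic stand-in data, so the real `x_tr`, `y_tr`, `x_tst`, and `y_tst` would replace the first block.

```matlab
% Synthetic stand-in data (hypothetical; replace with the real matrices)
rng(0);                                          % reproducible random numbers
y_tr  = [ones(50,1); 2*ones(50,1)];              % two classes, 50 training rows each
y_tst = [ones(20,1); 2*ones(20,1)];              % two classes, 20 test rows each
x_tr  = randn(100,5) + 2*(y_tr  - 1)*ones(1,5);  % class 2 shifted by +2 in every column
x_tst = randn(40,5)  + 2*(y_tst - 1)*ones(1,5);

% Fit PCA on the training set only; mu holds the training column means
[coeff, score_tr, ~, ~, ~, mu] = pca(x_tr);

% Project the test set using the TRAINING mean and TRAINING coefficients
score_tst = bsxfun(@minus, x_tst, mu) * coeff;

% Train on the training scores, then predict on the projected test scores
model_pca = fitcnb(score_tr, y_tr);
Y = predict(model_pca, score_tst);
acc_pca = mean(Y == y_tst);                      % fraction of correct predictions
```

Note that the projection is a right-multiplication of the centered test matrix, `(x_tst - mu) * coeff`, because the rows are instances. With `princomp`, which has no `mu` output, `mean(x_tr, 1)` should serve the same purpose.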

Second, and perhaps more importantly, given the multiple feature families (textual, visual, and auditory) in your dataset, I'm not sure that a PCA transform is a valid choice.

Bar
  • So it should be Y=predict(model,COEFF*x_tst), where COEFF is the matrix obtained from computing PCA on x_tr? – SpeedEX505 Jun 10 '15 at 14:00
  • I am not very familiar with MATLAB, but that should be correct. The linked thread should provide the correct way to do this. – Bar Jun 10 '15 at 14:34