I'm trying to do dimensionality reduction with linear discriminant analysis (LDA) in MATLAB. I'm using this code to calculate the coefficients.
But I'm confused about whether (and when) I should center the data. I have 4000 samples with 3000-dimensional features, and class labels from 1 to 5. The features are histogram bins, so they are already normalized, i.e. each value is between 0 and 1.
% data is 4000x3000
NTrain = size(data,1);
% center the data : (should I ?)
data_centered = bsxfun(@minus, data, mean(data,1));
% calculate the coefficients and scores :
W=LDA(data_centered,labels); % I also tried W=LDA(data,labels);
newData = [ones(NTrain,1) data_centered] * W';
In the end, I get W (and newData) matrices with very large values, on the order of 1e4. So I'm probably doing something wrong. Should I center the data beforehand?
Also, what about new (test) data? I am saving W, but should I also save the training mean, so I can subtract it from the test data to obtain the scores?
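To make the second question concrete, this is roughly what I have in mind for the test data. It is only a sketch: testData is a hypothetical name, and the toy sizes and the random W are stand-ins so the snippet runs on its own. I am assuming W has the same layout as in my code above (one row per class, with the first column multiplying the column of ones). I don't know whether this is the right approach.

```matlab
% Toy sizes so the snippet runs on its own; my real data is 4000x3000.
rng(0);
data     = rand(8, 3);   % stand-in for the training features
testData = rand(4, 3);   % hypothetical test features (same number of columns)
W        = rand(5, 4);   % stand-in for the output of LDA(): nClasses x (nFeatures+1)

% Training mean, which I would save together with W
mu = mean(data, 1);

% Subtract the TRAINING mean from the test data (not the test data's own mean)
testCentered = bsxfun(@minus, testData, mu);

% Same projection as for the training data
testScores = [ones(size(testData,1),1) testCentered] * W';
```

So I would store both W and mu after training, and never recompute the mean from the test set.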
Thanks for any help!