I'm trying to do dimensionality reduction with linear discriminant analysis (LDA) in MATLAB. I'm using this code to calculate the coefficients.

But I'm confused about whether (and when) I should center the data. I have 3000-dimensional features and 4000 samples, with class labels from 1 to 5. The features are histogram bins, so they are already normalized, i.e. each element is between 0 and 1.

% data is 4000x3000
NTrain = size(data,1);

% center the data : (should I ?) 
data_centered = bsxfun(@minus, data, mean(data,1)); 

% calculate the coefficients and scores :
W=LDA(data_centered,labels); % I also tried W=LDA(data,labels);
newData = [ones(NTrain,1) data_centered] * W';

In the end, I get a W (and newData) matrix with very large values, on the order of 1e4. So I'm probably doing something wrong. Should I center the data beforehand?

Also, what about new (test) data? I am saving W, but should I also save the training mean, so that I can subtract it from the test data before computing the scores?

Thanks for any help!

ttnphns
jeff
  • I cannot help specifically with the Matlab function. But [here](http://stats.stackexchange.com/a/48859/3277) is the algebra of LDA, with a link to the example of computations for iris data. Data in LDA always get centered because in LDA we deal with scatter or covariance matrices. – ttnphns Aug 05 '15 at 10:13
  • @ttnphns Thanks. I further analyzed the external code, and it seems to calculate the covariances directly (without centering the data), so it looks like I need to center it beforehand. But I'm still confused, because the class coefficients are of very different orders of magnitude. – jeff Aug 05 '15 at 10:22
  • Covariances _imply_ [centering](http://stats.stackexchange.com/a/22520/3277) the data even if they use the "fast" formula, which looks as if it bypasses centering. So, when computing scores, have the data centered. – ttnphns Aug 05 '15 at 10:26
  • Why do you add `ones` to your `data` before projecting it with `W`? Aren't `W` vectors supposed to be 3000-dimensional, not 3001-dimensional? – amoeba Aug 05 '15 at 13:10
  • @amoeba I'm not sure about that either, but the external code I'm using (link in the question) has an example projection in a comment, and it projects like that. I think the first coefficients correspond to constants. **Edit:** now that I think about it, if there are constant terms, maybe I'm **not** supposed to center the data before projecting? – jeff Aug 08 '15 at 04:35
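
A short derivation may help tie together the points raised in the comments. It is only a sketch: it assumes that each row of `W` has the form $[w_{k0}, w_k^\top]$ with the constant first (as guessed above, not confirmed from the external code), and the symbols $\mu$ (training mean), $m_k$ (mean of class $k$), $n_k$ (size of class $k$) and $S_k$ (scatter of class $k$) are introduced here for illustration.

```latex
% Scatter of class k: the "fast" formula equals the explicitly centered one
S_k = \sum_{i \in k} (x_i - m_k)(x_i - m_k)^\top
    = \sum_{i \in k} x_i x_i^\top - n_k \, m_k m_k^\top

% Subtracting the same vector \mu from every sample shifts m_k by \mu as well,
% so the deviations (and hence S_k) are unchanged
(x_i - \mu) - (m_k - \mu) = x_i - m_k

% For coefficients fitted on centered data, the score of a new point x is
w_{k0} + w_k^\top (x - \mu) = \left( w_{k0} - w_k^\top \mu \right) + w_k^\top x
```

Read this way, the within-class scatter does not depend on whether the data were globally centered first. A `W` fitted on centered data gives the intended scores for new data only if the same training mean $\mu$ is subtracted from it (or, equivalently, if $w_k^\top \mu$ is folded into the constant), which suggests saving the training mean together with `W`.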
