LDA scores too big

Question

I'm trying to do dimensionality reduction with linear discriminant analysis (LDA) in MATLAB. I'm using this code to calculate the coefficients.

But I'm confused about whether (and when) I should center the data. I have 3000-dimensional features and 4000 samples, with class labels from 1 to 5. The features are histogram bins, so they are nicely normalized, i.e. each value is between 0 and 1.

% data is 4000x3000
NTrain = size(data,1);

% center the data : (should I ?) 
data_centered = bsxfun(@minus, data, mean(data,1)); 

% calculate the coefficients and scores :
W=LDA(data_centered,labels); % I also tried W=LDA(data,labels);
newData = [ones(NTrain,1) data_centered] * W';

In the end, I get W and newData matrices with very large values, on the order of 1e4. So I'm probably doing something wrong. Should I center the data beforehand?

Also, what about new (test) data? I am saving W, but should I also save the training mean, so that I can subtract it from the test data before computing the scores?

Thanks for any help!
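On the test-data part of the question, the usual convention when data are centered is to estimate the mean on the training set only and reuse that same mean for any new data. The following is a minimal sketch of that bookkeeping, not of the external `LDA` function: the toy sizes, the variable names, and the random placeholder matrix `V` are all assumptions made for illustration.

```matlab
% Hypothetical toy data standing in for the real 4000x3000 histogram features.
rng(0);
trainData = rand(40, 6);
testData  = rand(10, 6);

% Estimate the mean on the training set only and store it with the model.
mu = mean(trainData, 1);

% Subtract the SAME training mean from both the training and the test set.
trainCentered = bsxfun(@minus, trainData, mu);
testCentered  = bsxfun(@minus, testData,  mu);

% Placeholder projection (random here, purely illustrative); in practice this
% would be the coefficient matrix learned from trainCentered.
V = randn(6, 2);
trainScores = trainCentered * V;
testScores  = testCentered  * V;
```

Under this convention the test set is never centered with its own mean, so training and test scores live in the same coordinate system.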

I cannot help specifically with the Matlab function. But [here](http://stats.stackexchange.com/a/48859/3277) is the algebra of LDA, with a link to the example of computations for iris data. Data in LDA always get centered because in LDA we deal with scatter or covariance matrices. — ttnphns, Aug 05 '15 at 10:13
@ttnphns Thanks. I looked further into the external code, and it seems to calculate the covariances directly (without centering the data), so it looks like I need to center the data beforehand. But I'm still confused, because the class coefficients are of very different orders of magnitude. — jeff, Aug 05 '15 at 10:22
Covariances _imply_ [centering](http://stats.stackexchange.com/a/22520/3277) the data even if they use the "fast" formula which looks as if bypassing it. So, when computing scores, have the data centered. — ttnphns, Aug 05 '15 at 10:26
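To make the remark about the "fast" formula concrete: with $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (notation introduced here, not taken from the thread), the scatter matrix of the centered data can be computed from the raw data, because

$$
\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top \;=\; \sum_{i=1}^{n} x_i x_i^\top \;-\; n\,\bar{x}\,\bar{x}^\top .
$$

The right-hand side never forms $x_i - \bar{x}$ explicitly, yet the mean is still subtracted through the $n\,\bar{x}\,\bar{x}^\top$ term, which is the sense in which covariances imply centering.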
Why do you add `ones` to your `data` before projecting it with `W`? Aren't `W` vectors supposed to be 3000-dimensional, not 3001-dimensional? — amoeba, Aug 05 '15 at 13:10
@amoeba I'm not sure about that either, but the external code I'm using (link in the question) has an example projection in a comment, and it projects like that. I think the first coefficients correspond to constants. **Edit:** now that I think about it, if there are constant terms, maybe I'm **not** supposed to center the data before projecting? — jeff, Aug 08 '15 at 04:35
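The point about constant terms can be written out. Assuming, as guessed above and not confirmed by the source, that row $k$ of `W` has the form $[\,c_k,\; w_k^\top\,]$ with $c_k$ a constant term, and writing $\mu$ for the training mean, the score of a centered sample is

$$
c_k + w_k^\top (x - \mu) \;=\; \left(c_k - w_k^\top \mu\right) + w_k^\top x .
$$

So centering changes each class's score only by the fixed offset $w_k^\top \mu$, which a constant term can absorb. Under that assumption, what matters is consistency: the data passed to the projection should be preprocessed the same way (centered or not) as the data that `W` was fitted on.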
