Data normalization prior to PCA?

Question

I want to get some intuition on normalization prior to feature selection with PCA.

I'm sure z-normalization is a bad idea, since it normalizes the variances to 1 for each feature, PCA will be meaningless since it will randomly select the most varying features. Right?

But my data consists of features with very different ranges (some are on the order of 1e-1, whereas others are on the order of 1e-8), and I want to combine them in a discriminative classifier.

How can I approach this normalization + feature reduction problem?

Thanks for any help!

`I'm sure z-normalization is a bad idea, since it normalizes the variances to 1 for each feature, PCA will be meaningless since it will randomly select the most varying features. Right?` Absolutely wrong. From the sound of it, you may be confused about what PCA actually does and what it is for. (Or I have misinterpreted you.) – ttnphns, Sep 08 '15 at 09:53
@ttnphns AFAIK, PCA selects the variables (projections on axes) that have the maximum variance. So if I make the variances of all variables 1, then all the projections will have more or less the same variance. Am I wrong? As for your second question, my aim is to perform multi-class classification on this data. – jeff, Sep 08 '15 at 10:01
You forget that [variables may correlate](http://stats.stackexchange.com/a/22571/3277). Also, PCA does not "select" anything. It is you who selects from among the components, or from the variables that load on them. – ttnphns, Sep 08 '15 at 10:10
So PCA diagonalizes the covariance matrix, i.e. removes the correlation between variables, right? Then I select the top x% of components (thresholding the cumulative sum of `latent` in MATLAB). So IIUYC, equalizing the original variances should not hurt, since the correlations remain more or less the same. Is that right? – jeff, Sep 08 '15 at 11:07

score 1 · Accepted Answer · answered Sep 08 '15 at 10:01

First of all, z-normalization is not a bad idea in this case.
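To make this concrete, here is a minimal NumPy sketch on made-up data (the three features, their scales, and all variable names are assumptions for illustration, not the asker's data). Two features are strongly correlated but live on scales of about 1e-1 and 1e-8, and a third is independent with unit scale. Without standardization the leading component is essentially just the large-scale feature; after z-normalization every feature has variance 1, yet the component variances are still far from equal, because PCA then works on the correlation matrix and picks up the shared direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical toy data: features 0 and 1 share a common factor but sit on
# very different scales (about 1e-1 and 1e-8); feature 2 is independent.
common = rng.normal(size=n)
X = np.column_stack([
    1e-1 * (common + 0.1 * rng.normal(size=n)),
    1e-8 * (common + 0.1 * rng.normal(size=n)),
    rng.normal(size=n),
])

def component_variances(data):
    """Eigenvalues of the sample covariance matrix, largest first."""
    cov = np.cov(data, rowvar=False)
    return np.linalg.eigvalsh(cov)[::-1]

# z-normalization: zero mean and unit (sample) variance per feature.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

raw_eig = component_variances(X)
std_eig = component_variances(Z)

print("raw explained-variance ratios:         ", raw_eig / raw_eig.sum())
print("standardized explained-variance ratios:", std_eig / std_eig.sum())
```

In the standardized case the eigenvalues sum to the number of features, but the first one is close to 2 because the two correlated features collapse onto a single direction.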

Second, there is no magic bullet for normalization; it really depends on the meaning of your features and their distributions.

For example, sound volume is measured in decibels, which are on a logarithmic scale, whereas the power consumed by the speakers is measured in watts, which are on a linear scale.

Those two features may indicate the same effect (louder music), but their normalization should be different.

If you have a target variable, I would suggest looking into LDA in this context.
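One thing to keep in mind with LDA is that it yields at most (number of classes - 1) discriminant directions, because the between-class scatter matrix has rank at most that. The sketch below illustrates this with NumPy on made-up data (the class count, feature count, and variable names are assumptions for illustration only).

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, n_per_class, n_features = 4, 50, 20

# Hypothetical toy data: each class is a Gaussian blob around its own mean.
class_means = rng.normal(scale=3.0, size=(n_classes, n_features))
X = np.vstack([m + rng.normal(size=(n_per_class, n_features)) for m in class_means])
y = np.repeat(np.arange(n_classes), n_per_class)

# Between-class scatter: sum over classes of n_c * (mean_c - mean)(mean_c - mean)^T.
overall_mean = X.mean(axis=0)
S_b = np.zeros((n_features, n_features))
for c in range(n_classes):
    Xc = X[y == c]
    d = (Xc.mean(axis=0) - overall_mean)[:, None]
    S_b += len(Xc) * (d @ d.T)

# The weighted class-mean deviations sum to zero, so the rank cannot exceed
# n_classes - 1, no matter how many input features there are.
rank_between = np.linalg.matrix_rank(S_b)
print(rank_between, "<=", n_classes - 1)
```

So with 20 input features and 4 classes, there are at most 3 directions along which the class means differ.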

Uri Goren


Thanks! Actually, I'm also trying LDA, CCA, etc. for feature selection. But LDA gives me very few features (number of classes - 1). Is there a way to increase this? I'm losing thousands of features here. – jeff Sep 08 '15 at 10:02
@halilpazarlama. Please don't be offended, but isn't it premature for you to be doing LDA, CCA, and other complex methods? Have you read enough books, or at least articles, on these topics? I say this because you seem to have some basic misunderstandings even about PCA, and because you say `LDA gives me very few features`, fewer than you expected (?). – ttnphns Sep 08 '15 at 10:25
