
The classifier is KNN or RBF-SVM. After dimensionality reduction (e.g., PCA, LDA, KPCA, or KLDA), is it still necessary to normalize the features before classification?

In the LIBSVM package, the usual workflow is to first run svm-scale to normalize the features with min-max scaling, and then feed the scaled features to svm-train.
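For illustration, here is a rough Python equivalent of that scaling step, assuming scikit-learn's MinMaxScaler rather than LIBSVM's command-line svm-scale (the toy data and variable names are mine):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix: rows are samples, columns are features.
X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 800.0]])

# Min-max scaling to [-1, 1], which is svm-scale's default range.
scaler = MinMaxScaler(feature_range=(-1.0, 1.0))
X_train_scaled = scaler.fit_transform(X_train)

# At prediction time, reuse the ranges learned on the training set
# (the same idea as saving and restoring svm-scale's parameter file).
X_test = np.array([[1.5, 600.0]])
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)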

I'm not sure whether such normalization would harm the structure of the features produced by PCA, LDA, etc.

mining
  • @JimLohse, thanks, I've revised the post. Please kindly check it. – mining Jan 31 '16 at 01:04
  • Please see some general recommendations on normalization/scaling: e.g. http://stats.stackexchange.com/questions/19216/variables-are-often-adjusted-e-g-standardised-before-making-a-model-when-is and http://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia – cbeleites unhappy with SX Feb 01 '16 at 20:45
  • @cbeleites, thank you very much for the reference links. They are very helpful. – mining Feb 02 '16 at 00:31

1 Answer


PCA does require normalization as a pre-processing step.

Normalization is important in PCA since it is a variance maximizing exercise. It projects your original data onto directions which maximize the variance. Source: here
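As a minimal sketch of that pre-processing step (my own example, assuming z-score standardization via scikit-learn's StandardScaler is what is meant by normalization here):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Two independent features on very different scales; without standardization
# the second feature would dominate the principal components.
X = np.column_stack([rng.normal(0.0, 1.0, 500),
                     rng.normal(0.0, 100.0, 500)])

# Zero mean and unit variance per feature before PCA.
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
pca.fit(X_std)
# After standardization neither direction dominates, so the two ratios
# come out roughly equal; without it the first would be close to 1.
print(pca.explained_variance_ratio_)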

Would a further step of data normalization harm the data?

No, it would not harm the data. But would it really be necessary?

import numpy as np
from sklearn.decomposition import PCA

# Two features with very different variances (1 vs. 1000).
mean = [0.0, 20.0]
cov = [[1.0, 0.7], [0.7, 1000]]
values = np.random.multivariate_normal(mean, cov, 1000)

# Keep only the first principal component and whiten it.
pca = PCA(n_components=1, whiten=True)
pca.fit(values)

values_ = pca.transform(values)
print(np.var(values_))

The exercise above returns (approximately) 1.0.

Why? We are projecting the two whitened features onto the first principal component. Let's say a point in the whitened space is identified by a vector $a$. Its projection onto a unit vector $\hat{b}$ is the new value $$a' = |a| \cos(\theta) = a \cdot \hat{b},$$

where $|a|$ is the length of $a$ and $\theta$ is the angle between $a$ and the vector we are projecting onto. In this case $\hat{b}$ equals $e$, the unit eigenvector that maps each row vector onto the first principal component.

What is the variance of the whitened feature once projected on the principal component?

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (a_i \cdot e)^2 = e^T \frac{A^T A}{n} e$$

where $A$ is the matrix whose rows are the vectors $a_i$. $e^T e = 1$ by definition (eigenvectors are unit vectors), and because whitening imposes a zero mean on every feature, $\frac{A^T A}{n}$ is exactly the covariance matrix of the data.
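A small numerical check of that identity (mine, not part of the original answer): center a toy data matrix $A$, take the top unit eigenvector $e$ of its covariance $A^T A / n$, and compare the variance of the projection $A e$ with $e^T \frac{A^T A}{n} e$:

import numpy as np

rng = np.random.RandomState(0)
A = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], 10000)
A -= A.mean(axis=0)              # zero-mean columns, as assumed above

C = A.T @ A / len(A)             # covariance matrix A^T A / n
eigvals, eigvecs = np.linalg.eigh(C)
e = eigvecs[:, -1]               # unit eigenvector of the largest eigenvalue

proj = A @ e                     # scalar projections a_i . e
print(np.var(proj))              # empirical variance of the projection
print(e @ C @ e)                 # e^T (A^T A / n) e -- the same number
print(eigvals[-1])               # ...which is also the top eigenvalue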

IcannotFixThis
  • Hi, thank you very much for this detailed answer. This is very helpful! In fact, I've also posted a question with more details at https://www.quora.com/Does-it-need-feature-normalization-after-dimension-reduction-for-classification, and there are some kind answers and discussions there. If convenient, please kindly check it. Thanks! – mining Feb 01 '16 at 10:03
  • Apologies, reading again my own answer I have realized that I made a mistake. Pls check the correction. – IcannotFixThis Feb 01 '16 at 16:22
  • Thank you very much for your revision! It seems the conclusion is similar to this link [http://stackoverflow.com/questions/10119913/pca-first-or-normalization-first]. – mining Feb 02 '16 at 00:28