
The classifier is KNN or RBF-SVM. After dimensionality reduction (e.g., PCA, LDA, KPCA, or KLDA), is it still necessary to normalize the features before classification?

In the LIBSVM package, the usual workflow is to first run svm-scale to normalize the features with min-max scaling, and then feed the scaled features to svm-train.
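For illustration, here is a rough Python equivalent of that scaling step, assuming scikit-learn's MinMaxScaler rather than LIBSVM's command-line svm-scale (the toy data and variable names are mine):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix: rows are samples, columns are features.
X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 800.0]])

# Min-max scaling to [-1, 1], which is svm-scale's default range.
scaler = MinMaxScaler(feature_range=(-1.0, 1.0))
X_train_scaled = scaler.fit_transform(X_train)

# At prediction time, reuse the ranges learned on the training set
# (the same idea as saving and restoring svm-scale's parameter file).
X_test = np.array([[1.5, 600.0]])
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)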

I'm not sure whether such normalization would harm the structure of the features produced by PCA, LDA, etc.

mining
  • @JimLohse, thanks, I've revised the post. Please kindly check it. – mining Jan 31 '16 at 01:04
  • Please see some general recommendations on normalization/scaling: e.g. http://stats.stackexchange.com/questions/19216/variables-are-often-adjusted-e-g-standardised-before-making-a-model-when-is and http://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia – cbeleites unhappy with SX Feb 01 '16 at 20:45
  • @cbeleites, thank you very much for the reference links. They are very helpful. – mining Feb 02 '16 at 00:31

1 Answer


PCA does require normalization as a pre-processing step.

Normalization is important in PCA since it is a variance maximizing exercise. It projects your original data onto directions which maximize the variance. Source: here
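As a minimal sketch of that pre-processing step (my own example, assuming z-score standardization via scikit-learn's StandardScaler is what is meant by normalization here):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Two independent features on very different scales; without standardization
# the second feature would dominate the principal components.
X = np.column_stack([rng.normal(0.0, 1.0, 500),
                     rng.normal(0.0, 100.0, 500)])

# Zero mean and unit variance per feature before PCA.
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
pca.fit(X_std)
# After standardization neither direction dominates, so the two ratios
# come out roughly equal; without it the first would be close to 1.
print(pca.explained_variance_ratio_)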

Would a further step of data normalization harm the data?

No, it would not harm the data. But would it really be necessary?

import numpy as np
from sklearn.decomposition import PCA

# Two features with very different variances (1 vs. 1000).
mean = [0.0, 20.0]
cov = [[1.0, 0.7], [0.7, 1000]]
values = np.random.multivariate_normal(mean, cov, 1000)

# Keep only the first principal component and whiten it.
pca = PCA(n_components=1, whiten=True)
pca.fit(values)

values_ = pca.transform(values)
print(np.var(values_))

The exercise above returns (approximately) 1.0.

Why? We are projecting the two whitened features onto the first principal component. Let's say a point in the whitened space is identified by a vector $a$. Its projection onto a unit vector $\hat{b}$ is the new value $$a' = |a| \cos(\theta) = a \cdot \hat{b},$$

where $|a|$ is the length of $a$ and $\theta$ is the angle between $a$ and the vector we are projecting onto. In this case $\hat{b}$ equals $e$, the unit eigenvector that maps each row vector onto the first principal component.

What is the variance of the whitened feature once projected on the principal component?

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (a_i \cdot e)^2 = e^T \frac{A^T A}{n} e$$

where $A$ is the matrix whose rows are the vectors $a_i$. $e^T e = 1$ by definition (eigenvectors are unit vectors), and because whitening imposes a zero mean on every feature, $\frac{A^T A}{n}$ is exactly the covariance matrix of the data.
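A small numerical check of that identity (mine, not part of the original answer): center a toy data matrix $A$, take the top unit eigenvector $e$ of its covariance $A^T A / n$, and compare the variance of the projection $A e$ with $e^T \frac{A^T A}{n} e$:

import numpy as np

rng = np.random.RandomState(0)
A = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], 10000)
A -= A.mean(axis=0)              # zero-mean columns, as assumed above

C = A.T @ A / len(A)             # covariance matrix A^T A / n
eigvals, eigvecs = np.linalg.eigh(C)
e = eigvecs[:, -1]               # unit eigenvector of the largest eigenvalue

proj = A @ e                     # scalar projections a_i . e
print(np.var(proj))              # empirical variance of the projection
print(e @ C @ e)                 # e^T (A^T A / n) e -- the same number
print(eigvals[-1])               # ...which is also the top eigenvalue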

IcannotFixThis
  • Hi, thank you very much for this detailed answer. This is very helpful! In fact, I've also posted a question with more details at https://www.quora.com/Does-it-need-feature-normalization-after-dimension-reduction-for-classification, and there are some kind answers and discussions there. If convenient, please kindly check it. Thanks! – mining Feb 01 '16 at 10:03
  • Apologies, reading again my own answer I have realized that I made a mistake. Pls check the correction. – IcannotFixThis Feb 01 '16 at 16:22
  • Thank you very much for your revision! It seems the conclusion is similar to this link [http://stackoverflow.com/questions/10119913/pca-first-or-normalization-first]. – mining Feb 02 '16 at 00:28