
I'm using PCA from scikit-learn and I'm trying to interpret the results I'm getting, which led me to a question: should I subtract the mean (or perform standardization) before using PCA, or is this somehow embedded in the sklearn implementation?

Moreover, if so, which of the two should I perform, and why is this step needed?


Edit: I've read that I should perform scaling if the features have different scales. But one thing still puts me in doubt: why scale, when PCA considers the directions of greatest variance? Won't I break this by standardizing the data before PCA?
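For reference, scikit-learn's `PCA` centers the data internally (it subtracts the column means) but does not scale it. Why scaling can matter is easy to check with plain NumPy: when two correlated features live on very different scales, the unscaled first PC just tracks the large-scale feature. A minimal sketch (the feature scales and the `first_pc` helper are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two correlated features on very different scales,
# e.g. x ~ metres (spread ~1), y ~ grams (spread ~1000).
x = rng.normal(size=500)
y = 1000.0 * (0.5 * x + rng.normal(size=500))
X = np.column_stack([x, y])

def first_pc(data):
    centered = data - data.mean(axis=0)   # PCA always centers
    cov = np.cov(centered, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, np.argmax(vals)]       # direction of greatest variance

# Without scaling, the large-scale feature dominates:
print(first_pc(X))   # ~ [0, ±1], i.e. essentially the y axis

# After standardizing each feature to unit variance,
# the shared (correlated) direction wins instead:
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(first_pc(Z))   # ~ [±0.71, ±0.71], the diagonal
```

So standardizing doesn't "break" PCA; it changes the question from "which raw direction has the largest spread in the original units" to "which direction has the largest spread once every feature counts equally".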

Kobe-Wan Kenobi
  • Questions about sklearn implementation are off-topic here. Other than that, please look at http://stats.stackexchange.com/questions/53 and http://stats.stackexchange.com/questions/22329. – amoeba Sep 13 '16 at 13:30
  • Re your Edit: even after standardizing the data, some directions can have more variance than the others, and typically this would be the case. That is why PCA on correlations makes sense. Check out the example in the linked thread. – amoeba Sep 14 '16 at 13:20
  • 1
    Thank you for your answer. So, you are saying that even though after standardization variance of each feature will be 1, there will be directions (linear combinations of features) in which variation won't be 1, and directions with greater variation will be selected? – Kobe-Wan Kenobi Sep 14 '16 at 13:56
  • 1
    Yes, @Marko, this is exactly correct. Just think of a simple 2d example. Imagine a scatter-plot of $x$ vs $y$ where both variables are standardized. It can look like a filled circle or like a stretched cigar in the diagonal direction. If it is stretched (i.e. if $x$ and $y$ are correlated), then the diagonal will be the first PC and it will have higher variance than any other direction. – amoeba Sep 14 '16 at 13:59
  • @amoeba Thank you very much for your answer, now I understand better. :) I do have another related question, but I think I'll open another question for it. If you would like you could post your explanation as an answer, I will gladly accept it. – Kobe-Wan Kenobi Sep 14 '16 at 14:15
  • This question is now closed as a duplicate (and I think rightly so) and no answers can be posted here. Go ahead with the new Q if you have any related doubts. – amoeba Sep 14 '16 at 14:17
  • I've opened another question, you can check if you would like http://stats.stackexchange.com/questions/234959/will-i-lose-anomalies-due-to-pca – Kobe-Wan Kenobi Sep 14 '16 at 14:24
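The "stretched cigar" example from the comments can be verified numerically: after standardizing two correlated variables, each feature has variance 1, yet the diagonal direction has variance $1 + \mathrm{corr}(x, y) > 1$, so PCA still has a clear first component. A small NumPy sketch (the correlation value of 0.8 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# Build two correlated variables, then standardize each one.
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)   # corr(x, y) ≈ 0.8
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

# Each individual feature has unit variance...
print(x.var(), y.var())   # ≈ 1.0, 1.0

# ...but the diagonal direction (x + y)/sqrt(2) has variance
# 1 + corr(x, y) ≈ 1.8 — this is the first principal component:
diag = (x + y) / np.sqrt(2)
print(diag.var())         # ≈ 1.8
```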

0 Answers