Recently I've been interested in applying PCA to a dataset I have and I wanted to develop a deep understanding of what I would actually be doing when I implement it.

Today I encountered two conflicting answers to the question of what the maximum number of principal components is. The two answers are:

Does anyone know the meaning of the extra component that sklearn's PCA returns?

amoeba
  • If the number of samples $n$ is less than or equal to the number of features, the $n$-th PC will be constant zero (eigenvalue = 0). This is what `sklearn` will presumably return. The number of non-trivial PCs is $n-1$ as per the linked answer. – amoeba Mar 16 '18 at 16:27
  • Are you doing PCA with or without centering? – whuber Mar 16 '18 at 16:31
  • @whuber I don't think `sklearn.decomposition.PCA` can do PCA without centering. I don't see such an option [in the documentation](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA). – amoeba Mar 16 '18 at 16:41
  • @amoeba Completely on point! I just checked the $n$-th principal component of my data and it is always 0! Problem solved :) – Marcos Galletero Romero Mar 16 '18 at 17:37
  • @whuber I'm currently NOT centering the data, but your comment led me to do some research and I'm definitely going to repeat the analysis but this time centering the data. Reasons can be found [here](https://www.quora.com/Why-is-it-beneficial-to-center-and-normalize-the-data-before-running-Principal-Component-Analysis-on-it) and [here](https://stats.stackexchange.com/questions/22329/how-does-centering-the-data-get-rid-of-the-intercept-in-regression-and-pca). – Marcos Galletero Romero Mar 16 '18 at 17:55
  • If you use scikit's PCA then it does centering for you. – amoeba Mar 16 '18 at 22:12

1 Answer

Per @amoeba's comments:

> If the number of samples $n$ is less than or equal to the number of features, the $n$-th PC will be constant zero (eigenvalue = 0). This is what `sklearn` will presumably return. The number of non-trivial PCs is $n-1$ as per the linked answer.
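
This can be checked on a small synthetic example. The sketch below is only an illustration under stated assumptions: random Gaussian data with hypothetical sizes (5 samples, 20 features), default `PCA()` settings, and variable names that are not from the original post. Because centering subtracts the column means, the centered rows sum to zero, so the centered matrix has rank at most $n-1$ and the $n$-th component should carry (numerically) zero variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical sizes chosen for illustration: fewer samples than features.
n_samples, n_features = 5, 20
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, n_features))

# With n_components left at its default, PCA keeps
# min(n_samples, n_features) components, here 5.
pca = PCA()
scores = pca.fit_transform(X)  # PCA centers X internally before the SVD

print("components_ shape:", pca.components_.shape)
print("explained variance:", pca.explained_variance_)
print("max |score| on last PC:", np.abs(scores[:, -1]).max())

# Rank of the centered data: centering removes one degree of freedom.
X_centered = X - X.mean(axis=0)
print("rank of centered X:", np.linalg.matrix_rank(X_centered))
```

In this setup the last entry of `explained_variance_` and the last column of `scores` should be zero up to floating-point error, while the first $n-1$ components have positive variance. Passing `n_components=n_samples - 1` would drop the trivial component.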

Firebug