If the number of samples is smaller than the number of features, how can all the variance in PCA be explained by fewer than $n$ components?

Question

I believe I have a problem understanding PCA:

I would like to use this technique to reduce the number of features in my problem. I originally have 10,000 features and 500 samples. However, PCA limits the number of principal components to the smaller of the number of samples (the columns of my data matrix) and the number of features (its rows). 100% of the variance could therefore be explained by 500 components. But 500 components is far fewer than 10,000 features... How can all the variance be explained by a number of components bounded by the number of samples (which has nothing to do with the number of features)?
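The bound is easy to check numerically. The following is a minimal sketch that assumes NumPy, with random Gaussian data standing in for the asker's matrix. It stores samples as rows (the transpose of the layout described in the question) and reads the per-component variances off the singular values of the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 500, 10_000                 # sizes from the question; the data are synthetic
X = rng.standard_normal((n_samples, n_features))    # rows = samples, columns = features

Xc = X - X.mean(axis=0)                      # center every feature
s = np.linalg.svd(Xc, compute_uv=False)      # singular values, in descending order
var = s**2 / (n_samples - 1)                 # variance carried by each principal component

# Count the components with numerically nonzero variance
# (same tolerance rule that np.linalg.matrix_rank uses by default).
tol = s.max() * max(Xc.shape) * np.finfo(s.dtype).eps
n_nonzero = int(np.sum(s > tol))
ratio = np.cumsum(var) / var.sum()           # cumulative explained-variance ratio

print(len(s), n_nonzero)                     # 500 499
print(ratio[n_nonzero - 1])                  # 1 up to floating-point rounding
```

Having 10,000 features only changes the length of each principal axis. The number of axes with nonzero variance is capped by the directions that the centered samples can span.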

In PCA of centered data with $n$ cases $\times$ $p$ features, the total number of PCs is $\min(n-1, p)$. Imagine you have $n=2$ data points in $p=3$-dimensional space. How many dimensions are needed to explain all the variance?... — ttnphns, May 20 '14 at 08:44
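The bound in this comment follows from a rank argument. Here is a short sketch. Write the data as an $n \times p$ matrix $X$ with cases as rows (the comment's layout, transposed relative to the question), and let $\mathbf{1}$ be the length-$n$ vector of ones. The centering matrix $C$ is the orthogonal projection onto the complement of $\mathbf{1}$:

$$
X_c = C X, \qquad C = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top, \qquad \operatorname{rank}(C) = n - 1,
$$

$$
\operatorname{rank}(X_c) \le \min\bigl(\operatorname{rank}(C), \operatorname{rank}(X)\bigr) \le \min(n - 1, p),
\qquad
S = \tfrac{1}{n-1} X_c^\top X_c \;\Rightarrow\; \operatorname{rank}(S) = \operatorname{rank}(X_c).
$$

The PC variances are the eigenvalues of $S$, so at most $\min(n-1, p)$ of them are nonzero. With $n = 500$ and $p = 10{,}000$, that gives $499$ components. In the comment's example ($n = 2$, $p = 3$), it gives a single component.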

score 5 · Accepted Answer · answered May 20 '14 at 08:44


This is because the intrinsic dimensionality of the sample is much lower. A set of 500 points always lies in a flat (a shifted hyperplane) of at most 499 dimensions, whatever the dimension of the space around it. To see this, note that a pair of points lies on a line even in 3-dimensional space. That line is the subspace PCA has to describe, and one component along it captures all the variance of the two points.
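The two-point case can be checked directly. This is a minimal NumPy sketch, and the coordinates of the two points are arbitrary values chosen for illustration, not taken from the thread. After centering, the SVD returns a single nonzero singular value, and its axis is the line through the two points. One component therefore reproduces the data exactly.

```python
import numpy as np

# Two hypothetical points in 3-dimensional space (values chosen for illustration).
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, -1.0])
X = np.vstack([a, b])                       # 2 samples x 3 features

Xc = X - X.mean(axis=0)                     # centered points are +/- (a - b) / 2
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(s)                                    # one positive value, one (numerically) zero

# The first principal axis is the direction of the line through a and b.
direction = (b - a) / np.linalg.norm(b - a)
print(abs(direction @ Vt[0]))               # 1 up to rounding: the axes coincide

# Projecting onto that single axis loses nothing, so reconstruction is exact.
scores = Xc @ Vt[0]
X_rebuilt = X.mean(axis=0) + np.outer(scores, Vt[0])
print(np.allclose(X_rebuilt, X))            # True
```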


Curious


So, if I have 1,000 components, will my PCA be better in some sense than if I have only 500 components? – bigTree May 20 '14 at 08:48
If you had another data set with more than 1000 data points, a PCA model with 1000 components might contain more information. – Curious May 20 '14 at 08:55
