I believe I have a problem understanding PCA:
I would like to use this technique to reduce the number of features in my problem. I originally have 10,000 features and 500 samples. However, PCA limits the number of principal components to the smaller of the number of samples (columns of my data matrix) and the number of features (rows of this matrix). 100% of the variance could therefore be explained by 500 components. But 500 components is far fewer than 10,000 features... How can all the variance be explained by a number of components no larger than the number of samples (which has nothing to do with the number of features)?
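To make the situation concrete, here is a minimal sketch in NumPy. It assumes random Gaussian data and scaled-down sizes (50 samples and 1,000 features standing in for 500 and 10,000); both are made up for illustration and are not my real data. Note that the sketch stores samples as rows, the transpose of the layout described above.

```python
import numpy as np

# Assumption: random data with scaled-down, illustrative sizes.
rng = np.random.default_rng(0)
n_samples, n_features = 50, 1000
X = rng.normal(size=(n_samples, n_features))  # rows = samples, columns = features

# Center each feature, as PCA does before decomposing the data.
Xc = X - X.mean(axis=0)

# The singular values of the centered data give the variance along each component.
s = np.linalg.svd(Xc, compute_uv=False)
explained_ratio = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained_ratio)

n_components = len(s)               # min(n_samples, n_features)
rank = np.linalg.matrix_rank(Xc)    # number of directions with nonzero variance

print("components returned:", n_components)
print("rank of centered data:", rank)
print("cumulative variance explained by all components:", cumulative[-1])
```

The number of components returned equals the number of samples, not the number of features, and together they account for all of the variance. The rank comes out one lower than the number of samples because centering removes one degree of freedom.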