
I performed (sklearn) PCA on a (1416960,140) pandas DataFrame.

The resulting components_ attribute is a matrix in which each row is one principal component: a vector with one coefficient (loading) per feature, which together define a direction of maximum variance.

To find which feature is most "correlated" with each component, I take the feature with the largest absolute coefficient in that component (as also shown in: https://stackoverflow.com/questions/50796024/feature-variable-importance-after-a-pca-analysis)
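For concreteness, this is roughly what that step looks like. It is a minimal sketch on hypothetical random data (the small `df`, its column names `f0`..`f5`, and the choice of 4 components are stand-ins for illustration, not my real 1416960 x 140 frame):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 500 rows, 6 features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 6)), columns=[f"f{i}" for i in range(6)])

# Standardize each variable, then fit PCA.
X = StandardScaler().fit_transform(df)
pca = PCA(n_components=4).fit(X)

# components_ has shape (n_components, n_features); row k holds the
# coefficients (loadings) of component k+1, one per feature.
loadings = pd.DataFrame(
    pca.components_,
    columns=df.columns,
    index=[f"PC{k + 1}" for k in range(pca.n_components_)],
)

# Feature with the largest absolute coefficient in each component.
top_feature = loadings.abs().idxmax(axis=1)
print(top_feature)
```

Nothing in this procedure prevents `top_feature` from containing the same feature name for more than one component.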

My problem is that for several components the largest coefficient belongs to the same feature, so I get different components with the same most important feature. Can I avoid this behavior? What causes it? What can I do to mitigate it?

I couldn't find any thread on this topic.

fred
  • Welcome to this website, fred. If I got it right, one of your original variables (features) contributes the most to many PCs. This is evidence that this variable is really important for the total variance and, thus, should have high correlation with PC1. – ouranos May 24 '20 at 21:11
  • Another thing, did you do the PCA on the covariance or correlation matrix? The latter could be appropriate in your case – ouranos May 24 '20 at 21:13
  • Hello @ouranos, thanks for replying. You got it right: some original variables contribute the most to many PCs. And yes, I did the PCA on the correlation matrix; in fact, I standardized each of the variables. Unfortunately I am quite new to PCA and I don't really understand what this "multiple contribution" means. What does it mean for multiple variables to contribute to multiple PCs? And what does it say about my data? Thank you very much. – fred May 25 '20 at 11:13
  • Each PC is a linear combination of your variables. The contribution of each variable to each PC is reflected in these linear coefficients, which should all be comparable in magnitude if you standardized your data and performed PCA on the correlation matrix. I would refer you to [this already classic question](https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues), and especially the answers there, for general intuition on PCA. – ouranos May 26 '20 at 13:58
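
The "linear combination" point from the comments can be checked numerically. The following is a small sketch on hypothetical random data (the array `raw`, its column scales, and the choice of 3 components are illustrative assumptions), assuming sklearn's default `PCA` with `whiten=False`:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 5 variables on very different scales.
rng = np.random.default_rng(1)
raw = rng.normal(size=(300, 5)) * np.array([1.0, 5.0, 0.5, 2.0, 10.0])

# Standardizing first makes this a PCA on the correlation matrix.
X = StandardScaler().fit_transform(raw)

pca = PCA(n_components=3).fit(X)
scores = pca.transform(X)

# Each PC score is a weighted sum of the centered variables,
# with the weights taken from the matching row of components_.
manual = (X - pca.mean_) @ pca.components_.T
same = np.allclose(scores, manual)
print(same)

# Each row of components_ has unit length, so its squared coefficients
# sum to 1 and can be read as each variable's share of that component.
share = pca.components_ ** 2
print(share.sum(axis=1))
```

Because every component carries a coefficient for every variable, one variable can have the largest coefficient in more than one component.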
