Interpreting PCA results when a lot of PCs are needed to reach 95% of variance

Question

I'm working on a simple classification project to better understand PCA, but I don't understand the results.

My dataset has 10 features, and I'm trying to predict the label (SeriousDlqin2yrs).

> head(dfTraining)
  SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age
1                1                            0.7661266  45
2                0                            0.9571510  40
3                0                            0.6581801  38
4                0                            0.2338098  30
5                0                            0.9072394  49
6                0                            0.2131787  74
  NumberOfTime30.59DaysPastDueNotWorse  DebtRatio MonthlyIncome
1                                    2 0.80298213          9120
2                                    0 0.12187620          2600
3                                    1 0.08511338          3042
4                                    0 0.03604968          3300
5                                    1 0.02492570         63588
6                                    0 0.37560697          3500
  NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate NumberRealEstateLoansOrLines
1                              13                       0                            6
2                               4                       0                            0
3                               2                       1                            0
4                               5                       0                            0
5                               7                       0                            1
6                               3                       0                            1
  NumberOfTime60.89DaysPastDueNotWorse NumberOfDependents
1                                    0                  2
2                                    0                  1
3                                    0                  0
4                                    0                  0
5                                    0                  0
6                                    0                  1

I then ran PCA to produce 10 principal components, but to reach 95% of the variance, it takes 8 principal components. How should I interpret this result?

> dfTraining.pca <- prcomp(na.omit(dfTraining[, 2:11]), center = TRUE, scale. = TRUE)
> summary(dfTraining.pca)
Importance of components:
                          PC1    PC2    PC3    PC4     PC5     PC6     PC7     PC8     PC9    PC10
Standard deviation     1.7254 1.2376 1.1053 1.0086 0.99993 0.96430 0.85622 0.74380 0.16184 0.10096
Proportion of Variance 0.2977 0.1532 0.1222 0.1017 0.09999 0.09299 0.07331 0.05532 0.00262 0.00102
Cumulative Proportion  0.2977 0.4509 0.5730 0.6747 0.77474 0.86773 0.94104 0.99636 0.99898 1.00000

I then visualized the result with ggbiplot and ggplot to make a chart (the image is not reproduced here). If someone could help me interpret it, I would appreciate it.
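The "Proportion of Variance" row is each component's variance (its standard deviation squared) divided by the total. Because `scale=T` standardizes every feature to variance 1, the total variance equals the number of features, so the ten eigenvalues sum to 10. The short R sketch below recomputes the table from the standard deviations printed above (copied by hand, so rounded), finds the smallest number of components that reaches 95%, and also counts eigenvalues above 1 (the Kaiser rule of thumb, which is only a heuristic).

```r
# Standard deviations copied from the summary(dfTraining.pca) output (rounded)
sdev <- c(1.7254, 1.2376, 1.1053, 1.0086, 0.99993,
          0.96430, 0.85622, 0.74380, 0.16184, 0.10096)

eigenvalues <- sdev^2                         # variance carried by each PC
prop_var    <- eigenvalues / sum(eigenvalues) # "Proportion of Variance" row
cum_var     <- cumsum(prop_var)               # "Cumulative Proportion" row

# With standardized inputs the eigenvalues sum to the number of features
print(round(sum(eigenvalues), 3))

# Smallest number of PCs whose cumulative proportion reaches 95%
n95 <- which(cum_var >= 0.95)[1]
print(n95)

# Kaiser rule of thumb: keep PCs with eigenvalue > 1,
# i.e. PCs that carry more variance than a single standardized feature
n_kaiser <- sum(eigenvalues > 1)
print(n_kaiser)
```

On the fitted object, the same numbers come from `dfTraining.pca$sdev`. The first eight eigenvalues are all of broadly similar size (about 0.55 to 3), which is why no small subset of components dominates. PC9 and PC10 carry almost nothing (eigenvalues of roughly 0.026 and 0.010): a near-zero eigenvalue means some linear combination of the standardized features is nearly constant, i.e. a few features are almost collinear, and `dfTraining.pca$rotation[, 9:10]` shows which features are involved.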

Peter gave the obvious answer. Also, this may be helpful for you to read: http://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues — Slow loris, May 24 '16 at 05:36
You might find it even more instructive to start with data whose structure you know. One way to produce results remarkably like these is to generate normal variates with a little bit of known correlation and run the procedure on them. You might be surprised to see how much the eigenvalues vary. — whuber, May 25 '16 at 19:09
@Slowloris: I did read through that question and the answers. They explain how PCA works, but they don't explain how I can diagnose my problem. — stackoverflowuser2010, Jun 04 '16 at 06:00
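whuber's suggestion can be tried in a few lines of base R. The sketch below is an assumed setup, not his exact procedure: 10 standard normal features with a common pairwise correlation of 0.1 (an arbitrary choice), generated through a Cholesky factor. For this correlation matrix the population eigenvalues are 1.9 once and 0.9 nine times, so essentially all ten components are needed to reach 95%, yet the sample eigenvalues still spread out noticeably around those values.

```r
set.seed(1)   # arbitrary seed, for reproducibility
n   <- 1000   # number of simulated rows (assumed)
p   <- 10     # same number of features as in the question
rho <- 0.1    # small, known pairwise correlation (assumed)

# Equicorrelation matrix: 1 on the diagonal, rho everywhere else
Sigma <- matrix(rho, p, p)
diag(Sigma) <- 1

# Rows of Z are independent N(0, I); Z %*% chol(Sigma) has covariance Sigma
Z <- matrix(rnorm(n * p), nrow = n)
X <- Z %*% chol(Sigma)

sim.pca <- prcomp(X, center = TRUE, scale. = TRUE)
print(summary(sim.pca))

# Population eigenvalues of Sigma versus the sample eigenvalues from PCA
print(round(eigen(Sigma)$values, 3))
print(round(sim.pca$sdev^2, 3))

# Components needed to reach 95% of the variance
n95_sim <- which(cumsum(sim.pca$sdev^2) / sum(sim.pca$sdev^2) >= 0.95)[1]
print(n95_sim)
```

Rerunning with different seeds, sample sizes, or values of `rho` is a quick way to see how much of the variation between eigenvalues is just sampling noise.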

score 6 · Answer 1 · answered May 23 '16 at 22:34

It says that your data is not very compressible.

The goal of PCA is to find a few linear combinations of variables that account for most of the variation. In your case, no such combinations exist; or at least, PCA cannot find them.

Peter Flom
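For contrast, here is what compressible data looks like to PCA. In this hypothetical simulation (not from the question or the answer), 10 features are generated from only 2 hidden factors, 5 features per factor, plus a little independent noise with an assumed standard deviation of 0.1. PCA then recovers nearly all the variance in 2 components: a few large eigenvalues followed by many tiny ones is the signature of low-dimensional linear structure, whereas the question's fairly flat eigenvalues indicate its absence.

```r
set.seed(42)     # arbitrary seed
n  <- 1000
f1 <- rnorm(n)   # hidden factor 1 (hypothetical)
f2 <- rnorm(n)   # hidden factor 2 (hypothetical)
noise_sd <- 0.1  # assumed noise level

# Features 1-5 are noisy copies of f1, features 6-10 are noisy copies of f2
X <- cbind(sapply(1:5, function(j) f1 + rnorm(n, sd = noise_sd)),
           sapply(1:5, function(j) f2 + rnorm(n, sd = noise_sd)))

comp.pca <- prcomp(X, center = TRUE, scale. = TRUE)
print(summary(comp.pca))

# Components needed to reach 95% of the variance
n95_comp <- which(cumsum(comp.pca$sdev^2) / sum(comp.pca$sdev^2) >= 0.95)[1]
print(n95_comp)
```

Real data is rarely this clean; the example only shows what "compressible" means in PCA terms.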

One might say instead that the data isn't distributed on a low-dimensional linear subspace (i.e., it isn't compressible by PCA). That doesn't rule out the data having low intrinsic dimensionality in other ways (e.g., a sparse distribution, or a distribution on a low-dimensional nonlinear manifold). – user20160 May 23 '16 at 23:19
Thanks for your answer. Can you please explain what percentage of the principal components is typically needed to reach 95% of the variance? For example, if PCA returns N principal components, how many of those N would it typically take to reach 95%? – stackoverflowuser2010 Jun 01 '16 at 17:45
There really isn't a "typical". It depends on the field of study and the particular subject of study. You have to figure it out on your own. – Peter Flom Jun 03 '16 at 11:34
Can you please explain a little more about what it means for the data to be not very compressible? Does it mean the original data has few or many correlated features? Does it mean the original data is sparsely distributed or clumped together in the original feature space? – stackoverflowuser2010 Jun 04 '16 at 05:54
It means that there is no simple structure that is extractable from the data via factor analysis. You can look at the correlation matrix and a scatterplot matrix to get an idea of what is going on in your particular case. – Peter Flom Jun 04 '16 at 12:17
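user20160's point about nonlinear structure can be illustrated with an assumed construction: the 10 features below are cos(k·θ) and sin(k·θ) for k = 1..5, all deterministic functions of the single variable θ. The data therefore lie on a one-dimensional curve, yet on an evenly spaced grid over one full period these features are mutually uncorrelated, so every standardized eigenvalue equals 1 and PCA needs all 10 components to reach 95%.

```r
n     <- 1000
theta <- 2 * pi * (1:n) / n   # one full period on an evenly spaced grid

# 10 features, all deterministic functions of the single variable theta
X <- cbind(sapply(1:5, function(k) cos(k * theta)),
           sapply(1:5, function(k) sin(k * theta)))

curve.pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigenvalues: all (numerically) equal to 1, so the variance is spread evenly
print(round(curve.pca$sdev^2, 3))

# Components needed to reach 95% of the variance
n95_curve <- which(cumsum(curve.pca$sdev^2) / sum(curve.pca$sdev^2) >= 0.95)[1]
print(n95_curve)
```

A linear method like PCA cannot recover the single underlying coordinate here; that would take a nonlinear dimensionality-reduction method.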
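Following Peter Flom's advice to inspect the correlation matrix, a small helper can list the most strongly correlated feature pairs. `top_correlations` is a hypothetical function written for illustration, not part of any package; it is demonstrated on R's built-in `mtcars` data because `dfTraining` isn't available here, and with the question's data it would be called as `top_correlations(dfTraining[, 2:11])`. Using `use = "pairwise.complete.obs"` is one assumed way to tolerate missing values. For complete data, the eigenvalues of this correlation matrix are exactly the squared standard deviations that `prcomp(..., scale. = TRUE)` reports, which the last lines check.

```r
# Hypothetical helper: the k feature pairs with the largest absolute correlation
top_correlations <- function(df, k = 5) {
  r   <- cor(df, use = "pairwise.complete.obs")
  idx <- which(upper.tri(r), arr.ind = TRUE)   # each unordered pair once
  pairs_df <- data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx]
  )
  pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]  # strongest first
  head(pairs_df, k)
}

top <- top_correlations(mtcars)
print(top)

# Eigenvalues of the correlation matrix equal the PCA variances on standardized data
ev  <- eigen(cor(mtcars))$values
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)
print(all.equal(ev, pca$sdev^2))
```

If the strongest correlations are all modest, there is little shared variance for PCA to concentrate into a few components. For the scatterplot matrix, base R's `pairs()` (e.g. `pairs(dfTraining[, 2:11])`) is one option.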
