My understanding of PCA is that its main purpose is dimensionality reduction: a small number of principal components (PCs) can often explain most of the variance that is otherwise spread across the original variables.
As an example, with 5 variables and no other information, I would expect each to account for ~20% of the total variance. If instead, say, the first 2 PCs explained 90% of the variance, then PCA would clearly have reduced the dimensionality. However, I have a data set where each PC explains something much closer to ~20%:
        eigenvalue   % of variance   cumulative % of variance
Pcomp 1  1.3762857        27.60763                   27.60763
Pcomp 2  1.1718536        23.50682                   51.11446
Pcomp 3  0.9139234        18.33287                   69.44733
Pcomp 4  0.8245694        16.54047                   85.98780
Pcomp 5  0.6985312        14.01220                  100.00000
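(For what it's worth, my understanding is that the percentage columns are just each eigenvalue over the sum of the eigenvalues; a minimal Python check with the values typed in from the output above:)

```python
import numpy as np

# Eigenvalues copied from the PCA output above
eigvals = np.array([1.3762857, 1.1718536, 0.9139234, 0.8245694, 0.6985312])

pct = 100 * eigvals / eigvals.sum()  # percentage of variance per PC
cum = np.cumsum(pct)                 # cumulative percentage

for i, (p, c) in enumerate(zip(pct, cum), start=1):
    print(f"Pcomp {i}: {p:8.5f}  {c:9.5f}")
```

This reproduces the percentage columns in the table above.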
At what point would you accept that the PCs don't sufficiently reduce dimensionality, and just stick with the original variables? And is there a formal test to support that decision?
Intuitively, a chi-square goodness-of-fit test makes sense to me (n = 5 categories), comparing observed vs. expected variance shares (E = 20% each), but I suspect that using percentages rather than counts violates the assumptions of the chi-square test?
Is there a simpler way to go about this or an appropriate test to apply?
Correlation matrix of the five variables:

      Var1  Var2  Var3  Var4  Var5
Var1  1.00  0.29 -0.11 -0.03 -0.07
Var2  0.29  1.00 -0.14 -0.03  0.00
Var3 -0.11 -0.14  1.00 -0.01 -0.06
Var4 -0.03 -0.03 -0.01  1.00  0.16
Var5 -0.07  0.00 -0.06  0.16  1.00
Variable loadings on each PC:

       PC1   PC2   PC3   PC4   PC5
Var1  0.63 -0.05 -0.35 -0.03 -0.69
Var2  0.64  0.09 -0.19 -0.26  0.69
Var3 -0.40 -0.33 -0.72 -0.46  0.04
Var4 -0.13  0.63 -0.54  0.53  0.09
Var5 -0.10  0.70  0.15 -0.66 -0.20