I am trying to reduce a high-dimensional data set using FactoMineR. I created a training set and a test set and ran a PCA on the training set:
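```r
library(FactoMineR)

bound <- floor((nrow(GK2) / 4) * 3)        # size of the training set (75% of rows)
GK2 <- GK2[sample(nrow(GK2)), ]            # shuffle the rows
GK2.train <- GK2[1:bound, ]                # training set: first 75% of the shuffled rows
GK2.test <- GK2[(bound + 1):nrow(GK2), ]   # test set: the remaining 25%
GKPCA <- PCA(GK2.train)                    # PCA on the training set
```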
```
> GKPCA$eig
           eigenvalue percentage of variance cumulative percentage of variance
comp 1   52.259733827           57.428278931                          57.42828
comp 2    7.152528027            7.859920909                          65.28820
comp 3    5.099126890            5.603436143                          70.89164
comp 4    4.064143884            4.466092181                          75.35773
comp 5    3.600750943            3.956869169                          79.31460
comp 6    3.138452260            3.448848637                          82.76345
comp 7    2.894380868            3.180638316                          85.94408
comp 8    2.287930806            2.514209677                          88.45829
comp 9    1.971793852            2.166806431                          90.62510
comp 10   1.572952777            1.728519535                          92.35362
comp 11   1.435328251            1.577283792                          93.93090
comp 12   1.173586263            1.289655234                          95.22056
comp 13   1.066540121            1.172022111                          96.39258
comp 14   0.727976218            0.799973866                          97.19255
comp 15   0.580969070            0.638427550                          97.83098
comp 16   0.554111316            0.608913534                          98.43990
```
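The two percentage columns follow directly from the eigenvalues: each eigenvalue divided by the sum of all eigenvalues, times 100, followed by a running total. For example, 52.26 / 0.5743 ≈ 91, so the eigenvalues in this output sum to about 91 (with standardized variables, each non-constant variable typically adds 1 to that total), and 7.1525 / 91 ≈ 7.86 % for comp 2. A minimal base-R sketch of the same table, assuming `prcomp(scale. = TRUE)` as a stand-in for FactoMineR's default `PCA(scale.unit = TRUE)` and a small random matrix in place of `GK2.train`:

```r
set.seed(7)
n <- 30
X <- matrix(rnorm(n * 6), nrow = n)      # hypothetical stand-in for GK2.train (30 rows, 6 variables)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.3)    # add some correlation so the first component stands out

# scale. = TRUE standardizes every column, i.e. a PCA of the correlation matrix
pca <- prcomp(X, scale. = TRUE)
eigenvalue <- pca$sdev^2                 # variance carried by each component

pct <- 100 * eigenvalue / sum(eigenvalue)
eig <- cbind(eigenvalue = eigenvalue,
             "percentage of variance" = pct,
             "cumulative percentage of variance" = cumsum(pct))
rownames(eig) <- paste("comp", seq_along(eigenvalue))
print(eig)
```

With 6 standardized variables the eigenvalues sum to 6, so the last cumulative percentage is 100.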
From my understanding, the first 5 or 6 components are the variables that "matter" the most. I know this is a silly question, but how do I get the names of these (comp 1, comp 2, etc.)? If my understanding is wrong, please let me know. I would also welcome any other suggestions on how to reduce the data (i.e. find the most "important" variables), which I will then use for a cluster analysis.
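A component is not one of the original variables: each of comp 1, comp 2, etc. is a weighted combination of all of them, so it has no variable name of its own. One way to see which variables drive a component is each variable's percentage contribution, i.e. its squared eigenvector entry times 100. In FactoMineR these values should be in `GKPCA$var$contrib` (the object returned by `PCA()` has no `$loadings` element; that name comes from `princomp()`), and `dimdesc(GKPCA)` lists the variables most associated with each dimension. The sketch below computes the contribution in base R with `prcomp()`, on a hypothetical random matrix in which `var1` to `var3` are built to move together:

```r
set.seed(42)
n <- 100
X <- matrix(rnorm(n * 8), nrow = n,
            dimnames = list(NULL, paste0("var", 1:8)))  # hypothetical stand-in for GK2.train
X[, "var2"] <- X[, "var1"] + rnorm(n, sd = 0.1)         # var1, var2 and var3 are near copies,
X[, "var3"] <- X[, "var1"] + rnorm(n, sd = 0.1)         # so they should dominate PC1

pca <- prcomp(X, scale. = TRUE)

# Columns of pca$rotation are unit-length eigenvectors, so the squared entries
# of each column sum to 1; times 100 this is each variable's % contribution.
contrib <- 100 * pca$rotation^2

# Variables ranked by their contribution to the first component
top_pc1 <- sort(contrib[, "PC1"], decreasing = TRUE)
print(round(head(top_pc1, 3), 1))
```

Contributions are per component; combining them over the first few components (for example, weighting each by its eigenvalue) is a design choice rather than a fixed rule.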
Some notes about the data:

- It includes many 0's, which lead to Inf values when the PCA is run.
- The data set is goalkeeper (player) performance. It was recorded match by match, but I aggregated it to make more sense.
- There are many variables compared to observations (40 observations, 196 variables).
- `GKPCA$loadings` returns `NULL`.
- I couldn't run the `princomp` function.
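Two of these notes have mechanical causes worth checking. A column that is all zeros (or otherwise constant) has a standard deviation of 0, so standardizing it divides by zero, which is one likely source of NaN/Inf values; and `princomp()` stops with an error when there are fewer rows than columns, as with 40 observations and 196 variables. A minimal sketch, assuming hypothetical sparse count data in place of the real goalkeeper statistics: drop zero-variance columns first, then use `prcomp()`, which is based on a singular value decomposition and accepts more columns than rows.

```r
set.seed(1)
n <- 10
p <- 25
X <- matrix(rpois(n * p, lambda = 2), nrow = n,
            dimnames = list(NULL, paste0("stat", seq_len(p))))  # hypothetical match statistics
X[, "stat3"] <- 0                       # an all-zero column, like a stat that never occurred

# A constant column has sd 0, so scaling it would divide by zero (NaN/Inf)
sds <- apply(X, 2, sd)
X_clean <- X[, sds > 0, drop = FALSE]

# princomp() refuses data with fewer rows than columns; capture its error message
res <- tryCatch(princomp(X_clean), error = function(e) conditionMessage(e))
print(res)

# prcomp() works via SVD, so n < p is fine; it returns min(n, p) components
pca <- prcomp(X_clean, scale. = TRUE)
print(summary(pca)$importance[, 1:3])
```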