I am working with several variables from a dataset that were used to test the cognitive ability of each individual, and I would like to synthesize them into a single variable representing individual cognitive ability. I have 5 variables: three are factors and two are numeric. I used the following code:
res.pca <- prcomp(~ c_cgwri_dv + c_cgwrd_dv + c_cgs7ca_dv +
                    numeric_score + c_cgvfc_dv + c_cgvfw_dv,
                  data = final_sub, scale. = TRUE, na.action = na.omit)
with the following summary:
                     PC1        PC2        PC3         PC4         PC5
word_recall    0.5336195 -0.3624642  0.2596634 -0.02757730  0.71811169
delayed_word   0.5206455 -0.3968069  0.2963803 -0.02059078 -0.69513114
subtract_7     0.2880236  0.7358327  0.4205596 -0.44562664 -0.01180195
numeric_score  0.4254692  0.4107697 -0.1770526  0.78656437 -0.01459938
verbal_fluence 0.4244960  0.0313470 -0.7978193 -0.42608556 -0.02749300
Importance of components:
                         PC1    PC2    PC3    PC4     PC5
Standard deviation     1.543 1.0122 0.8504 0.7771 0.51824
Proportion of Variance 0.476 0.2049 0.1446 0.1208 0.05372
Cumulative Proportion  0.476 0.6809 0.8255 0.9463 1.00000
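As a sanity check on this summary, the proportions of variance can be recomputed from the standard deviations, since each eigenvalue is the squared standard deviation of its component. This is only a minimal sketch: the values are typed in by hand from the table above rather than read from res.pca, and the object names (sdev, eigenvalues, prop_var, cum_prop) are made up for the illustration.

```r
# Standard deviations copied by hand from the "Importance of components" table
sdev <- c(PC1 = 1.543, PC2 = 1.0122, PC3 = 0.8504, PC4 = 0.7771, PC5 = 0.51824)

eigenvalues <- sdev^2                          # variance carried by each component
prop_var    <- eigenvalues / sum(eigenvalues)  # "Proportion of Variance" row
cum_prop    <- cumsum(prop_var)                # "Cumulative Proportion" row

round(rbind(eigenvalues, prop_var, cum_prop), 3)

# Components whose eigenvalue is at least 1 (the Kaiser criterion)
names(eigenvalues)[eigenvalues >= 1]
```

With scaled variables the eigenvalues sum to the number of variables (5 here, up to rounding in the printed table), which is why an eigenvalue of 1 is the usual cut-off.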
The first three variables are factors (number of correct items on a 0-10 scale); the numeric score and verbal fluency are continuous variables.
I have two questions:
1. Looking at the scores on PC1 (res.pca$x), I observe negative values. Since all the loadings on PC1 are positive, every variable contributes positively to that component, so people scoring higher on these variables should have a higher PC1 score. Given that the PCA is based on centered variables, does this mean that, if PC1 is "cognitive ability", the negative scores belong to individuals with cognitive ability "below the mean"?
2. If my purpose is to obtain a single variable that represents the "cognitive ability" of the individual and takes increasing positive values, can I simply create it as the sum (or better, the average) of the scores on the first two components and rescale it to the 0-1 range? (To justify using only 2 components, I also performed a factor analysis, which reports very similar loadings, and the eigenvalues are >= 1 for the first 2 only.)
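To make the two questions concrete, here is a minimal sketch on the built-in USArrests data (not on final_sub, so the numbers say nothing about my variables). It checks that the component scores are centered at zero and builds the two candidate indices from question 2. The helper rescale01 and the names index_pc1 and index_pc12 are hypothetical, introduced only for this illustration.

```r
# Illustration on a built-in dataset (USArrests), not on final_sub
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Question 1: the scores are computed from centered data, so every column of
# pca$x has mean zero and therefore contains both negative and positive values
round(colMeans(pca$x), 10)
range(pca$x[, "PC1"])

# The scores are the standardized data multiplied by the loadings
z <- scale(USArrests)
all.equal(unname(z %*% pca$rotation), unname(pca$x))

# Question 2: min-max rescaling to the 0-1 range (hypothetical helper)
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

index_pc1  <- rescale01(pca$x[, "PC1"])                       # PC1 only
index_pc12 <- rescale01(rowMeans(pca$x[, c("PC1", "PC2")]))   # mean of PC1 and PC2

# How closely the two candidate indices agree on this dataset
cor(index_pc1, index_pc12)
```

Rescaling only shifts and stretches the scores, so the ranking of individuals on each index is unchanged; whether averaging PC1 and PC2 is meaningful is exactly what I am asking.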
Thank you for your help