I am working with a set of variables from a dataset that were used to test the cognitive ability of each individual. I would like to synthesize these variables into a single one representing individual cognitive ability. I have 5 variables: three are factors and two are numerical. I used the following code:

# PCA on the standardized variables; rows with missing values are dropped
res.pca <- prcomp(~ c_cgwri_dv + c_cgwrd_dv + c_cgs7ca_dv +
                    numeric_score + c_cgvfc_dv + c_cgvfw_dv,
                  data = final_sub, scale. = TRUE, na.action = na.omit)

with the following summary:

                    PC1         PC2        PC3        PC4         PC5
word_recall       0.5336195 -0.3624642  0.2596634 -0.02757730  0.71811169
delayed_word      0.5206455 -0.3968069  0.2963803 -0.02059078 -0.69513114
subtract_7        0.2880236  0.7358327  0.4205596 -0.44562664 -0.01180195
numeric_score     0.4254692  0.4107697 -0.1770526  0.78656437 -0.01459938
verbal_fluence    0.4244960  0.0313470 -0.7978193 -0.42608556 -0.02749300

Importance of components:

                        PC1    PC2     PC3     PC4      PC5
Standard deviation     1.543  1.0122  0.8504  0.7771   0.51824
Proportion of Variance 0.476  0.2049  0.1446  0.1208   0.05372
Cumulative Proportion  0.476  0.6809  0.8255  0.9463   1.00000
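
As a quick check on the table above, the proportions of variance follow from the standard deviations: each eigenvalue is the squared standard deviation, and each proportion is that eigenvalue divided by their sum. A minimal sketch in R, using only the rounded values printed above (the names `sdev`, `eigenvalues` and `prop_var` are introduced here for illustration):

```r
# Standard deviations copied from the "Importance of components" table
sdev <- c(PC1 = 1.543, PC2 = 1.0122, PC3 = 0.8504, PC4 = 0.7771, PC5 = 0.51824)

# Eigenvalues of the correlation matrix are the squared standard deviations
eigenvalues <- sdev^2

# Proportion of variance explained by each component, and the running total
prop_var <- eigenvalues / sum(eigenvalues)

round(eigenvalues, 2)
round(prop_var, 3)
round(cumsum(prop_var), 3)
```

With these rounded inputs, only the first two eigenvalues (about 2.38 and 1.02) exceed 1, and the second does so only barely.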

The first three variables are factors (number of correct items on a 0-10 scale); the numeric score and verbal fluency are continuous variables.

I have two questions:

  1. Looking at the scores on PC1 (res.pca$x), I observe negative values. All loadings on PC1 are positive, which means that all the variables contribute positively to that component, so people scoring higher on these variables should have a higher score. Since the PCA is based on centered variables, does this mean that, if PC1 is "cognitive ability", the negative scores belong to individuals with cognitive ability "below the mean"?

  2. If my purpose is to obtain a single variable with increasing positive values that represents the "cognitive ability" of the individual, can I simply create it as the sum (or better, the average) of the scores on the first two components and rescale it to a 0-1 range? (To justify using only 2 components, I also performed a factor analysis, which reports very similar loadings, and only the first 2 eigenvalues are >= 1.)

Thank you for your help

  • Signs of the loadings (rotation) are essentially [arbitrary](https://stats.stackexchange.com/questions/30348/is-it-acceptable-to-reverse-a-sign-of-a-principal-component-score). – runr Nov 14 '19 at 15:46
  • So, does this mean that I can simply average PC1 and PC2 and rescale the result to a 0-1 range to obtain a single measure of cognitive ability? – Luca Giangregorio Nov 14 '19 at 15:52
  • Conventionally scores are reported with zero mean, so positive and negative deviations from the mean are what you see. To say that PC1 _is_ cognitive ability is reification, and results in apoplexy on the part of many scholars, scientists and statisticians. To wonder whether PC1 might be a useful single measure of cognitive ability is closer to the mark, but all depends on your inputs. PCA is not a washing machine; any dirt isn't removed, but is just smeared around the components. – Nick Cox Nov 14 '19 at 16:44
  • Thanks for your answer @NickCox. I agree that it may result in apoplexy, which is why I used a conditional statement. However, the survey I am using has a section named "cognitive ability" presenting different variables that should measure it. In the PCA just performed, I only used these variables, to synthesize them into one single "cognitive ability" variable. Given this situation and the eigenvalues suggesting 2 components, would it be appropriate to use the average of the scores of PC1 and PC2? – Luca Giangregorio Nov 14 '19 at 17:07
  • Absolutely not. The average of PC1 and PC2 cannot improve on PC1 as the best single summary in PCA terms and indeed has no rationale at all in terms of PCA. Think of it this way: PC1 and PC2 are by construction (or by definition) uncorrelated, so why would you want to average things that are uncorrelated? Put the other way round, if your results show that two dimensions are important, then the path to follow is to use both. – Nick Cox Nov 14 '19 at 18:11
  • Thank you again, you have been perfectly clear. – Luca Giangregorio Nov 14 '19 at 22:01
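
The points made in the comments can be checked on simulated data. The sketch below is only an illustration: the data frame `sim`, its columns and the latent `ability` variable are made up and are not the survey variables from the question. Under those assumptions it shows that the scores have zero mean, that PC1 and PC2 are uncorrelated, that a component's sign can be flipped freely, that an equal-weight mix of PC1 and PC2 captures less variance than PC1 alone, and that a min-max rescaling of PC1 gives a 0-1 index if one is wanted.

```r
set.seed(1)

# Hypothetical data: five noisy indicators driven by one latent trait
n <- 500
ability <- rnorm(n)
sim <- data.frame(
  v1 = ability + rnorm(n),
  v2 = ability + rnorm(n),
  v3 = ability + rnorm(n),
  v4 = ability + rnorm(n),
  v5 = ability + rnorm(n)
)

pca <- prcomp(sim, center = TRUE, scale. = TRUE)

# Scores are centered, so a negative score means "below the sample mean"
colMeans(pca$x)

# PC1 and PC2 are uncorrelated by construction
cor(pca$x[, "PC1"], pca$x[, "PC2"])

# Signs are arbitrary: orient PC1 so that higher indicator values give higher scores
if (sum(pca$rotation[, "PC1"]) < 0) {
  pca$rotation[, "PC1"] <- -pca$rotation[, "PC1"]
  pca$x[, "PC1"] <- -pca$x[, "PC1"]
}

# Equal-weight combination of PC1 and PC2, rescaled to unit length
# so that its variance is comparable with that of a single component
comb <- (pca$x[, "PC1"] + pca$x[, "PC2"]) / sqrt(2)
var(pca$x[, "PC1"])
var(comb)

# Min-max rescaling of PC1 to the 0-1 range
pc1 <- pca$x[, "PC1"]
pc1_01 <- (pc1 - min(pc1)) / (max(pc1) - min(pc1))
range(pc1_01)
```

The min-max step only changes the units of PC1. It depends on the most extreme observations in the sample, so the resulting index is relative to this sample and not an absolute scale.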
