
[This question is modified based on suggestion from @ttnphns]

I am doing linear principal component analysis (PCA) based on polychoric correlations between the variables (rather than on native Pearson correlations between them). I want to compute component scores from my analysis.

I found that the component scores calculated from the orthogonally rotated solution based on polychoric correlations are correlated (I expected the components not to correlate!).

After discussing with @ttnphns, it seems that the eigenvectors or the loadings extracted from polychoric (or tetrachoric) correlations cannot give me proper component scores when used the usual way, as in ordinary PCA. This is because polychoric correlations are inferred correlations, not Pearson correlations directly computable from the variables: there is no "direct connection", on the level of individual values, between my ordinal variables and the polychoric correlations. That makes the computation of proper component scores a problem. So how should I compute the component scores?

The code of my analysis in R:

  1. Compute the polychoric correlations and save them in polycorC1

    polycorC1 <- polychoric(C1C_fa_[,c(1:3,5:14)])

  2. Run a PCA on polycorC1 with varimax rotation (nfactors = 2, as recommended by parallel analysis)

    C1Cpca <- principal(polycorC1$rho, nfactors = 2, n.obs = nrow(na.omit(C1C_fa_)), rotate = 'varimax', residuals = TRUE)

  3. Calculate the component scores by multiplying the standardized data by the transposed pseudoinverse of the loadings from the PCA

    C1C_fa_Com <- as.matrix(scale(C1C_fa_[,c(1:3,5:14)])) %*% t(pracma::pinv(C1Cpca$loadings[,1:2]))
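To see why step 3 uses the pseudoinverse rather than the loadings themselves, the linear algebra can be illustrated outside R. Below is a small NumPy sketch with synthetic data (a fixed 2-D rotation stands in for varimax; none of this uses my actual variables): after an orthogonal rotation, scores computed as data times loadings come out correlated, while data times the transposed pseudoinverse of the loadings stay uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 300, 6, 2

# Synthetic standardized data with two latent factors of different strength
# (a hypothetical stand-in for the ordinal variables)
f = rng.standard_normal((n, 2))
W = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
              [0.0, 0.9], [0.0, 0.6], [0.3, 0.3]])
X = f @ W.T + 0.4 * rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA on the correlation matrix: loadings = eigenvectors * sqrt(eigenvalues)
R = np.corrcoef(X, rowvar=False)
evals, evecs = np.linalg.eigh(R)
top = np.argsort(evals)[::-1][:k]
L = evecs[:, top] * np.sqrt(evals[top])          # p x k loading matrix

# Any orthogonal rotation (varimax produces one); a fixed 30-degree turn here
t = np.pi / 6
T = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
L_rot = L @ T

S_loadings = X @ L_rot                           # scores via loadings (wrong)
S_pinv = X @ np.linalg.pinv(L_rot).T             # scores via pseudoinverse

r_loadings = np.corrcoef(S_loadings, rowvar=False)[0, 1]
r_pinv = np.corrcoef(S_pinv, rowvar=False)[0, 1]
print(f"via loadings: r = {r_loadings:.3f}, via pinv: r = {r_pinv:.2e}")
```

For the unrotated solution the two formulas differ only by a scaling, so both give uncorrelated scores; the difference appears once a rotation is applied.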

Below is a sample of the data I used:

> polycorC1
Call: polychoric(x = C1C_fa_[, c(1:3, 5:14)])
Polychoric correlations 
             C01 C02 C03 C05 C06 C07 C08 C09 C10 C11 C12 C13 C14
C01  1.00                                                                                                  
C02  0.84    1.00                                                                                          
C03  0.76    0.85    1.00                                                                                  
C05  0.56    0.65    0.69    1.00                                                                          
C06 -0.30   -0.20   -0.10   -0.05    1.00                                                                  
C07 -0.35   -0.41   -0.42   -0.48    0.35    1.00                                                          
C08  0.46    0.48    0.43    0.30   -0.55   -0.65    1.00                                                  
C09 -0.16    0.12    0.18   -0.23    0.58    0.49   -0.46    1.00                                          
C10 -0.28   -0.11   -0.04   -0.11    0.63    0.53   -0.61    0.69    1.00                                  
C11 -0.24   -0.23   -0.20   -0.28    0.45    0.55   -0.66    0.53    0.54    1.00                          
C12 -0.21   -0.12   -0.11   -0.13    0.52    0.61   -0.64    0.66    0.66    0.54    1.00                  
C13 -0.25   -0.39   -0.42   -0.43    0.32    0.65   -0.61    0.41    0.43    0.47    0.55    1.00          
C14 -0.49   -0.54   -0.51   -0.54    0.25    0.49   -0.42    0.33    0.34    0.53    0.30    0.52    1.00  

 with tau of 
                  1      2    3    4
C01 -0.568 -0.205 0.94 1.52
C02  0.044  0.205 1.26 1.74
C03  0.176  0.205 1.68 2.11
C05 -0.621 -0.250 1.81 2.27
C06 -1.991 -0.638 1.19 1.19
C07 -1.743  0.468 0.98  Inf
C08 -0.871  0.058 1.36  Inf
C09 -1.680 -0.058 1.03 1.03
C10 -1.813 -1.007 0.98 0.98
C11 -1.991 -1.224 0.42 0.64
C12 -2.110 -1.031 0.50 0.83
C13 -2.523 -1.322 0.33 0.45
C14 -2.110 -1.571 0.47 0.59
> head(C1C_fa_[,c(1:3,5:14)])
  C01 C02 C03 C05 C06 C07 C08 C09 C10 C11 C12 C13 C14
1   3   3   3   2   2   3   3   3   3   2   2   3   3
2   1   1   1   1   2   2   4   2   2   2   2   3   5
3   3   3   4   3   3   3   1   3   5   5   5   4   3
4   1   1   1   1   5   5   1   5   5   5   5   5   5
5   3   1   1   3   2   3   4   1   1   3   2   2   2
6   2   1   1   2   3   4   2   3   3   5   5   5   5
> C1Cpca$loadings[,1:2]
                     RC1         RC2
C01 -0.16661714  0.81706044
C02 -0.03981366  0.93794572
C03  0.01248097  0.94517952
C05 -0.12835191  0.78302564
C06  0.71467993 -0.07219318
C07  0.66413675 -0.45425580
C08 -0.72581724  0.43025104
C09  0.85410423  0.10239328
C10  0.85163361 -0.04177471
C11  0.72659138 -0.24209655
C12  0.84231783 -0.07579838
C13  0.59548110 -0.44222574
C14  0.39604549 -0.62564654
> head(C1C_fa_Com)
  C_RC1 C_RC2
1   11.635377   5.7926567
2    9.848028  -0.8181452
3   20.691296   5.1087066
4   27.176833  -5.3445258
5    7.159462   3.9442215
6   20.651074  -2.8367836
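A separate point raised in the discussion below: even with the pseudoinverse formula, the scores need not come out exactly uncorrelated here, because the polychoric matrix is not the sample correlation matrix of the observed values. A NumPy sketch of that effect (synthetic data; a small symmetric perturbation of the Pearson matrix stands in for a polychoric matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 150, 5, 2

# Synthetic standardized data (hypothetical stand-in for the ordinal variables)
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)

R_pearson = np.corrcoef(X, rowvar=False)

# Stand-in for a polychoric matrix: an inferred correlation matrix that was
# NOT computed directly from these exact values
E = 0.05 * rng.standard_normal((p, p))
R_poly = R_pearson + (E + E.T) / 2.0
np.fill_diagonal(R_poly, 1.0)

def pinv_scores(R):
    """Component scores via the transposed pseudoinverse of the loadings."""
    evals, evecs = np.linalg.eigh(R)
    top = np.argsort(evals)[::-1][:k]
    L = evecs[:, top] * np.sqrt(evals[top])
    return X @ np.linalg.pinv(L).T

r_match = np.corrcoef(pinv_scores(R_pearson), rowvar=False)[0, 1]
r_mismatch = np.corrcoef(pinv_scores(R_poly), rowvar=False)[0, 1]
print(f"Pearson-based: r = {r_match:.2e}, inferred-matrix-based: r = {r_mismatch:.4f}")
```

When the analyzed matrix matches the data exactly, the score correlation is zero to machine precision; with the mismatched matrix a small residual correlation survives.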
  • You must be doing something wrong. How are you computing rotated scores? – amoeba Apr 12 '15 at 22:18
  • @amoeba, I just multiply using matrix multiplication in the way I describe here: http://stats.stackexchange.com/questions/144841/compute-component-scores-from-principalloadings-directly-in-r I calculated it manually to make sure the computation is correct... – ceoec Apr 13 '15 at 04:42
  • Polychoric correlations are inferred Pearson correlations; they are the correlations of the underlying continuous variables, whose values you _do not have_. What you have are the observed ordinal variables. Multiplying them by the eigenvectors extracted from those correlations _won't_ give you the component scores you expect. – ttnphns Apr 13 '15 at 06:14
  • @ttnphns, thanks! then is it still valid for me to calculate the component scores in this manner? – ceoec Apr 13 '15 at 06:16
  • I doubt it. Either use the usual correlations, and then you can do it (provided that when you multiply, the variables are in standardized form), or use some tricky form of estimation (actually, I don't know which; I will need to look for literature) if you insist on using polychoric correlations. – ttnphns Apr 13 '15 at 06:21
  • BTW, it is a good and interesting _question_ and you might want to post it here. "How to compute component or factor scores when the analysis is based on polychoric/tetrachoric correlations?" – ttnphns Apr 13 '15 at 06:25
  • @ttnphns, I edited the question. Thanks for your suggestion! – ceoec Apr 13 '15 at 06:32
  • Isn't this question a duplicate of your earlier question, which [I have already answered](http://stats.stackexchange.com/a/144866/31372)? If not, could you clarify what are the differences? – Aleksandr Blekh Apr 13 '15 at 06:39
  • @AleksandrBlekh It seems it is not correct to calculate component scores using the loadings when polychoric correlations are used instead of Pearson, based on the discussion with ttnphns... – ceoec Apr 13 '15 at 06:42
  • I read the discussion above, but it still is not clear to me. Perhaps, @ttnphns could clarify it a little more or maybe post an answer. – Aleksandr Blekh Apr 13 '15 at 06:49
  • @AleksandrBlekh, I think this is my problem -- I tried to use `weights` in `principal` to calculate the component scores instead of the loadings, and now there are 0 correlations. So it seems I should use `weights` instead of the loadings... I am so confused now... – ceoec Apr 13 '15 at 06:57
  • @ttnphns, thanks for the edit! I tried to use `weights` instead of the loadings and now the scores are not correlated. It seems using `weights` would be the answer, but I do not know why... and this leads to the question of what I should report for my PCA: the loadings or the `weights`? – ceoec Apr 13 '15 at 07:01
  • @ceoec. I dared to edit the question considerably. Please revise it now. Did I ask correctly what you wanted to know? – ttnphns Apr 13 '15 at 07:01
  • @ceoec, Can you comment your code? Some people (including me) are not R users. They would want to know what you are doing with your code. – ttnphns Apr 13 '15 at 07:08
  • @ttnphns, I have now edited the question... but I figured out that using `weights` instead of the loadings to calculate the scores gives me no correlation. I am now looking into why this is so... – ceoec Apr 13 '15 at 07:14
  • What are you calling "weights"? Scores in PCA are computable either with the help of eigenvectors or (standardized scores) with the help of the [score coefficient](http://stats.stackexchange.com/q/126885/3277) matrix derived from loadings. – ttnphns Apr 13 '15 at 07:22
  • @ttnphns the R manual says "weights = The beta weights to find the principal components from the data", but it means nothing to me. I was trying to read the code to see how the weights are calculated, but failed to trace the computation... – ceoec Apr 13 '15 at 07:28
  • It is not clear to me if your problem is due to the polychoric nature of the correlations (as @ttnphns suggests) or due to the varimax rotation procedure (which you apply manually -- which is dangerous because it is easy to screw up), or both. You could test it by computing scores without varimax rotation. If they come out orthogonal, then the problem might have nothing to do with polychoric correlations. In fact, to compute varimax-rotated scores, you seem to multiply the raw data by the varimax-rotated loadings. If so, this is wrong, [see my answer here](http://stats.stackexchange.com/a/137003/28666). – amoeba Apr 13 '15 at 09:38
  • @amoeba, yes I am multiplying raw data with varimax-rotated loadings! I am happy to know this is wrong. I have tried to read your link several times but I do not get what is wrong... Do you mean I need to scale() my data first? Can you further elaborate? And sorry I do not get what you mean by "apply varimax manually", how should I apply that? – ceoec Apr 13 '15 at 10:06
  • @amoeba, I tried to use `weights` instead of the loadings; the results are orthogonal. I also tried factor.scores, and the results are orthogonal too. Yet the scores are very different. I found there is another set of weights in factor.scores, and the "loadings" don't make any sense in terms of the factor structure. Yet the output scores do make sense. The whole thing is so confusing... :( Do you know why? – ceoec Apr 13 '15 at 10:28
  • Wait. The problem is that your question currently confuses two different issues: one is about polychoric correlations and another is about varimax rotations. I think you first need to figure out how to get PCA scores with polychoric correlations *without* any varimax rotations. Do you know how to do it? How exactly do you do that? Do you obtain uncorrelated scores? – amoeba Apr 13 '15 at 12:58
  • @amoeba, this is the way I get the polychoric correlations without varimax - p = 0.84 (not sig, but not equals 1 as I expected) : `polycorC1 – ceoec Apr 13 '15 at 13:09
  • Okay. First: so are the scores that you get like this (without varimax) correlated or not? Second: this is a wrong formula to compute the scores, i.e. if you use usual Pearson correlation matrix you will not get PCA scores with this formula, the scaling will be wrong. You can get standardized scores by multiplying (in the last line) by `t(pracma::pinv(loadings))` instead of `loadings`. For unrotated PCA this will only change the scaling, so whether scores are correlated or not, will stay the same. – amoeba Apr 13 '15 at 13:24
  • @amoeba, thanks for your explanation. The correlations of the two scores I obtained was r(147)=-.01, p=.84, so it is not significantly correlated, yet p is not 1 as i expected. – ceoec Apr 13 '15 at 13:38
  • I see. Now we can turn to the varimax-rotated case: what correlation do you get there when you use the correct formula (with pseudoinverse of the loadings)? And for comparison, what correlation did you get when you used your formula (which I claim to be incorrect)? – amoeba Apr 13 '15 at 13:42
  • @amoeba, with the one you suggested I got an r very similar to the unrotated one: r(147) = -.01, p = .87; with the incorrect one, r(147) = .32, p < .001 – ceoec Apr 13 '15 at 13:51
  • And when you use `weights` that you get out of `principal` function (with varimax rotation)? – amoeba Apr 13 '15 at 13:52
  • @amoeba, oh, when I use `weights` it is r(147) = .01, p = .89 – ceoec Apr 13 '15 at 13:55
  • @amoeba, but when I use factor.scores, the r is very close to 0 with p = 1. – ceoec Apr 13 '15 at 13:58
  • So it seems using `weights`, the pseudoinverse of the loadings, and factor scores all give different results... – ceoec Apr 13 '15 at 13:59
  • Okay, thanks, now I can summarize what is happening here. Your main problem was that you used a wrong formula to compute scores. You cannot multiply data with loadings, you need to multiply it with inverse loadings. The `weights` that the function `principal` returns are (probably) inverse loadings, so you can use those if you like, it's the same thing. If you do this, then with and without varimax you get correlation close to zero. It's not exactly zero due to polychoric correlations, as @ttnphns noted. But your "main" problem has nothing to do with the polychoric issue. – amoeba Apr 13 '15 at 14:00
  • Hmmm, I don't know what exactly `factor.scores` is doing. I would think that it should apply `weights` and that `weights` should be the pseudo-inverse of the loadings, so all three methods should be identical. Unfortunately, the `psych` package is not very extensively documented, so I cannot tell you what exactly is going on there. The differences seem to be small though. – amoeba Apr 13 '15 at 14:02
  • @amoeba, thanks so much for your guidance along solving my stupid problem. :) – ceoec Apr 13 '15 at 14:06
  • No problem. Unfortunately, the question is now super-confusing... To make it useful, you should maybe edit it. If you want it to be about computing scores with polychoric correlations, as per @ttnphns's suggestion, please remove all varimax stuff from it, and describe that you get almost-zero but not exactly zero correlations. If you want it to be about your original problem, then edit it accordingly away from polychoric, but then it might be closed as a duplicate of another varimax thread. Up to you, but think about it. – amoeba Apr 13 '15 at 14:11
  • For future reference -- factor.scores using method "Harman" results in the same scores as using `weights` to calculate them. I will modify the question later... – ceoec Apr 13 '15 at 17:54
  • @ceoec, I did not follow the discussion. So I'm back only to remark on the issue of computing scores for inferred correlations such as polychoric. The classic method won't do it properly. You have to use special estimation methods such as expected or maximum a posteriori (EAP, MAP). They exist in specialized factor-analytic packages such as M-Plus. – ttnphns Apr 13 '15 at 20:00
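For reference, the linear algebra behind the score formulas discussed in the comments can be sketched as follows. This sketch assumes the standardized data matrix $X$ has sample correlation matrix exactly equal to the analyzed matrix $R$; with polychoric correlations that assumption fails, which is why the score correlation is only approximately zero.

```latex
% PCA of R keeping k components; T is any orthogonal (e.g. varimax) rotation:
\begin{align*}
R &= V D V^{\top}, \qquad L = V_k D_k^{1/2}, \qquad L^{*} = L T, \quad T^{\top}T = I,\\
(L^{*})^{+} &= T^{\top} D_k^{-1/2} V_k^{\top},\\
\operatorname{cov}\!\big(X\,((L^{*})^{+})^{\top}\big)
  &\propto (L^{*})^{+}\, R\, \big((L^{*})^{+}\big)^{\top}
   = T^{\top} D_k^{-1/2} V_k^{\top} V D V^{\top} V_k D_k^{-1/2} T
   = T^{\top} T = I,\\
\operatorname{cov}\!\big(X L^{*}\big)
  &\propto (L^{*})^{\top} R\, L^{*} = T^{\top} D_k^{2}\, T
   \quad \text{(diagonal only when } T = I\text{, i.e. no rotation)}.
\end{align*}
```

So scores via the (transposed) pseudoinverse are uncorrelated with or without rotation, while scores via the loadings are uncorrelated only in the unrotated case.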

0 Answers