Correlated component scores after PCA with varimax rotation in Stata

Question

I have a dataset with 6 personality traits for 155 individuals that are highly correlated. To get rid of multicollinearity (and potential noise in the original variables) in my regression analysis, I thought of first running PCA, since some of the 6 variables can "theoretically" be aggregated into 3 constructs.

In an attempt to get uncorrelated component scores, I followed this post and mean-centered my original variables:

Variable        Obs   Mean        Std. Dev.   Min         Max

q3_avtrustfac   155   4.18e-10    .4014286    -1.101315   .8117283
q3_avcompefac   155   -5.62e-09   .4458447    -1.391796   .8011859
q3_avatrfac     155   3.44e-09    .5891259    -1.773446   1.676078
q3_avdomfac     155   -7.24e-10   .5023187    -1.689295   1.119483
q3_avpassfac    155   -4.13e-09   .6270763    -1.611541   1.34176
q3_avopenfac    155   -6.80e-09   .652431     -1.90456    1.281211
q3_avtrust      155   4.49262     .4014286    3.391304    5.304348
q3_avcompe      155   4.72513     .4458447    3.333333    5.526316
q3_avatr        155   3.963922    .5891259    2.190476    5.64
q3_avdom        155   4.052931    .5023187    2.363636    5.172414
q3_avpass       155   4.176758    .6270763    2.565217    5.518518
q3_avopen       155   4.631832    .652431     2.727273    5.913043

The first six variables are mean-centered and the last six are the original variables, measured on a 7-point Likert scale. Next, I run the PCA commands in Stata (requesting 3 components), apply a varimax rotation and retrieve the predicted scores:

pca q3_avtrustfac q3_avcompefac q3_avatrfac q3_avdomfac q3_avpassfac q3_avopenfac, comp(3)
rotate, varimax blanks(.3)
predict pc1 pc2 pc3, score 
corr pc1 pc2 pc3

I then reran the code above with the original set of 6 variables. Both runs produce exactly the same correlation table:

       pc1      pc2      pc3
pc1    1.0000
pc2    0.7262   1.0000
pc3    0.4553   0.5339   1.0000

What I can't understand is why after mean-centering my data (i.e. variables ending with "fac" in the descriptives table above), I still get correlated component scores.
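A side note on why both runs coincide: a correlation matrix is unchanged when each variable's mean is subtracted, and Stata's pca analyzes the correlation matrix unless the covariance option is given, so centering cannot change the components. A minimal sketch in Python of that invariance, using made-up ratings rather than the data above (the names `trust` and `compe` and their values are purely illustrative):

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def center(x):
    """Subtract the mean from every value."""
    m = sum(x) / len(x)
    return [a - m for a in x]

# Hypothetical ratings, for illustration only.
trust = [4.1, 4.6, 3.9, 5.0, 4.4, 4.8]
compe = [4.5, 4.9, 4.0, 5.3, 4.6, 5.1]

r_raw = pearson(trust, compe)
r_centered = pearson(center(trust), center(compe))

# The two correlations agree up to floating-point rounding.
print(abs(r_raw - r_centered) < 1e-12)
```

So whatever makes the rotated scores correlated, it is not a lack of centering.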

It seems that this is not a bug but is expected. I found [this](https://www.stata.com/statalist/archive/2009-12/msg00808.html) thread on the topic that includes a reference (Rencher). — COOLSerdash, May 03 '19 at 09:21

score 1 · Answer 1 · answered May 03 '19 at 09:32

Here is a relevant excerpt from page 429 of chapter 12, "Principal component analysis", in the book "Methods of Multivariate Analysis" by A. C. Rencher and W. F. Christensen (3rd ed.):

The authors seem to confirm your observation. Here is a small example in Stata just to check that the unrotated scores are in fact uncorrelated (some output is omitted):

sysuse auto
pca trunk weight length headroom
predict f1-f4, score
corr f1-f4

(obs=74)

             |       f1       f2       f3       f4
-------------+------------------------------------
          f1 |   1.0000
          f2 |   0.0000   1.0000
          f3 |   0.0000  -0.0000   1.0000
          f4 |   0.0000   0.0000   0.0000   1.0000

All correlation coefficients between the scores are virtually zero, as desired.
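Rotation is where this can break down. Here is a minimal sketch in Python of the algebra, under the assumption that the unrotated scores have unequal variances (as they do when each score's variance equals its eigenvalue, which I believe is the case for Stata's default scores). Rotating such scores by an orthogonal matrix makes them correlated, whereas standardizing them to unit variance first keeps them uncorrelated. The toy vectors and the 30-degree angle are made up for illustration; a real varimax rotation would pick its own angle.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(x):
    """Rescale to mean 0 and sample standard deviation 1."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))
    return [(a - m) / sd for a in x]

def rotate(x, y, theta):
    """Apply a 2-D orthogonal rotation by angle theta to the pair (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    u = [c * a - s * b for a, b in zip(x, y)]
    v = [s * a + c * b for a, b in zip(x, y)]
    return u, v

# Toy "component scores": mean zero, exactly uncorrelated, unequal variances.
pc1 = [3.0, -3.0, 3.0, -3.0]
pc2 = [1.0, 1.0, -1.0, -1.0]
theta = math.radians(30)  # arbitrary illustrative angle

r_before = pearson(pc1, pc2)

# Rotating the raw scores: correlation appears.
u, v = rotate(pc1, pc2, theta)
r_raw_rot = pearson(u, v)

# Standardizing first, then rotating: still uncorrelated.
zu, zv = rotate(standardize(pc1), standardize(pc2), theta)
r_std_rot = pearson(zu, zv)

print(round(r_before, 6), round(r_raw_rot, 6), round(r_std_rot, 6))
```

In matrix terms, if the scores have covariance matrix D (diagonal) and T is the rotation, the rotated scores have covariance T'DT, which stays diagonal for every T only when D is a multiple of the identity.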

This quote does not make sense to me. After varimax rotation, PCA or FA scores (if computed correctly) should remain uncorrelated. Mathematical details are in my answer here https://stats.stackexchange.com/questions/612. — amoeba, May 07 '19 at 09:22

score 0 · Accepted Answer · answered May 07 '19 at 08:59

@COOLSerdash pointed to a source that describes the phenomenon. After some further reading (here and here), I learnt about Anderson-Rubin scores, which are uncorrelated. As my concern was multicollinearity among the components in my post-PCA estimations (i.e. regressions), I computed them in R (I couldn't find them in Stata).

I used the principal function from the R package psych, requesting 3 factors and a varimax rotation, and then estimated factor scores with the Anderson method:

library(psych)  # provides principal() and factor.scores()
x1 = principal(df, nfactors = 3, rotate = "varimax")
f1 = factor.scores(df, x1, method = "Anderson")

Hope this helps.

This does not make sense to me. After varimax rotation, PCA or FA scores (if computed correctly) should remain uncorrelated. Mathematical details are in my answer here https://stats.stackexchange.com/questions/612. You can try it in R: `x1$scores` should be uncorrelated, so you don't need your second line of code. See also https://stats.stackexchange.com/questions/59213. — amoeba, May 07 '19 at 09:23
