I have a dataset of 6 highly correlated personality traits measured on 155 individuals. To get rid of multicollinearity (and potential noise in the original variables) in my regression analysis, I thought of first running PCA, since some of the 6 variables can "theoretically" be aggregated into 3 constructs.

In an attempt to get uncorrelated component scores, I followed this post and mean-centered my original variables:

Variable        Obs        Mean   Std. Dev.         Min        Max

q3_avtrustfac   155    4.18e-10    .4014286   -1.101315   .8117283
q3_avcompefac   155   -5.62e-09    .4458447   -1.391796   .8011859
q3_avatrfac     155    3.44e-09    .5891259   -1.773446   1.676078
q3_avdomfac     155   -7.24e-10    .5023187   -1.689295   1.119483
q3_avpassfac    155   -4.13e-09    .6270763   -1.611541    1.34176
q3_avopenfac    155   -6.80e-09     .652431    -1.90456   1.281211
q3_avtrust      155     4.49262    .4014286    3.391304   5.304348
q3_avcompe      155     4.72513    .4458447    3.333333   5.526316
q3_avatr        155    3.963922    .5891259    2.190476       5.64
q3_avdom        155    4.052931    .5023187    2.363636   5.172414
q3_avpass       155    4.176758    .6270763    2.565217   5.518518
q3_avopen       155    4.631832     .652431    2.727273   5.913043

The first six variables are mean-centered and the last six are the original variables, measured on a 7-point Likert scale. Next, I ran the Stata PCA commands (requesting 3 components) with varimax rotation and retrieved the predicted scores:

pca q3_avtrustfac q3_avcompefac q3_avatrfac q3_avdomfac q3_avpassfac q3_avopenfac, comp(3)
rotate, varimax blanks(.3)
predict pc1 pc2 pc3, score 
corr pc1 pc2 pc3

I then reran the above code with the original set of 6 variables. Both runs produce exactly the same correlation table:

        pc1     pc2     pc3
pc1  1.0000
pc2  0.7262  1.0000
pc3  0.4553  0.5339  1.0000

What I can't understand is why after mean-centering my data (i.e. variables ending with "fac" in the descriptives table above), I still get correlated component scores.

amoeba
martins
  • It seems that this is not a bug but is expected. I found [this](https://www.stata.com/statalist/archive/2009-12/msg00808.html) thread on the topic that includes a reference (Rencher). – COOLSerdash May 03 '19 at 09:21
  • @COOLSerdash Thanks for your comments – martins May 03 '19 at 10:58

2 Answers

Here is a relevant excerpt from page 429 of chapter 12, "Principal component analysis", in the book "Methods of Multivariate Analysis" by A. C. Rencher and W. F. Christensen (3rd ed.):

[Image: excerpt from page 429]

The authors seem to confirm your observation. Here is a small example in Stata just to check that the unrotated scores are in fact uncorrelated (some output is omitted):

sysuse auto
pca trunk weight length headroom
predict f1-f4, score
corr f1-f4

(obs=74)

             |       f1       f2       f3       f4
-------------+------------------------------------
          f1 |   1.0000
          f2 |   0.0000   1.0000
          f3 |   0.0000  -0.0000   1.0000
          f4 |   0.0000   0.0000   0.0000   1.0000

All correlation coefficients between the scores are virtually zero, as desired.
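To illustrate what may be happening after rotation, here is a minimal sketch in base R. It is not a reproduction of Stata's internals. It assumes, in line with the excerpt above, that the rotation is applied to the unit-norm eigenvectors, so that the rotated scores are a rotation of unrotated scores whose variances are unequal (the eigenvalues). The `mtcars` variables and all object names are illustrative choices, not part of the original analysis. The sketch also checks that mean-centering leaves the correlation matrix unchanged, which would explain why the centered and the original variables give identical results when the PCA is based on the correlation matrix.

```r
# Base R only; mtcars is a built-in dataset used purely for illustration.
vars <- c("mpg", "disp", "hp", "wt", "qsec", "drat")
dat  <- mtcars[, vars]

# Mean-centering does not change the correlation matrix.
centered <- scale(dat, center = TRUE, scale = FALSE)
isTRUE(all.equal(cor(dat), cor(centered), check.attributes = FALSE))

# PCA on standardized variables, i.e. on the correlation matrix.
p <- prcomp(scale(dat))
k <- 3
V <- p$rotation[, 1:k]            # unit-norm eigenvectors
raw_scores <- p$x[, 1:k]          # uncorrelated, variances = eigenvalues
std_scores <- scale(raw_scores)   # uncorrelated, unit variances

# Orthogonal k x k varimax rotation matrix, estimated from the eigenvectors.
R <- varimax(V, normalize = FALSE)$rotmat

round(cor(raw_scores), 4)         # unrotated scores: identity matrix
round(cor(raw_scores %*% R), 4)   # rotated raw scores: off-diagonals generally nonzero
round(cor(std_scores %*% R), 4)   # rotated standardized scores: identity matrix again
```

The design point is that an orthogonal rotation preserves uncorrelatedness only when the scores being rotated all have the same variance. Rotating standardized scores keeps them uncorrelated, while rotating raw scores with unequal variances generally does not.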

COOLSerdash
  • This quote does not make sense to me. After varimax rotation, PCA or FA scores (if computed correctly) should remain uncorrelated. Mathematical details are in my answer here https://stats.stackexchange.com/questions/612. – amoeba May 07 '19 at 09:22

@COOLSerdash pointed to a source that describes the phenomenon. After some further reading here and here, I learned about Anderson-Rubin scores, which are uncorrelated. Since my concern was multicollinearity among the components in my post-PCA estimations (i.e. regressions), I computed them in R (I couldn't find them in Stata).

I used the principal function in R, requesting 3 components with varimax rotation, and then estimated the scores with the Anderson method:

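library(psych)  # provides principal() and factor.scores()
# 3 components with varimax rotation, then Anderson-Rubin scores
x1=principal(df, nfactors=3, rotate="varimax")
f1=factor.scores(df, x1, method="Anderson")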

I hope this is helpful.

martins
  • This does not make sense to me. After varimax rotation, PCA or FA scores (if computed correctly) should remain uncorrelated. Mathematical details are in my answer here https://stats.stackexchange.com/questions/612. You can try it in R: `x1$scores` should be uncorrelated, so you don't need your second line of code. See also https://stats.stackexchange.com/questions/59213. – amoeba May 07 '19 at 09:23