This is my first post, so apologies for any incorrect formatting or if this has been answered elsewhere, but I seem to be going around in circles.

Basically, I have 12 survey plots and have recorded 17 variables for each: # of tree species, # of dead trees, % grass cover etc. I have 2 years' worth of this data (2016, 2017), so I would be running each year separately. The plan is to use PCA in R to reduce the number of variables by using the component scores from the top principal components instead. I ran PCA using prcomp as follows:

dframe1 <- read.csv('g:/veg.csv', header=TRUE)
PCA.results <- prcomp(dframe1, center = TRUE, scale. = TRUE)

The first 5 PCs have eigenvalues >1, so I obtained the component scores for these (abbreviated results shown):

PCA.results$x


          PC1         PC2       PC3          PC4         PC5 
[1,]    -2.7329607  -0.3238917  -1.2887333   0.15997834  1.0115736
[2,]    -0.4176688  -2.6465327  -2.4567818   1.17885072  0.130746
[3,]    -0.2304915  -1.8657283  -0.4056321  -0.12534494 -1.6435601
[4,]    -4.2221891   1.860162    0.5397799  -0.19361945 -1.2656926
[5,]    -3.0834      1.7658483  -0.1064903  -1.02139467  0.9627706
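For reference, the eigenvalue check described above can be sketched as follows. This is a minimal sketch, assuming the eigenvalues are taken as the squared `sdev` values returned by `prcomp`; the data here are simulated with the same shape as the survey data (12 plots by 17 variables) and are not the real `veg.csv`.

```r
# Simulated stand-in for the survey data (hypothetical, not the real veg.csv)
set.seed(1)
X <- matrix(rnorm(12 * 17), nrow = 12, ncol = 17)

pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigenvalues of the correlation matrix are the squared component std devs
eigenvalues <- pca$sdev^2

# Indices of the components with eigenvalue > 1
keep <- which(eigenvalues > 1)

# Unrotated component scores for the retained components
scores <- pca$x[, keep, drop = FALSE]
```

Because the variables are standardised, the eigenvalues sum to the number of variables, which is a quick sanity check on the result.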

I have read about using rotations such as varimax or oblimin to better separate your components. To check which one to use, I have run an oblimin rotation to see if any of the factors are above 0.32 in the correlation matrix.

library(psych)
library(GPArotation)
my.oblimin <- fa(PCA.results$rotation, nfactors=5, rotate = "oblimin")

None of them were, so it looks as if an orthogonal (varimax) rotation should be okay to use. However, for one year (the other didn't have any errors) I did get an error message that said "The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method."

If I do decide to run a varimax rotation on my PCA results, how do I then get the new component scores? It doesn't seem possible to specify a rotation type in prcomp, so you have to run the rotation afterwards. This will only give you the rotated loadings, i.e. variables against PCs, but no component scores, whether I use varimax or fa to perform the rotation:

r.varimax <- varimax(PCA.results$rotation[,1:5])

or

fa.varimax <- fa(PCA.results$rotation, nfactors=5, rotate = "varimax")

So, my questions are:

  1. Does any/none/all of this seem reasonable?

  2. Is it possible to obtain new component scores for my 12 sites using varimax rotated PC values and would you want to?

  3. What does the error message about estimated weights refer to?

If the answer to 1 or 2 is No, and I shouldn't be running varimax after PCA if I'm interested in component scores, then question 3 is moot really.

Sorry if any of this isn't clear or I'm totally off target. Any help with this would be appreciated. Thanks, Rich

  • I vote to close as a dup. If your Q remains different after you carefully study the linked thread, please edit to focus your Q on the remaining issues. – amoeba Oct 09 '17 at 10:22
  • @amoeba Thanks, wasn't sure this was the same as I wanted. Could you please confirm, to find rotated component scores for first 5 PCs for dataset of 12 obs. for 17 var. your 2nd example would be: mydataX – Rich_b Oct 10 '17 at 10:40
  • Well, you can just copy-paste, so what's there to confirm? There are three methods given there (for some reason you chose the most complicated), so you can run all three and check if you get the same result. – amoeba Oct 10 '17 at 10:56
  • I did copy your code then adjusted for my dataset (17 col. instead of 4 as in iris dataset, so 1:17 instead of 1:4?). I get the same results for 2 & 3 but an error & a different result for 1. The error is: `Warning messages: 1: In cor.smooth(model) : Matrix was not positive definite, smoothing was done 2: In cor.smooth(r) : Matrix was not positive definite, smoothing was done 3: In psych::principal(irisX, rotate = "varimax", nfactors = ncomp, : The matrix is not positive semi-definite, scores found from Structure loadings` Sorry if this is obvious, my R is pretty basic. – Rich_b Oct 10 '17 at 12:00
  • I don't know why `psych::principal` does not work correctly; I'd say you don't need to use it for something so basic. Just use method #3. Regarding `mydataX – amoeba Oct 10 '17 at 12:21
  • @amoeba - Thanks for clarifying that, I wasn't familiar with iris until today so now see why you use 1:4. Is it best to specify final number of PCs I intend to use (5) or get scores for more. e.g. 12 and then use top 5? I get slightly different values for ncomp=5 and 12, no surprise but not sure which is best? Thanks again for your help and the code, this has been driving me mad. – Rich_b Oct 10 '17 at 12:32
  • The idea behind varimax is that you are only rotating PCs that you consider "significant". So if it's 5 then you should rotate only 5. – amoeba Oct 10 '17 at 12:34
  • @amoeba Okay, brilliant. Thanks for your patience going through this, please feel free to close this post now. – Rich_b Oct 10 '17 at 23:13
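
The approach discussed in the comments (rotate only the retained PCs, then compute scores for them) can be sketched in base R as below. This is a minimal sketch under stated assumptions, not necessarily the code from the linked thread: the data are simulated with the same shape as the survey data (12 plots by 17 variables), `ncomp` is set to the 5 components mentioned in the question, and the rotated scores are obtained by standardising the retained PC scores and applying the rotation matrix returned by `varimax`.

```r
# Simulated stand-in for the survey data (hypothetical, not the real veg.csv)
set.seed(1)
X <- matrix(rnorm(12 * 17), nrow = 12, ncol = 17)

ncomp <- 5  # number of components retained, as in the question
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings: eigenvectors scaled by the component standard deviations
raw_loadings <- pca$rotation[, 1:ncomp] %*% diag(pca$sdev[1:ncomp], nrow = ncomp)

# Varimax rotation of the retained loadings only
vm <- varimax(raw_loadings)
rotated_loadings <- vm$loadings

# Rotated scores: standardise the retained PC scores, then apply the
# same orthogonal rotation that was applied to the loadings
rotated_scores <- scale(pca$x[, 1:ncomp]) %*% vm$rotmat
```

Changing `ncomp` changes the rotation, so scores computed with `ncomp = 5` will generally differ from the first 5 columns obtained with a larger `ncomp`, which is consistent with the advice above to rotate only the components being kept.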
