
I wanted to confirm my intuition about the meaning of the "loadings" I computed from an eigenvalue/eigenvector decomposition, but I have not managed to do so.

Note that I made 3 pairs of highly correlated variables, expecting 3 'significant' principal components, and it works as expected.

However, when I build the loadings I get a plot that does not show me three distinct correlation peaks.

Why don't I get a clear picture of the variables clustering by their correlation with the scaled components (let us loosely call them factors)?

Can I do a different analysis to get what I want (prcomp / factanal)?

[Figure: loadings of the first three components plotted across variables x1 to x6]

set.seed(1)

x1 = rnorm(1000)
x2 = x1 + rnorm(1000, 0, 0.2)

x3 = rnorm(1000)
x4 = x3 + rnorm(1000, 0, 0.2)

x5 = rnorm(1000)
x6 = x5 + rnorm(1000, 0, 0.2)

dt <- data.frame(cbind(x1,x2,x3,x4,x5,x6))

A <- as.matrix(dt)

ei <- eigen(cov(A))

eivals <- ei$values

plot(eivals)

eivecs <- ei$vectors

# scale each eigenvector by the square root of its share of the total variance
loading_1 <- eivecs[,1] * sqrt(eivals[1]/sum(eivals))
loading_2 <- eivecs[,2] * sqrt(eivals[2]/sum(eivals))
loading_3 <- eivecs[,3] * sqrt(eivals[3]/sum(eivals))

# suppress the default x axis so the variable names are not drawn over 1..6
plot(loading_1, type = 'l', ylim = c(-1,1), ylab = 'loadings', xlab = 'variables', xaxt = 'n')
lines(loading_2, col = 'red')
lines(loading_3, col = 'blue')
axis(1, at = 1:6, labels = paste0('x', 1:6))
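
For comparison, here is a minimal sketch of the more common convention, in which a loading is the eigenvector multiplied by the square root of its eigenvalue. It assumes the correlation matrix is decomposed (not the covariance matrix used above), and the names `conv_loadings`, `scores`, `var_pc_cor` and `max_diff` are made up for this illustration. Under that assumption the loadings coincide with the correlations between the variables and the component scores:

```r
# Sketch only: decomposes cor(A), unlike the cov(A) call in the question
set.seed(1)
n <- 1000
x1 <- rnorm(n); x2 <- x1 + rnorm(n, 0, 0.2)
x3 <- rnorm(n); x4 <- x3 + rnorm(n, 0, 0.2)
x5 <- rnorm(n); x6 <- x5 + rnorm(n, 0, 0.2)
A <- cbind(x1, x2, x3, x4, x5, x6)

ei <- eigen(cor(A))

# Conventional loadings: each eigenvector times the square root of its eigenvalue
conv_loadings <- ei$vectors %*% diag(sqrt(ei$values))
rownames(conv_loadings) <- colnames(A)

# Component scores of the standardized data
scores <- scale(A) %*% ei$vectors

# Correlation of every variable with every component
var_pc_cor <- cor(A, scores)

# Largest absolute discrepancy between the two matrices
max_diff <- max(abs(var_pc_cor - conv_loadings))
print(max_diff)
```

The scaling by `sqrt(eivals/sum(eivals))` used in the question divides by the total variance as well, so those values are not correlations in this sense.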

Update (2019-10-07)

I have obtained a much cleaner picture here after rotation.

My question now becomes:

Can I take (for example) the modulus of these loading coefficients, so that I can treat the figures as correlation coefficients between variables and factors? I cannot explain, either to colleagues or to myself, what a negative relation means.

By the way, I am doing topic modelling, where the variables are word frequencies. So I would love to be able to say that one topic (factor) is dominated by Russia (0.7) and politics (0.71), and another topic by Economy (0.72) and Oil (0.69).
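
One fact that bears on the sign question: the overall sign of a component is arbitrary. The sketch below (an illustration only, with made-up names such as `V_flip` and `S_flip`, and assuming standardized variables via `scale. = TRUE`) flips the sign of the first component's loadings together with its scores and checks that the reconstructed data do not change. It does not settle whether the modulus is appropriate when variables within the same component have opposite signs.

```r
# Sketch only: sign indeterminacy of a principal component
set.seed(1)
n <- 1000
x1 <- rnorm(n); x2 <- x1 + rnorm(n, 0, 0.2)
x3 <- rnorm(n); x4 <- x3 + rnorm(n, 0, 0.2)
x5 <- rnorm(n); x6 <- x5 + rnorm(n, 0, 0.2)
A <- cbind(x1, x2, x3, x4, x5, x6)

p <- prcomp(A, scale. = TRUE)
V <- p$rotation   # eigenvectors (one column per component)
S <- p$x          # component scores

# Flip the sign of the first component in both matrices
V_flip <- V; V_flip[, 1] <- -V[, 1]
S_flip <- S; S_flip[, 1] <- -S[, 1]

# Both versions reconstruct the same standardized data
recon      <- S %*% t(V)
recon_flip <- S_flip %*% t(V_flip)

flip_diff <- max(abs(recon - recon_flip))   # difference between the two versions
recon_err <- max(abs(recon - scale(A)))     # difference from the standardized data
print(c(flip_diff, recon_err))
```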

[Figure: loadings of the first three components after varimax rotation]

## PCA + Varimax

set.seed(1)

x1 = rnorm(1000)
x2 = x1 + rnorm(1000, 0, 0.2)

x3 = rnorm(1000)
x4 = x3 + rnorm(1000, 0, 0.2)

x5 = rnorm(1000)
x6 = x5 + rnorm(1000, 0, 0.2)

dt <- data.frame(cbind(x1,x2,x3,x4,x5,x6))

M <- as.matrix(dt)

prc <- prcomp(M)

loading_1 <- prc$rotation[,1]
loading_2 <- prc$rotation[,2]
loading_3 <- prc$rotation[,3]

# suppress the default x axis, then label the ticks with the variable names
plot(loading_1, type = 'l', ylim = c(-1,1), ylab = 'loadings', xlab = 'variables', xaxt = 'n')
lines(loading_2, col = 'red')
lines(loading_3, col = 'blue')
axis(1, at = 1:6, labels = paste0('x', 1:6))


## Varimax

L <- prc$rotation[,1:3]

R <- varimax(L, normalize = TRUE, eps = 1e-5)

R$loadings

loading_1 <- R$loadings[,1]
loading_2 <- R$loadings[,2]
loading_3 <- R$loadings[,3]

# suppress the default x axis, then label the ticks with the variable names
plot(loading_1, type = 'l', ylim = c(-1,1), ylab = 'loadings', xlab = 'variables', xaxt = 'n')
lines(loading_2, col = 'red')
lines(loading_3, col = 'blue')
axis(1, at = 1:6, labels = paste0('x', 1:6))
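
The code above passes the unit-length eigenvectors in `prc$rotation` to `varimax`. A possible variant, sketched here under the assumption that the variables are standardized (`scale. = TRUE`) and that loadings are defined as eigenvectors scaled by the component standard deviations, rotates those scaled loadings instead. The names `L_scaled`, `rot` and `rotated` are made up for this sketch. An orthogonal rotation leaves each variable's communality (row sum of squared loadings) unchanged, which the last lines check:

```r
# Sketch only: varimax applied to scaled loadings rather than raw eigenvectors
set.seed(1)
n <- 1000
x1 <- rnorm(n); x2 <- x1 + rnorm(n, 0, 0.2)
x3 <- rnorm(n); x4 <- x3 + rnorm(n, 0, 0.2)
x5 <- rnorm(n); x6 <- x5 + rnorm(n, 0, 0.2)
M <- cbind(x1, x2, x3, x4, x5, x6)

prc <- prcomp(M, scale. = TRUE)

# Loadings of the first three components: eigenvector * component sd
L_scaled <- prc$rotation[, 1:3] %*% diag(prc$sdev[1:3])

rot <- varimax(L_scaled, normalize = TRUE, eps = 1e-5)
rotated <- unclass(rot$loadings)   # plain matrix of rotated loadings

# Communalities before and after rotation
comm_before <- rowSums(L_scaled^2)
comm_after  <- rowSums(rotated^2)
comm_diff   <- max(abs(comm_before - comm_after))

# The rotation matrix should be orthogonal
orth_diff <- max(abs(crossprod(rot$rotmat) - diag(3)))

print(round(rotated, 2))
```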
Alexey Burnakov
  • Why, I see quite a clear picture on your loading profile plot: (X1,X2) vs (X3,X4) vs (X5,X6). The middle pair is only weakly loaded by any of the three PCs, therefore we can't say for sure whether X3 and X4 are correlated. But that is because the solution is unrotated. Try performing varimax or another rotation to look for correlation clusters. – ttnphns Oct 02 '19 at 12:30
  • Thank you, I will try this with the rotation and update the plot. I expected three distinct peaks just from the PC loadings, because intuitively there are clear factors, but I find it hard to imagine a 6D image of the data. – Alexey Burnakov Oct 02 '19 at 17:08
  • @ttnphns, hello. I am getting confused about how to proceed with this task. "Try to perform varimax or other rotation...". I read a couple of highly rated posts here about rotation, written by respected posters (@amoeba and others). In one post what is rotated are the **PCs** obtained by SVD decomposition, for example. In the other post what is rotated are the **loadings**, which can also be extracted from PCA. I also read that in specific contexts either of the two can be done. What did you mean when referring to rotation here? – Alexey Burnakov Oct 07 '19 at 10:52
  • @ttnphns, I updated my original post with a new plot and code, and also could you look at the updated question I have asked? – Alexey Burnakov Oct 07 '19 at 11:33
  • [Here is](https://stats.stackexchange.com/a/193023/3277) my answer with a neat chart. Inspect it. We rotate both the loadings and the (standardized) PC or factor scores with the same rotation matrix. The rotation matrix is established via an algorithm such as varimax, whose aim is to rotate the loadings to a more interpretable position. – ttnphns Oct 07 '19 at 12:41
  • @ttnphns, thank you. I read your answer, and the plot is really neat. With rotation I could clarify profiles of my factors, and, surprisingly, I also found good interpretation for my topic modelling, at least the topics (factors) were found to be quite informative. – Alexey Burnakov Oct 07 '19 at 14:17
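
The procedure described in the comments, rotating the loadings and the standardized scores with the same rotation matrix, could be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's own code: variables are standardized with `scale. = TRUE`, three components are kept, and names such as `L3`, `Z3`, `L_rot` and `Z_rot` are made up here. After rotation, the rotated loadings should again equal the correlations between the variables and the rotated standardized scores:

```r
# Sketch only: apply one varimax rotation matrix to loadings and scores alike
set.seed(1)
n <- 1000
x1 <- rnorm(n); x2 <- x1 + rnorm(n, 0, 0.2)
x3 <- rnorm(n); x4 <- x3 + rnorm(n, 0, 0.2)
x5 <- rnorm(n); x6 <- x5 + rnorm(n, 0, 0.2)
M <- cbind(x1, x2, x3, x4, x5, x6)

prc <- prcomp(M, scale. = TRUE)
k <- 3

# Loadings (eigenvector * sd) and standardized scores (score / sd)
L3 <- prc$rotation[, 1:k] %*% diag(prc$sdev[1:k])
Z3 <- prc$x[, 1:k] %*% diag(1 / prc$sdev[1:k])

# Rotation matrix found by varimax on the loadings
rotmat <- varimax(L3)$rotmat

# The same matrix rotates both loadings and standardized scores
L_rot <- L3 %*% rotmat
Z_rot <- Z3 %*% rotmat

# Rotated loadings versus correlations of variables with rotated scores
rot_diff <- max(abs(cor(M, Z_rot) - L_rot))
print(rot_diff)
```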
