I am trying to perform a varimax-style orthogonal rotation after running FAMD (Factor Analysis of Mixed Data). My immediate goal is to construct an index from four variables, one quantitative and three qualitative.
I am more or less aware of the math behind orthogonal rotation of the loadings in PCA (all quantitative variables). The following posts were helpful:
Is PCA followed by a rotation (such as varimax) still PCA?
How to compute varimax-rotated principal components in R?
Problem is, I am not familiar with how it works when there are categorical variables involved in the data, as in FAMD. I have two questions, which surely would come across as rather clueless to people who know this subject well:
- The only automated software solution for varimax-style rotation in FAMD that I know of is the R package 'PCAmixdata'. The thing is, whenever there is just ONE quantitative variable in the data (not zero, not two, not three etc.), the PCArot command gives an error message. I tried this with many datasets and it was the same every time.
One simple reproducible example:
library(PCAmixdata)
data(wine)
df <- wine[, c(1, 2, 31)]  # one quantitative and two qualitative variables
split <- splitmix(df)
pca <- PCAmix(X.quanti = split$X.quanti,
              X.quali = split$X.quali,
              graph = FALSE)
pca$eig  # let's select 3 components
pca_rot <- PCArot(pca, dim = 3, graph = FALSE)  ## error message
Error in `colnames<-`(`*tmp*`, value = paste("dim", 1:dim, sep = "", ".rot")) : attempt to set 'colnames' on an object with less than two dimensions
Is there any theoretical reason why rotation should not work when there is only one quantitative variable in the dataset? I can't think of any, but I would appreciate it if someone more knowledgeable than I am could confirm (and ideally explain) this. Or is this perhaps just a bug in the PCArot command?
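For reference, the same error message can be reproduced in base R alone. The sketch below uses a made-up one-row matrix (the names `m` and `v` are purely illustrative): subsetting a single row without `drop = FALSE` returns a plain vector, and assigning column names to it then fails. Whether this is what actually happens inside PCArot when there is exactly one quantitative variable is only a guess, not something confirmed from the package source.

```r
# Hypothetical one-row matrix, standing in for "one quantitative variable"
m <- matrix(c(0.5, 0.3, 0.1), nrow = 1)

v <- m[1, ]                      # dimensions are dropped: v is a plain vector
kept <- m[1, , drop = FALSE]     # dimensions are kept: still a 1 x 3 matrix

# Assigning column names to the vector triggers the error quoted above
msg <- tryCatch({
  colnames(v) <- paste("dim", 1:3, sep = "", ".rot")
  "no error"
}, error = function(e) conditionMessage(e))

# The same assignment works on the matrix that kept its dimensions
colnames(kept) <- paste("dim", 1:3, sep = "", ".rot")
```

If this were the cause, it would point to an implementation detail rather than a theoretical obstacle, but that remains an assumption.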
- In an effort to construct an index, I just did the same thing as I would in regular PCA with varimax rotation: get the standardized principal component scores by running an FAMD, compute the "rotation matrix" of the loadings, and transform the original scores with the rotation matrix. This corresponds to method 3 that the author 'amoeba' used in the second link above.
loadings <- sqrt(pca$sqload)[,1:3]
scores <- scale(pca$ind$coord[,1:3]) %*% varimax(loadings)$rotmat
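As a sanity check of this procedure in the ordinary PCA setting (not FAMD), here is a minimal sketch on the built-in `USArrests` data. The choice of dataset, the number of retained components `k`, and the object names are assumptions made only for illustration. It checks two properties one would expect: the matrix returned by `varimax` is orthogonal, and the rotated standardized scores remain uncorrelated.

```r
# Plain PCA on a purely quantitative built-in dataset (illustration only)
X <- scale(USArrests)            # standardize the four variables
pc <- prcomp(X)
k <- 2                           # number of components kept (assumed)

# Loadings: eigenvectors scaled by the component standard deviations
loadings_demo <- pc$rotation[, 1:k] %*% diag(pc$sdev[1:k])

# stats::varimax returns the rotated loadings and the rotation matrix
rot <- varimax(loadings_demo)$rotmat

# Standardize the component scores, then apply the rotation matrix
scores_std <- scale(pc$x[, 1:k])
scores_rot <- scores_std %*% rot

# Deviation of t(rot) %*% rot from the identity (orthogonality check)
ortho_err <- max(abs(crossprod(rot) - diag(k)))
# Deviation of the rotated scores' correlation matrix from the identity
cor_err <- max(abs(cor(scores_rot) - diag(k)))
```

Both `ortho_err` and `cor_err` should be zero up to floating-point error. This only verifies the linear algebra for the all-quantitative case; it does not by itself show that the same steps are appropriate for FAMD.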
It should also be possible to compute the loadings with the FactoMineR package instead. This package has no built-in function that returns the loading matrix, but it can be computed manually, as explained in one of the package creator's online posts (links below). Once the loading matrix is obtained, one can rotate it with the varimax function.
http://factominer.free.fr/question/FAQ.html
https://groups.google.com/g/factominer-users/c/wdcJndrAISE?pli=1
library(FactoMineR)
famd <- FAMD(df, graph = FALSE)
loadings <- sweep(famd$var$coord, 2, sqrt(famd$eig[1:ncol(famd$var$coord), 1]), FUN = "/")[, 1:3]
scores_famd <- scale(famd$ind$coord[, 1:3]) %*% varimax(loadings)$rotmat
The computed indexes (first component after rotation) are almost identical.
Could anybody advise whether this is indeed a correct way to compute rotated component scores in FAMD?