
I have a dataset that I am analysing using CATPCA in SPSS. The problem is that while I can see which variables are positively or negatively correlated in the Component Loadings table and plot, I still don't know which opinions are actually present in my data.

For example, I might find that component 1 comprises variables A and B, which are correlated with each other but both negatively correlated with C. My problem is that I need to find out whether my data shows that most people agree with component 1 or agree with the exact opposite. Taking that a step further, it is likely that I have both sides of that story in my data, and I need the proportion of people agreeing with component 1 and the proportion agreeing with the exact opposite. How do I find this?

I imagine that I need to create a composite variable to do this. I tried multiplying the component loadings by the quantifications, expecting that to give me the object scores on the components, but it didn't. If I could figure out how to get from quantifications to component scores, then I could at least go on and build a composite variable, but at the moment I am totally stuck.
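For reference, in standard (linear) PCA the algebra connecting these quantities is: loadings are eigenvectors of the correlation matrix scaled up by the square roots of the eigenvalues, and standardized component scores are the standardized data multiplied by the loadings divided column-wise by the eigenvalues. A minimal sketch in Python (illustrative only — SPSS is not involved, and the simulated data stands in for the quantified variables):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
Z = rng.standard_normal((n, 3))
Z[:, 1] += 0.8 * Z[:, 0]                    # induce some correlation
Z = (Z - Z.mean(0)) / Z.std(0, ddof=1)      # standardize columns

R = Z.T @ Z / (n - 1)                       # correlation matrix
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]            # sort components by variance
eigval, eigvec = eigval[order], eigvec[:, order]

loadings = eigvec * np.sqrt(eigval)         # loadings = scaled-up eigenvectors
scores = Z @ (loadings / eigval)            # = Z @ eigvec / sqrt(eigval)

# standardized component scores: unit variance, mutually uncorrelated
print(np.round(np.cov(scores.T), 6))
```

Dividing the loadings by the eigenvalues recovers `eigvec / sqrt(eigval)`, so the resulting scores come out with unit variance, which is how CATPCA's object scores are normalized in the variable-principal setting.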

This is a real-life problem, not coursework. I have read a journal paper on this, an SPSS textbook, and many examples online. All the tutorials seem to stop at saying X, Y and Z are correlated, and don't take the analysis further to show how often that association actually appears in the data. The point is that I don't need to just find the components; I also need to know what my data tells me.
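Once the object scores are saved as new variables (as suggested in the comments), the "proportions" question reduces to tabulating the sign of each respondent's score on a component. A hypothetical sketch, assuming the component-1 scores have already been exported from SPSS (the array here is simulated purely for illustration):

```python
import numpy as np

# Hypothetical stand-in for object scores on component 1, saved from
# CATPCA and exported; simulated here for illustration only.
rng = np.random.default_rng(1)
scores_c1 = rng.standard_normal(200)

agree = np.mean(scores_c1 > 0)      # share leaning toward the component's pole
disagree = np.mean(scores_c1 < 0)   # share leaning toward the opposite pole
print(f"agree: {agree:.1%}, disagree: {disagree:.1%}")
```

Remember that the sign of a component is arbitrary, so which pole counts as "agree" has to be read off from the signs of the strongest loadings; a histogram of the scores shows how the respondents spread between the two poles.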

  • You can save the component (object) scores from the CATPCA procedure as new variables in your dataset and then analyse them like any other variables. Suppose some CATPCA-analysed ordinal or dichotomous variable A takes a higher value for those who "agree" with the question asked. Then, if A is positively loaded on the component, those with a higher component score are those who agree more. If A is negatively loaded, the reverse holds. – ttnphns Sep 08 '15 at 10:07
  • Thank you. After this, I can simply use some descriptive stats on those scores or even draw a pretty histogram I suppose. That is useful. – James Walker Sep 08 '15 at 10:48
  • Ok, I have a further question, please. If I multiply the quantifications by the loadings, why don't I get the object scores? Is this because the quantifications are given before the rotation and the loadings are given after the rotation? – James Walker Sep 13 '15 at 10:53
  • What do you mean by "rotation"? Is it PCA itself or rotations of loadings such as varimax? – ttnphns Sep 13 '15 at 11:56
  • Thanks, I was hoping that you would answer. Under 'Normalisation Method' in the options, you get the choice of Variable Principal, Object Principal and a few others. I don't know exactly what CATPCA is doing, but I assume it is rotating the solution to find the best loadings, otherwise it wouldn't offer that option. This is the only explanation I can think of for why quantifications × loadings ≠ object scores. – James Walker Sep 13 '15 at 11:59
  • CATPCA is standard PCA performed on the quantifications as the data — after the quantifications have been found iteratively. So your question is about the basics of PCA. Have you read literature on PCA? One shouldn't use CATPCA unless one is acquainted with PCA well enough. – ttnphns Sep 13 '15 at 12:05
  • Sure, I completely understand linear PCA. The loadings are simply the eigenvectors of the covariance matrix. However, I don't understand CATPCA so well and I am getting confused. SPSS isn't giving me the score coefficients. It gives me quantifications and loadings. Hence I assumed multiplying the quantifications by the loadings would give me the component scores. How do I get the score coefficients? – James Walker Sep 13 '15 at 12:12
  • `The loadings are simply the eigenvectors of the covariance matrix.` No. [Loadings](http://stats.stackexchange.com/q/143905/3277) are the scaled up eigenvectors. – ttnphns Sep 13 '15 at 12:15
  • Ok. Let me ask you simply, please. If I have a negative loading and a negative quantification, does that give me a positive component score (I understand that you can't simply multiply the two together) or is that unknown without the score coefficients? The reason I ask is because I would like to know if I have a positive quantification (+A) and a positive loading, and then a negative quantification (-B) and a negative loading, are my results telling me that nominal answer (+A) is correlating with nominal answer (-B)? – James Walker Sep 13 '15 at 12:25
  • The same as when you analyze usual variables, not quantifications. If a respondent is negative on variable X, and X's loading on component A is negative, the respondent is somewhat expected to receive a positive component score on A. But the precise result — the sign — also depends on the respondent's signs on the other variables with strong loadings on A, and on the signs of those variables' loadings. – ttnphns Sep 13 '15 at 12:35
  • The sign of loadings is arbitrary (you of course know this, since you know PCA). You are free to change the sign of any column of the loading matrix to the opposite. It will change the sign of the corresponding component scores to the opposite. – ttnphns Sep 13 '15 at 12:38
  • Thanks. Yes, I understand that I can change the sign. I have been putting the object scores into a histogram to show the results that I actually have. I have been working on the assumption that sign of quantification × sign of loading = sign of component (and deciding the component's meaning by the strongest loadings, of course). Then a positive or negative component score tells me what opinions I actually have in my data. Reading your responses, this seems to be correct, hence the work that I have already done isn't wrong. I will go and read your long thread on it now. – James Walker Sep 13 '15 at 12:48
  • Thanks ttnphns. I learnt the maths behind CATPCA from here: https://openaccess.leidenuniv.nl/bitstream/handle/1887/12386/Appendices.pdf?sequence=5 and I was just getting confused by loadings and score coefficients. When I divided the loadings by the eigenvalues and multiplied those by the quantifications, I was able to almost reproduce my component scores. I did this for many objects and all the associated components. I always calculated an answer that was near, but not exact. What is the reason for that? Is information lost in the data-reduction process, or is it due to error variance in the sample? – James Walker Sep 14 '15 at 07:31
  • You should not take the slight discrepancy too seriously. When I once played with SPSS's CATPCA I found at least two sources of discrepancy. 1) If you do standard PCA on the quantified variables it will be done on the matrix `X'X/(n-1)` ([see](http://stats.stackexchange.com/a/22520/3277)), while CATPCA does it on `X'X/n`. 2) The results produced by CATPCA are the output of the last iteration of the quantification process, not the results of a PCA done once more after the process ends. So if the convergence is not exact (as usual), there is a difference. – ttnphns Sep 14 '15 at 08:13
  • Ttnphns, thanks again! Your posts have been a great help and I'm finally getting to grips with all of this. – James Walker Sep 14 '15 at 08:26
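The first source of discrepancy mentioned above — `X'X/(n-1)` versus `X'X/n` — is easy to see numerically. A small sketch (simulated data, illustrative only): since the two matrices differ by the scalar factor `(n-1)/n`, their eigenvectors coincide and the eigenvalues (and hence the loadings, via their square roots) differ by that same factor.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
Z = rng.standard_normal((n, 4))
Z = (Z - Z.mean(0)) / Z.std(0, ddof=1)   # standardized quantified data

C_sample = Z.T @ Z / (n - 1)   # what a standard PCA run would decompose
C_catpca = Z.T @ Z / n         # what CATPCA effectively decomposes

ev_sample = np.linalg.eigvalsh(C_sample)
ev_catpca = np.linalg.eigvalsh(C_catpca)

# Same eigenvectors; eigenvalues scale by (n-1)/n, so loadings
# scale by sqrt((n-1)/n) -- a small systematic discrepancy.
print(np.allclose(ev_catpca, ev_sample * (n - 1) / n))   # True
```

For n = 100 this is a factor of about 0.995 on the loadings, consistent with scores that reproduce "near, but not exact" as described above.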

0 Answers