
As a non-maths doctoral candidate, I would be absolutely thrilled to bits if anyone here might be willing to take a stab at translating the following PCA results into layman's terms.

There are only two sets of raw data: TOEIC test scores and DMIS stages.

I also need to be able to state the significance, if any, of this PCA (i.e. I have to answer the perennial dissertation question "so what?").

Lastly, I must assert that it was definitely not me who performed this PCA, but rather a friend who is conversant with SPSS.

Thank you in advance, even if you're not actually in a position to assist me:

Communalities
        Initial   Extraction
TOEIC   1.000     .546
DMIS    1.000     .546
Extraction Method: Principal Component Analysis.


Total Variance Explained
                 Initial Eigenvalues                     Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %    Total   % of Variance   Cumulative %
1           1.092   54.579          54.579          1.092   54.579          54.579
2            .908   45.421          100.000
Extraction Method: Principal Component Analysis.
ttnphns

2 Answers


Initial communality of a variable is its variance, the diagonal element of the analysed matrix. In your case it is 1 for both variables because you analysed a correlation matrix (which is equivalent to saying that you analysed standardized variables).
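To see why analysing the correlation matrix amounts to analysing standardized variables, here is a minimal sketch in Python (standard library only). The data are made up for illustration, not the asker's, and the helpers `standardize`, `covariance` and `pearson` are hypothetical names written for this example. The covariance matrix of the z-scores has 1s on its diagonal (the initial communalities) and the ordinary Pearson correlation off the diagonal.

```python
import math
import statistics

def standardize(xs):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / s for x in xs]

def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson(xs, ys):
    """Pearson correlation computed directly from the raw values."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical toy data (NOT the asker's): TOEIC-like scores and DMIS-like stages.
toeic = [450, 620, 700, 540, 810, 390, 660]
dmis = [2, 3, 3, 4, 5, 1, 2]

z = [standardize(toeic), standardize(dmis)]

# Covariance matrix of the standardized variables = correlation matrix of the raw ones.
cov_z = [[covariance(a, b) for b in z] for a in z]

print(round(cov_z[0][0], 6), round(cov_z[1][1], 6))  # diagonal: 1.0 1.0
# The off-diagonal entry and r agree (up to floating-point error).
print(cov_z[0][1], pearson(toeic, dmis))
```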

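The extraction communality of a variable is the sum of its squared loadings on the extracted principal components, i.e. the portion of its variance that is explained, or accounted for, by those components. (It is related to, but not the same as, the "Extraction Sums of Squared Loadings" column of the second table: that column sums the squared loadings per component rather than per variable, and in PCA it equals each extracted component's eigenvalue. Here .546 + .546 = 1.092.)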
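A minimal Python sketch of where the .546 comes from, using only the eigenvalue 1.092 from the output. It assumes the two variables are positively correlated; with a negative correlation one eigenvector entry flips sign, but the squared loadings, and hence the communalities, are unchanged. With two standardized variables PC1's eigenvector always weights them equally, so each loading is the square root of 1.092/2.

```python
import math

# Eigenvalue of the retained component, from the "Total Variance Explained" table.
lambda1 = 1.092

# For two standardized, positively correlated variables the PC1 eigenvector is
# (1/sqrt(2), 1/sqrt(2)); a loading is eigenvector entry * sqrt(eigenvalue).
v1 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
loadings = [v * math.sqrt(lambda1) for v in v1]

# Extraction communality = sum of a variable's squared loadings over the
# extracted components; with only PC1 extracted it is one squared loading.
communalities = [l ** 2 for l in loadings]

print([round(l, 3) for l in loadings])       # [0.739, 0.739]
print([round(h, 3) for h in communalities])  # [0.546, 0.546]
```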

You extracted one principal component of the two possible (you have just two variables). The extracted PC1 has variance ("eigenvalue", in PCA jargon) 1.092, which is ("explains") 1.092/(1+1) ≈ 54.6% of the total (summed) variance of the two variables; SPSS prints 54.579% because it works with the unrounded eigenvalue. The remaining component, PC2, which you chose to leave out (not to "extract"), has variance .908, which accounts for the rest of the total variance: 1.092 + .908 = 1 + 1 = 2, i.e. 100%.
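The percentages in the "Total Variance Explained" table are plain arithmetic on the eigenvalues. A minimal Python sketch, rounded to one decimal place; it assumes, as here, that the correlation matrix was analysed, so that the total variance equals the number of variables.

```python
# Eigenvalues from the "Initial Eigenvalues" column of the SPSS output.
eigenvalues = [1.092, 0.908]

# Each standardized variable has variance 1, so the total variance equals
# the number of variables (= number of components): 1 + 1 = 2.
total_variance = len(eigenvalues)

shares = [100 * lam / total_variance for lam in eigenvalues]
cumulative = [sum(shares[:k + 1]) for k in range(len(shares))]

for k, (s, c) in enumerate(zip(shares, cumulative), start=1):
    print(f"PC{k}: {s:.1f}% of variance, cumulative {c:.1f}%")
# PC1: 54.6% of variance, cumulative 54.6%
# PC2: 45.4% of variance, cumulative 100.0%
```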

See also this for "explained variance", and this for an exhaustive "chew over" of PCA. There is nothing in PCA too difficult for a layman, or a monk, to grasp.

PCA itself does not answer questions about "statistical significance", because it is an exploratory, data-transformation technique that deals with the data at hand without reference to any "population".

P.S. As @NickCox rightly points out in a comment, there is little practical use for PCA when the variables are almost uncorrelated. PCA is meant to be a dimension-reduction technique for several (usually more than 2) correlated variables. In your (your friend's) case the two variables are only weakly correlated, which can be seen from the fact that PC1 and PC2 have almost equal variance. (One can easily compute that correlation from your output: with two standardized variables the eigenvalues are 1 + |r| and 1 - |r|, so |r| = 1.092 - 1 = .092, not far from 0.) So with such weakly correlated variables, extracting only one of the two possible components would be too gross a simplification of the real data. However, as an educational or theoretical case your example is all right.
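The back-calculation of the correlation from the eigenvalues can be checked in a few lines of Python. This is a sketch: `eigenvalues_2x2_symmetric` is a hypothetical helper written for this example (the closed-form eigenvalues of a symmetric 2×2 matrix), not an SPSS function. Note that the eigenvalues only reveal the size of r, not its sign; the sign would show up in the component loadings.

```python
import math

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of [[a, b], [b, d]], largest first (closed form)."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean + radius, mean - radius

# Largest eigenvalue reported by SPSS.
lambda1 = 1.092

# For a correlation matrix [[1, r], [r, 1]] the eigenvalues are 1 + |r| and 1 - |r|,
# so the size of the correlation can be read straight off PC1's eigenvalue.
abs_r = lambda1 - 1
print(round(abs_r, 3))  # 0.092

# Sanity check: plugging |r| back in reproduces both reported eigenvalues.
print([round(x, 3) for x in eigenvalues_2x2_symmetric(1.0, abs_r, 1.0)])  # [1.092, 0.908]
```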

ttnphns
  • +1. Your two variables are weakly correlated. It's not clear that PCA is of much use compared with plotting the variables, calculating the correlation and thinking about those. – Nick Cox Sep 03 '13 at 08:48
  • @Nick this is why I thought about adding "student" `self-study` tag. – ttnphns Sep 03 '13 at 08:55

PCA is a method of data reduction, usually used when you have a great many variables and wish to capture information about them in a relatively small number of variables.

I don't see any reason for doing PCA when you have only 2 variables.

My first question is what sort of data you have for TOEIC scores and DMIS stages. Are these continuous variables? Ordinal ones? From a quick search, TOEIC looks nearly continuous, but DMIS has only 6 levels. So correlation is a little problematic (not necessarily completely wrong; it depends on what you are willing to assume about DMIS), and PCA may compound those problems.

Given this, my first step would be to create a parallel box plot of the two variables (one box plot of TOEIC scores for each DMIS stage) to see how they vary together.
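Reading "parallel box plot" as one box of TOEIC scores per DMIS stage (an interpretation, not necessarily the only one), a minimal standard-library Python sketch can prepare the numbers each box would draw. The records are made-up toy data, not the asker's, and `five_number_summary` is a hypothetical helper. Quartile conventions differ between packages; this uses `statistics.quantiles` with `method="inclusive"`, so a plotting tool may draw slightly different boxes.

```python
import statistics
from collections import defaultdict

# Hypothetical toy data (NOT the asker's): (DMIS stage, TOEIC score) pairs.
records = [
    (1, 410), (1, 455), (1, 520),
    (2, 480), (2, 530), (2, 610), (2, 575),
    (3, 500), (3, 640), (3, 700),
    (4, 590), (4, 720), (4, 655),
]

def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# One box per DMIS stage: group the TOEIC scores by stage.
scores_by_stage = defaultdict(list)
for stage, score in records:
    scores_by_stage[stage].append(score)

summaries = {stage: five_number_summary(scores)
             for stage, scores in sorted(scores_by_stage.items())}

for stage, summary in summaries.items():
    print(f"DMIS stage {stage}:", ", ".join(f"{x:g}" for x in summary))
```

If the boxes shift upward steadily from stage to stage, the two variables move together; heavily overlapping boxes point the same way as the near-zero correlation.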

Peter Flom