
This question may already have been answered, but I do not fully understand the existing answers.

As I understand it, scores, loadings, and principal components are different concepts:

Scores: the new coordinates of the observations in the new space defined by the principal components (PCs). (There is no such thing as a first principal component score.)

Loadings: the coefficients of the linear combination of the original variables used to form the principal components.

A related question mentions that a principal component is a linear combination, but a linear combination of what?

Is it a linear combination of the loadings and the original observed variables?

Then what is a score? Is it a component of the PC vector?
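
To make the three terms concrete, here is a minimal NumPy sketch of a covariance-matrix PCA on simulated data. The function name `pca_parts` and the data are purely illustrative. It also assumes one common convention, in which "loadings" are the eigenvectors scaled by the square roots of the eigenvalues; some sources call the unscaled eigenvectors loadings instead.

```python
import numpy as np

def pca_parts(X):
    """Return eigenvalues, eigenvectors, scores and scaled loadings for data X (n x p)."""
    Xc = X - X.mean(axis=0)                 # centre each variable
    cov = np.cov(Xc, rowvar=False)          # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder so PC1 has the largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores: each column is one principal component evaluated on every observation,
    # i.e. a linear combination of the centred variables with eigenvector weights.
    scores = Xc @ eigvecs
    # Assumed convention: loadings = eigenvectors scaled by sqrt(eigenvalue).
    loadings = eigvecs * np.sqrt(eigvals)
    return eigvals, eigvecs, scores, loadings

# Simulated, correlated data (illustrative only).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(200, 3)) @ mixing

eigvals, eigvecs, scores, loadings = pca_parts(X)
print("eigenvectors (weights):\n", eigvecs)
print("first observation's scores:", scores[0])
print("scaled loadings:\n", loadings)
```

In this sketch the eigenvector columns hold the weights of the linear combination, a principal component is the resulting new variable, and the scores are the values that variable takes for each observation, so each observation has one score per component.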

  • 1
    One correction...*loadings* express the linear correlation between an input feature and a principal component. As such, they are not *coefficients* in the regression sense. – Mike Hunter Dec 18 '20 at 17:10
  • OK, so the *loadings* are linear correlations between an input feature and a principal component. But I am sure there is a way to express this definition mathematically, just to clarify the distinction between principal components, scores and loadings. – Edmond Geraud Aguilar Dec 18 '20 at 19:41
  • Yes, those *scores* are the coefficients as a function of the components. – Mike Hunter Dec 19 '20 at 00:21
  • 1
    The loadings "express" correlation but they aren't correlations in general. To see this, look at output from a PCA and then correlate PC scores with the original variables. The scores are values of the components and thus not the coefficients used to calculate those scores. Several excellent threads explain in detail. https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues is another. – Nick Cox Dec 19 '20 at 03:08
  • 1
    In practice it can be easier to learn terms by getting output from your favourite software and studying it in relation to the software documentation. If the documentation doesn't explain well enough, you need a new favourite or a textbook. I've seen some slack explanations of loadings in different places, but don't recall any disagreement over what scores are or over what the correlations between the PCs and the variables are. – Nick Cox Dec 19 '20 at 13:27
  • 1
    Watch out that some discussions presume PCA from a correlation matrix whereas using a covariance matrix is possible and even sometimes advisable if all variables are measured in the same units. – Nick Cox Dec 19 '20 at 13:27
  • With respect to interpreting factor loadings as correlations: in fact, the literature seems fairly evenly split on the subject. To take just two examples, Harman's book, *Modern Factor Analysis*, seems to agree that eigenvectors (loadings) express the magnitude of the contribution of a variable to a component, without using the word *correlation*. SAS documentation, however, is quite explicit in describing loadings as correlations with a component or factor. Slack? Nope! – Mike Hunter Dec 19 '20 at 17:10
  • 2
    No disagreement on eigenvectors from me. You've amplified my main point about loadings: different sources show or imply different meanings. Now make sense of "those scores are the coefficients as a function of the components", which is utterly confused. – Nick Cox Dec 19 '20 at 17:33
  • 1
    "Any reasonable person": this is high school debating stuff best left in high school, as those persons somehow always turn out to be those who agree with the speaker. The only issue I would raise about literature beyond this thread is some inconsistency over what loadings are, inconsistency that is, or should be, resolved by looking at the equations people give or what the code used calculates. Correlations and scores and eigenvectors and eigenvalues are what they are, unless chapter and verse can be cited for disagreements. These are all technicalities to be discussed technically. . – Nick Cox Dec 21 '20 at 16:21

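The check suggested in the comments (run a PCA, then correlate the PC scores with the original variables) can be sketched as below. This is a minimal illustration on simulated data, not any particular package's output. The helper `variable_pc_correlations` is hypothetical, and "scaled loadings" again means eigenvectors multiplied by the square roots of the eigenvalues, which is only one of the conventions in use.

```python
import numpy as np

def variable_pc_correlations(X, standardize):
    """Correlate each original variable with each PC score.

    standardize=True mimics PCA on the correlation matrix,
    standardize=False mimics PCA on the covariance matrix.
    """
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs
    p = X.shape[1]
    # Top-right block of the joint correlation matrix: variables (rows) vs PCs (columns).
    corr = np.corrcoef(Xc, scores, rowvar=False)[:p, p:]
    scaled = eigvecs * np.sqrt(eigvals)        # assumed "scaled loadings" convention
    return corr, eigvecs, scaled

# Simulated variables measured on very different scales (illustrative only).
rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.6, 0.3],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 1.0]])
X = (rng.normal(size=(300, 3)) @ mixing) * np.array([1.0, 10.0, 100.0])

corr_r, vec_r, scaled_r = variable_pc_correlations(X, standardize=True)
corr_c, vec_c, scaled_c = variable_pc_correlations(X, standardize=False)

print("correlation PCA, scaled loadings equal correlations:", np.allclose(corr_r, scaled_r))
print("covariance PCA, scaled loadings equal correlations: ", np.allclose(corr_c, scaled_c))
print("covariance PCA, eigenvectors equal correlations:    ", np.allclose(corr_c, vec_c))
```

Under these assumptions the scaled loadings coincide with the variable-component correlations only for the standardized (correlation-matrix) PCA. For the covariance-matrix PCA each row must additionally be divided by that variable's standard deviation, which is one reason different sources appear to disagree about what loadings are.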