
I understand what scores are in PCA, in particular this answer gives a good mathematical formulation:

(Scores) are projections of the centred data in the linear space defined by the eigenvectors.

But the concept of scores, as representations of the original data in a new space, does not have to be PCA-specific: other linear spaces can be identified with other techniques. Does the concept of scores have a general mathematical definition outside of PCA? Specifically, I wonder if there is one that would accommodate nonlinear techniques as well. The way I see it, techniques such as nonlinear PCA also produce something analogous to scores, except that those scores can have higher dimensionality than the original data.
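For concreteness, the quoted definition (scores as projections of the centred data onto the eigenvectors) can be sketched numerically; everything below is a minimal illustration on toy data, not part of any particular technique:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy data: 100 samples, 3 variables

Xc = X - X.mean(axis=0)                 # centre the data
cov = np.cov(Xc, rowvar=False)          # 3x3 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvectors as columns, ascending
W = eigvecs[:, np.argsort(eigvals)[::-1]]  # reorder: largest variance first

scores = Xc @ W                         # projections of centred data = scores
print(scores.shape)                     # one score vector per sample
```

By construction the score columns are uncorrelated, which is the PCA-specific part; a non-PCA technique would substitute a different map in place of `W`.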

So, is there one mathematical definition to rule them all?

EdM
kamilazdybal

3 Answers


The Wikipedia page for disambiguating use of the word "score" doesn't even seem to include that use, but it does include several other uses of the word in mathematics or statistics. The one I suspect that you would find most frequently used on this site is "the gradient of the log-likelihood function with respect to the parameter vector," as used in the score test for models fit by maximum likelihood.
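To make that other usage concrete, here is a small sketch (my own toy example, not from the linked page) of the score function for a Bernoulli sample, i.e. the gradient of the log-likelihood $\ell(p) = k\log p + (n-k)\log(1-p)$ with respect to $p$:

```python
import numpy as np

def score(p, x):
    """Gradient of the Bernoulli log-likelihood l(p) with respect to p."""
    n, k = len(x), x.sum()
    return k / p - (n - k) / (1 - p)

x = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # toy 0/1 data
p_hat = x.mean()                         # maximum-likelihood estimate

print(score(p_hat, x))                   # 0: the score vanishes at the MLE
print(score(0.5, x))                     # nonzero away from the MLE
```

The score test exploits exactly this property: it checks how far the score evaluated at a restricted estimate is from zero.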

EdM

PCA is nothing but a new system of orthogonal coordinates, which happens to be a linear transformation of the old system of coordinates. You take the old coordinates $x_1,x_2,\dots,x_n$ in an original Euclidean system $(e_1,e_2,\dots,e_n)$ and linearly transform them into a new system of coordinates (factors) $(f_1,f_2,\dots,f_n)=Ae$. The scores are the coordinates in the new system, $(s_1,s_2,\dots,s_n)$.
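This change-of-coordinates view is easy to check numerically; in the sketch below, $A$ is just an arbitrary orthogonal matrix standing in for the eigenvector basis, so no information is lost when passing to the scores:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)          # old coordinates of one observation

# An arbitrary orthogonal matrix (rows = new basis vectors in the old basis),
# obtained from a QR decomposition purely for illustration.
A, _ = np.linalg.qr(rng.normal(size=(4, 4)))

s = A @ x                       # scores = coordinates in the new system
x_back = A.T @ s                # orthogonality lets us invert with A^T

print(np.allclose(x_back, x))                              # True: invertible
print(np.allclose(np.linalg.norm(s), np.linalg.norm(x)))   # lengths preserved
```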

Aksakal

I don't know of a score definition that "rules them all," but consider the notion described here: https://www.ncbi.nlm.nih.gov/pubmed/28715259. There, the first PC score is defined as the linear combination that maximizes the sum of $R^2$ values when regressing each of the original variables on the score. As such, the awkward and not-easily-understood "variance maximization" and "unit length constraint" concepts can be discarded completely. The first two scores are any two linear combinations that similarly maximize the sum of $R^2$s in the multiple regression model using the two scores as predictors; thus, not only are the unit-length and variance-maximization concepts unnecessary, but so is orthogonality.
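This characterization can be checked numerically. In the sketch below (my own toy example, assuming standardized variables so that summed $R^2$ corresponds to correlation-matrix PCA), the first PC score should beat or tie any random linear combination on summed $R^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # correlated toy data
Z = (X - X.mean(axis=0)) / X.std(axis=0)                 # standardize columns

def sum_r2(score):
    """Sum of R^2 from regressing each standardized variable on the score."""
    return sum(np.corrcoef(score, Z[:, j])[0, 1] ** 2
               for j in range(Z.shape[1]))

# First PC score from the correlation matrix of the standardized data.
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
pc1 = Z @ eigvecs[:, -1]

# Compare against many random linear combinations of the variables.
best_random = max(sum_r2(Z @ rng.normal(size=3)) for _ in range(500))
print(sum_r2(pc1) >= best_random)       # True: PC1 maximizes summed R^2
```

The maximized sum of $R^2$ equals the largest eigenvalue of the correlation matrix, which ties this definition back to the usual eigenvector formulation.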

This approach could be extended to the nonlinear case; see https://users.soe.ucsc.edu/~draper/eBay-Google-2013-breiman-friedman-1985.pdf for nonlinear transformations (scores) that maximize the $R^2$ in a regression with a single dependent variable. I would imagine that such an approach could also be used to maximize the sum of $R^2$ values, as in linear principal components analysis, and that this would indeed be a very useful dimension-reduction technique, with fewer scores explaining a greater (average) percentage of variation in the original variables, but I am not aware that anyone has done it.

BigBendRegion