
I understand what scores are in PCA, in particular this answer gives a good mathematical formulation:

(Scores) are projections of the centred data in the linear space defined by the eigenvectors.

But the concept of scores, as representations of the original data in a new space, does not have to be PCA-specific: other linear spaces can be identified with other techniques. Does the concept of scores have a general mathematical definition outside of PCA? Specifically, I wonder if there is one that would accommodate nonlinear techniques as well. The way I see it, techniques such as nonlinear PCA also produce something analogous to scores, except that those scores can have higher dimensionality than the original data.
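For concreteness, the quoted definition (scores as projections of the centred data onto the eigenvectors) can be sketched numerically; everything below is a minimal illustration on toy data, not part of any particular technique:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy data: 100 samples, 3 variables

Xc = X - X.mean(axis=0)                 # centre the data
cov = np.cov(Xc, rowvar=False)          # 3x3 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvectors as columns, ascending
W = eigvecs[:, np.argsort(eigvals)[::-1]]  # reorder: largest variance first

scores = Xc @ W                         # projections of centred data = scores
print(scores.shape)                     # one score vector per sample
```

By construction the score columns are uncorrelated, which is the PCA-specific part; a non-PCA technique would substitute a different map in place of `W`.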

So, is there one mathematical definition to rule them all?

EdM
kamilazdybal

3 Answers


The Wikipedia page for disambiguating use of the word "score" doesn't even seem to include that use, but it does include several other uses of the word in mathematics or statistics. The one I suspect that you would find most frequently used on this site is "the gradient of the log-likelihood function with respect to the parameter vector," as used in the score test for models fit by maximum likelihood.
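To make that other usage concrete, here is a small sketch (my own toy example, not from the linked page) of the score function for a Bernoulli sample, i.e. the gradient of the log-likelihood $\ell(p) = k\log p + (n-k)\log(1-p)$ with respect to $p$:

```python
import numpy as np

def score(p, x):
    """Gradient of the Bernoulli log-likelihood l(p) with respect to p."""
    n, k = len(x), x.sum()
    return k / p - (n - k) / (1 - p)

x = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # toy 0/1 data
p_hat = x.mean()                         # maximum-likelihood estimate

print(score(p_hat, x))                   # 0: the score vanishes at the MLE
print(score(0.5, x))                     # nonzero away from the MLE
```

The score test exploits exactly this property: it checks how far the score evaluated at a restricted estimate is from zero.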

EdM

PCA is nothing but a new system of orthogonal coordinates, which happens to be a linear transformation of the old system of coordinates. You take the old coordinates $x_1,x_2,\dots,x_n$ in an original Euclidean system $(e_1,e_2,\dots,e_n)$ and linearly transform them into a new system of coordinates (factors) $(f_1,f_2,\dots,f_n)=Ae$. The scores are the coordinates in the new system, $(s_1,s_2,\dots,s_n)$.
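This change-of-coordinates view is easy to check numerically; in the sketch below, $A$ is just an arbitrary orthogonal matrix standing in for the eigenvector basis, so no information is lost when passing to the scores:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)          # old coordinates of one observation

# An arbitrary orthogonal matrix (rows = new basis vectors in the old basis),
# obtained from a QR decomposition purely for illustration.
A, _ = np.linalg.qr(rng.normal(size=(4, 4)))

s = A @ x                       # scores = coordinates in the new system
x_back = A.T @ s                # orthogonality lets us invert with A^T

print(np.allclose(x_back, x))                              # True: invertible
print(np.allclose(np.linalg.norm(s), np.linalg.norm(x)))   # lengths preserved
```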

Aksakal

I don't know of a score definition that "rules them all," but consider the notion described here: https://www.ncbi.nlm.nih.gov/pubmed/28715259. There, the first PC score is defined as the linear combination that maximizes the sum of $R^2$ values when regressing each of the original variables on the score. As such, the awkward and not-easily-understood "variance maximization" and "unit length constraint" concepts can be discarded completely. The first two scores are any two linear combinations that similarly maximize the sum of $R^2$s in the multiple regression model using the two scores as predictors; thus, not only are the unit-length and variance-maximization concepts unnecessary, but so is orthogonality.
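This characterization can be checked numerically. In the sketch below (my own toy example, assuming standardized variables so that summed $R^2$ corresponds to correlation-matrix PCA), the first PC score should beat or tie any random linear combination on summed $R^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # correlated toy data
Z = (X - X.mean(axis=0)) / X.std(axis=0)                 # standardize columns

def sum_r2(score):
    """Sum of R^2 from regressing each standardized variable on the score."""
    return sum(np.corrcoef(score, Z[:, j])[0, 1] ** 2
               for j in range(Z.shape[1]))

# First PC score from the correlation matrix of the standardized data.
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
pc1 = Z @ eigvecs[:, -1]

# Compare against many random linear combinations of the variables.
best_random = max(sum_r2(Z @ rng.normal(size=3)) for _ in range(500))
print(sum_r2(pc1) >= best_random)       # True: PC1 maximizes summed R^2
```

The maximized sum of $R^2$ equals the largest eigenvalue of the correlation matrix, which ties this definition back to the usual eigenvector formulation.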

This approach could be extended to the nonlinear case; see https://users.soe.ucsc.edu/~draper/eBay-Google-2013-breiman-friedman-1985.pdf for nonlinear transformations (scores) that maximize the $R^2$ in a regression with a single dependent variable. I would imagine that such an approach could also be used to maximize the sum of $R^2$ values, as in linear principal components analysis, and that this would indeed be a very useful dimension-reduction technique, with fewer scores explaining a greater (average) percentage of variation in the original variables, but I am not aware that anyone has done it.

BigBendRegion