
I am a bit unsure (or might be overthinking this): is the chosen PC somehow applied to the data to reduce the dimensionality of the data, or how does one use a PC to do any form of mathematical computation?...

I mean, each principal component is a linear combination of the variables. It basically is an eigenvector and eigenvalue describing a pattern in the data.

Most textbooks seem to end the elaboration after having found the eigenvectors and the eigenvalues, but isn't there another step that involves reducing the data with the PCs found?...

Sorrow
  • `Most textbooks seem to end the elaboration...` Really? Do they ever mention the computation of the values of the first `m` PCs, thereby reducing the dimensionality? – ttnphns Jun 19 '16 at 13:17
  • I get how they compute PCs and so on... but aren't the components somehow applied to the original dataset? Each PC is just a linear combination. I know that they reduce the dimensionality, but I'm not sure how a PC can reduce the dimensionality of the dataset, as they aren't applied to the original dataset, only extracted from it. – Sorrow Jun 19 '16 at 13:19
  • Principal components can be used to restore the original variables' data. – ttnphns Jun 19 '16 at 13:21
  • If I used all of them I would have the complete dataset, that's right. But what if you only wanted to keep 50% of the variance? One way would be to choose $m$ PCs such that the requirement is fulfilled, and then what? – Sorrow Jun 19 '16 at 13:24
  • Principal components don't reduce dimensionality: if the dataset has $n$ features, we can compute $n$ PCs, so the dimensionality doesn't change. Dimensionality reduction takes place when we use some smaller number of PCs instead of the original features in the dataset. – Ogurtsov Jun 19 '16 at 15:55

1 Answer


Given that you have found the principal modes of variation in your sample (the eigenvectors), you use these as your new axis system. When you compute the PC scores you simply project the data onto the axis system defined by these eigenvectors.

Let's say your original data is an $N \times p$ matrix $X_0$ with a valid $p \times p$ covariance matrix $C$ (the columns of $X_0$ are assumed to have mean $0$). The covariance matrix $C$ can then be eigendecomposed as $C = U \Lambda U^T$, where $U$ is the $p \times p$ matrix of eigenvectors (each column is an eigenvector) and $\Lambda$ is the diagonal matrix holding the corresponding eigenvalues. This is the most important step: $U$ now allows a change of basis from $X_0$ to $Z$, where we define $Z = X_0 U$. The $N \times p$ matrix $Z$ holds the projected scores; these are the projections of the original data $X_0$ onto the axes defined by the columns of $U$. Note that here we used the full matrix $U$. If we wanted the optimal $k$-dimensional approximation of the data $X_0$ we would use only the first $k$ columns of $U$ (those associated with the $k$ largest eigenvalues). As you see, the data $X_0$ are directly employed in their dimensionality reduction using the eigenvectors $U$. (Sometimes the columns of $U$ are also called loadings.)
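
As an illustration, here is a minimal sketch of this covariance-route computation in Python/NumPy; the data matrix and the choice $k = 2$ are made up purely for the example.

```python
import numpy as np

# Hypothetical data: N = 100 observations of p = 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X0 = X - X.mean(axis=0)                 # centre the columns so each has mean 0

C = np.cov(X0, rowvar=False)            # p x p covariance matrix
eigvals, U = np.linalg.eigh(C)          # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # reorder so the largest-variance direction comes first
eigvals, U = eigvals[order], U[:, order]

Z = X0 @ U                              # N x p projected scores (full change of basis)
k = 2
Z_k = X0 @ U[:, :k]                     # N x k scores: keep only the first k components
```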

Please also see the thread here; it contains some great answers to assist your intuition further.

Note that I used the workflow of calculating PCA using the covariance matrix. Most implementations use the singular value decomposition of the original data directly by default because of its better numerical properties in some cases. The results from the two routines are perfectly equivalent. I used the covariance-based approach because I think it is a bit more intuitive. The thread here contains an excellent answer on how SVD relates to PCA.
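
For completeness, a small sketch of the SVD route (again on made-up data), which yields the same projected scores without ever forming the covariance matrix; individual columns may differ by a sign flip, which is the usual harmless ambiguity.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 4))
X0 -= X0.mean(axis=0)                   # centred data, as before

# X0 = U_s S V^T; the columns of V are the eigenvectors of the covariance matrix
U_s, s, Vt = np.linalg.svd(X0, full_matrices=False)
Z_svd = X0 @ Vt.T                       # projected scores, equal to X0 @ U up to column sign flips
lambdas = s**2 / (X0.shape[0] - 1)      # corresponding eigenvalues of C
```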

usεr11852
  • Isn't there something wrong? Shouldn't it be $Z = U X_0^T$? – Sorrow Jun 22 '16 at 19:08
  • Could you elaborate on the term *projected scores*? – Sorrow Jun 22 '16 at 19:24
  • Err... my bad, sorry typo; I hope it is cleaner now. – usεr11852 Jun 22 '16 at 19:32
  • Sure no probs, done. – usεr11852 Jun 22 '16 at 19:34
  • So the projection scores are basically the loadings? – Sorrow Jun 22 '16 at 19:36
  • $X_0^T$ again.. – Sorrow Jun 22 '16 at 19:39
  • No, $U$ is the loadings. (See for example the terminology [here](http://uk.mathworks.com/help/stats/pca.html); in general avoid using the term *loadings*, as some people use it interchangeably; I used it in case you saw it somewhere. Saying *eigenvectors* and *projected scores* leaves nothing open to interpretation.) – usεr11852 Jun 22 '16 at 19:42
  • Why $X_0^T$? $X_0$ is $N \times p$, $U$ is $p \times p$ (or $p \times k$). It has to be $X_0 U$ so that the projected scores have rows corresponding to observations and columns to the components used. – usεr11852 Jun 22 '16 at 19:44
  • Sorry, my bad about that; I thought the transpose was needed. – Sorrow Jun 22 '16 at 19:47
  • I think I understand how the method is a data reduction method rather than dimension/feature reduction. What I don't get is why it is the first feature vector (the first row in $X_0$) whose linear combination is the best? I understand that $U$ contains the eigenvector corresponding to the highest variance, but why is it the first row that it should be linearly combined with? What do those data points have to do with it? Why exactly those? – Sorrow Jun 22 '16 at 19:47
  • Cool, I am glad I could help. If you believe this answers your question you could consider accepting the answer. – usεr11852 Jun 22 '16 at 19:49
  • I think you misinterpret the nature of $X_0$. The first row of $X_0$ holds the values of the first through the $p$-th feature for the first observation. In addition, when you make the projection using the first column of $U$, you use all the columns/features of $X_0$, so it has nothing to do with only the first column (or row). – usεr11852 Jun 22 '16 at 19:56
  • Why is a linear combination with the first observation used to compute PC1? $X_0$ $[1 \times p]$ times $U$ $[p \times 1]$ $=$ $[1 \times 1]$ => projection score. Isn't this value related to the principal component? – Sorrow Jun 22 '16 at 20:00
  • Is your projection score actually the principal component itself, or what? – Sorrow Jun 22 '16 at 20:07
  • The scalar product of $X_0[1,:]$ (the first observation) with the first eigenvector $U[:,1]$ is the projected score of the first observation onto the axis defined by the first eigenvector (see the short sketch after these comments). Please see [here](https://en.wikipedia.org/wiki/Dot_product#Geometric_definition) the geometric definition/interpretation of the dot product for more details. – usεr11852 Jun 22 '16 at 21:11
  • Some people say *components* and mean the projected scores; other people (like myself) say *component* and mean the eigenvector (and/or the direction of the axis). Unfortunately, in many cases you have to check the author's description to see what is meant. That's why I said: stick to *eigenvectors* and *projected scores*. You cannot misinterpret those. – usεr11852 Jun 22 '16 at 21:16
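
To make the dot-product comment above concrete, here is a minimal sketch (on made-up data, using the notation of the answer): the projected score of the first observation onto the first eigenvector is a single scalar product.

```python
import numpy as np

# Hypothetical centred data X0 (N x p) and eigenvector matrix U, as in the answer
rng = np.random.default_rng(0)
X0 = rng.normal(size=(100, 4))
X0 -= X0.mean(axis=0)
eigvals, U = np.linalg.eigh(np.cov(X0, rowvar=False))
U = U[:, ::-1]                          # put the largest-variance eigenvector first

# Projected score of the first observation on the first eigenvector: a single dot product
score_11 = X0[0, :] @ U[:, 0]           # (1 x p) . (p x 1) -> scalar
print(np.isclose(score_11, (X0 @ U)[0, 0]))   # True: it is entry [0, 0] of the score matrix Z
```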