Meaning of principal components

Question

I have difficulty understanding the meaning of the principal components (PCs).

On one hand, PCs are computed by finding loading vectors that maximize the variance, but on the other hand I read another interpretation that says the PCs are the closest to the $n$ observations. It seems to me that there is a contradiction: how can they be closest if they have the highest variance?
In general, when do we know to stop at the first PC and not go for the second one?

Thanks

I don't understand why this question was closed as a duplicate of that other one. This Q [confusingly] asks two unrelated questions, and even though both of them have been many times asked before and are arguably duplicates, *none* of them is a duplicate of the currently marked question. That one is very specifically about computing how much variance is explained by a given projection of the data; this is not what OP asked here. — amoeba, May 05 '15 at 16:18
I downvote partially because I find that gluing two unrelated questions together is very confusing. Please consider removing one of your questions (and perhaps asking it separately); then I will be happy to revert my downvote. — amoeba, May 05 '15 at 16:19
@amoeba, I agree that this is not a duplicate of the linked thread. I had voted to leave open. It might be helpful if you could link to the 2 threads that best answer these Qs (ie, the true duplicates). — gung - Reinstate Monica, May 05 '15 at 16:46
You might find that the various answers at http://stats.stackexchange.com/questions/2691 resolve the first question. — whuber, May 06 '15 at 00:12

score 2 · Answer 1 · answered May 04 '15 at 14:01

Denote the $i$-th observation (assumed centered) as ${\bf x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, and let $\tilde{\bf x}_i^{(k)} = \sum_{j=1}^k \lambda_j^{1/2} s_{ij} {\bf u}_j$ be the projection of ${\bf x}_i$ onto the linear subspace spanned by the first $k$ principal components (i.e., the vector of predictions of ${\bf x}_i$ based on the scores $s_{i1}, s_{i2}, \ldots, s_{ik}$ of the first $k$ components; the factor $\lambda_j^{1/2}$ appears because the scores are taken here to be standardized to unit variance). Then $\tilde{\bf x}_i^{(k)}$ is closest to ${\bf x}_i$ in the following sense: among all approximations of the form ${\bf x}_i({\bf a},{\bf B}) = \sum_{j=1}^k a_{ij} {\bf b}_j$, the sum $\sum_{i=1}^n \| {\bf x}_i({\bf a},{\bf B}) - {\bf x}_i\|^2$ is minimized when the ${\bf b}_j$ are the principal component directions and the $a_{ij}$ are the corresponding scores. In other words, if you want to approximate your data with $k$ orthogonal vectors of dimension $p$ and an $n\times k$ matrix of scores, then the principal components give you the best approximation. There is no contradiction: the total variance is fixed and splits into the variance explained by the projection plus the residual variance, so maximizing the explained variance is the same as minimizing the residual variance, i.e. the squared distance from the observations to the subspace.
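
A quick numerical check of this equivalence may help. The sketch below uses simulated data (the sizes `n`, `p`, `k` and the random seed are arbitrary assumptions, not from the question) and computes the components via an SVD of the centered data. It is only an illustration: it verifies that total = explained + residual sum of squares, and that a randomly chosen $k$-dimensional subspace does no better than the PCA subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 5, 2  # arbitrary illustrative sizes

# Simulated correlated data, then centered column by column.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
Xc = X - X.mean(axis=0)

# Rows of Vt are the principal directions (loading vectors).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
V_k = Vt[:k].T            # p x k, first k loading vectors
scores = Xc @ V_k         # n x k, unstandardized scores
X_hat = scores @ V_k.T    # projection onto the first k components

total = np.sum(Xc ** 2)
explained = np.sum(scores ** 2)
residual = np.sum((Xc - X_hat) ** 2)

# Compare with an arbitrary orthonormal basis of another k-dim subspace.
Q, _ = np.linalg.qr(rng.normal(size=(p, k)))
residual_random = np.sum((Xc - Xc @ Q @ Q.T) ** 2)

print(np.isclose(total, explained + residual))
print(residual <= residual_random)
```

Because `total` does not depend on the chosen subspace, any subspace with a larger `explained` must have a smaller `residual`, which is why the two descriptions of PCA coincide.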

score 1 · Answer 2 · answered May 04 '15 at 13:26

I'm sure there's a ton of material on these exact questions on the web.

One way to look at PCA is as follows. Let's say you have a set of observations: $X(i)$. Now each $X(i)$ is a vector itself, i.e. it consists of $n$ variables $X_1(i),X_2(i),\dots,X_n(i)$.

Sometimes these $n$ variables are highly correlated with each other. For instance, imagine measuring the weight, height, chest size, skull size, etc. of the people in a town. So maybe, instead of dealing with $n$ different size measures, you want to have just one. You could run PCA on the matrix $X_j(i)$ and obtain the first principal component. PCA will return a score matrix of the same dimensions as $X$; take its first column, $s_1(i)$. This will most likely be your size measure, because it captures the co-movement of all the size measures in one number. The coefficient matrix will be $n$-by-$n$; its first column (or row, depending on the software) is the vector of coefficients, i.e. the weights applied to each size measure, such as weight and height, to obtain the first principal component.
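
As a rough illustration of the size example, here is a sketch on simulated data. Everything in it is an assumption made for the demonstration: the latent variable `size`, the values in `true_loadings`, the noise level, and the four made-up measures. The measures are standardized before the decomposition, so this corresponds to PCA on the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_measures = 500, 4  # e.g. weight, height, chest, skull (hypothetical)

# Hypothetical latent "body size" driving all four measures, plus noise.
size = rng.normal(size=n_people)
true_loadings = np.array([0.9, 0.8, 0.85, 0.7])
X = size[:, None] * true_loadings + rng.normal(scale=0.4, size=(n_people, n_measures))

# Standardize each measure so that differing units do not dominate.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

U, S, Vt = np.linalg.svd(Z, full_matrices=False)
all_scores = Z @ Vt.T     # score matrix, same shape as X
weights = Vt[0]           # coefficients of the first component
if weights.sum() < 0:     # the sign of a component is arbitrary; fix it
    weights = -weights
s1 = Z @ weights          # first column of scores: the single "size" index

corr_with_size = np.corrcoef(s1, size)[0, 1]
print(weights)
print(corr_with_size)
```

In a setup like this, all entries of `weights` come out with the same sign, which is what makes the first component readable as an overall size index; real measurements need not behave so cleanly.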
