How does PCA represent all data with just a few principal components?

Question

How does principal component analysis (PCA) model data of admittedly higher dimensionality with just a few principal components?

@MichaelChernick Mostly a fixable grammar and sentence structure problem. — Carl, Jan 24 '17 at 02:35
I think the answer is that PCA finds the more contributory independent variables. — Carl, Jan 24 '17 at 02:44
I'm still not sure exactly what you mean. Are you referring to a situation where you have fewer observations than dimensions ($n < p$)? Or how it is that the first few principal components can represent a higher dimensional dataset 'well enough'? Or something else? — gung - Reinstate Monica, Jan 24 '17 at 02:46
@gung How is it that the first few principal components can represent a higher dimensional dataset well enough? That is more likely my question. — bbadyalina, Jan 24 '17 at 03:21
I am not certain that I understand your question. You have a dataset with many observations, but when you use PCA you end up with fewer components than there are observations? If that is the case, please see this question and answer: http://stats.stackexchange.com/questions/99351/if-number-of-samples-is-smaller-than-number-of-features-how-can-all-the-varianc?rq=1 — Frank Drost, Jan 24 '17 at 02:14

score 1 · Accepted Answer · answered Jan 24 '17 at 02:52

I believe your question is something like:

"I have 10000 features and thus very high dimension, so why does PCA with only 3 principal components work?"

There is a misunderstanding here. We don't represent the original data set exactly with just a few PCs; we approximate it, which is why PCA is a data reduction technique. You will almost certainly lose some information, but if you can minimize the information you lose, you should be fine.

PCA works by forming a new set of variables, each a linear combination of the original features. It chooses them so that they account for as much of the variance as possible. You can think of it as an approximation technique: you approximate what you have, but the approximation is not perfect. In practice, you decide how many principal components to keep. The more you keep, the better the approximation.
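To make the "more components, better approximation" point concrete, here is a minimal sketch using NumPy's SVD. The data set is synthetic and purely an assumption for illustration (200 observations, 50 features, mostly driven by 3 hidden factors), and `pca_reconstruct` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Approximate X using only its first k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                      # centre each feature
    # Rows of Vt are the principal directions, ordered by variance explained.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T             # k new variables per observation
    return scores @ Vt[:k] + mean      # map back to the original feature space

rng = np.random.default_rng(0)
# Assumed toy data: 3 latent factors spread over 50 features, plus small noise.
latent = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 50))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 50))

errors = {}
for k in (1, 2, 3, 10, 50):
    X_hat = pca_reconstruct(X, k)
    errors[k] = np.mean((X - X_hat) ** 2)
    print(f"k={k:2d}  mean squared reconstruction error = {errors[k]:.6f}")
```

The error shrinks as `k` grows and is essentially zero once all 50 components are kept. With this particular toy data, 3 components already leave little more than the noise, which is the situation in which a few PCs work well.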

score 0 · Answer 2 · answered Jan 24 '17 at 02:49

I will try to answer this question using an example. Suppose you have two points, $(1,1)$ and $(2,2)$. These points lie on the straight line $y=x$. Another representation of these two points is $\sqrt{2}$ and $\sqrt{8}$, which are their distances from the origin or, equivalently, their projections onto the unit vector $\frac{1}{\sqrt{2}}(1,1)$. This means that the two points can be described by a single direction. In fact, any two points define just one direction. To make the idea of PCA more intuitive, suppose that instead of two points we have more, such as $(1,1), (2,2), (3,3), \ldots, (10,10)$. All ten points lie along the same unit vector.

Similarly, suppose we have points such as $(1,1,2), (2,2,2), \ldots, (10,10,2)$. Our data is three-dimensional now; however, the value in the third dimension is always equal to 2. Again, the data varies only in the direction of the unit vector mentioned above, which in three dimensions is $\frac{1}{\sqrt{2}}(1,1,0)$.

So, to answer your question: high-dimensional data cannot, in general, be represented exactly in a lower-dimensional space. Only the variation in high-dimensional data can be represented by lower-dimensional data, as in the example given.
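The three-dimensional example can be checked numerically. The following is a small sketch with NumPy (variable names are my own): it centres the ten points $(1,1,2), \ldots, (10,10,2)$, finds the principal directions by SVD, and rebuilds the data from the first component alone.

```python
import numpy as np

# The ten points (1,1,2), (2,2,2), ..., (10,10,2) from the example.
X = np.array([[i, i, 2] for i in range(1, 11)], dtype=float)

mean = X.mean(axis=0)
Xc = X - mean                                  # remove the mean; the constant 2 lives here
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = S**2 / np.sum(S**2)                # share of variance per component
first_direction = Vt[0]                        # sign is arbitrary: +/- (1, 1, 0) / sqrt(2)

scores = Xc @ first_direction                  # one number per point
X_rebuilt = np.outer(scores, first_direction) + mean

print("share of variance per component:", np.round(explained, 6))
print("first principal direction:", np.round(first_direction, 6))
print("rebuilt exactly from one component:", np.allclose(X_rebuilt, X))
```

All of the variance falls on the first component, whose direction is $\frac{1}{\sqrt{2}}(1,1,0)$ up to sign, that is, $\frac{1}{\sqrt{2}}(1,1)$ in the first two coordinates with nothing along the third. The constant third coordinate is carried by the mean rather than by any component, which is why one score per point is enough here.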

The link provided by Frank Drost above is also useful. Also, this is a helpful link to understand PCA: http://mengnote.blogspot.com/2013/05/an-intuitive-explanation-of-pca.html
