How does principal component analysis (PCA) model data of admittedly higher dimensionality with just a few principal components?
2 Answers
I believe your question is something like:
"I have 10000 features and thus very high dimension, so why does PCA with only 3 principal components work?"
There is a misunderstanding here. We don't represent the original data set exactly with just a few PCs; we approximate it, which is why PCA is a data reduction technique. You will almost certainly lose some information, but if you can minimize the information you lose, you should be fine.
PCA works by forming a new set of variables, each a linear combination of the original features. It chooses them so that each new variable accounts for as much of the remaining variance as possible. You can think of it as an approximation technique: you approximate what you have, but the approximation is not perfect. In practice, you decide how many principal components to keep. The more you keep, the better the approximation.
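To make the "more components, better approximation" point concrete, here is a minimal sketch in NumPy. The data set is synthetic and purely an assumption for illustration (200 samples, 50 features, mostly driven by 3 hidden factors), and `pca_reconstruct` is a hypothetical helper name, not a library function.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Approximate X using only its first k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                      # PCA operates on centred data
    # Rows of Vt are the principal directions, ordered by explained variance
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T             # coordinates in the k-dimensional PC space
    return scores @ Vt[:k] + mean      # map back to the original feature space

rng = np.random.default_rng(0)
# Assumed toy data: 3 latent factors mixed into 50 features, plus small noise
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

ks = (1, 2, 3, 10, 50)
errors = [np.linalg.norm(X - pca_reconstruct(X, k)) for k in ks]
for k, err in zip(ks, errors):
    print(f"k={k:2d}  reconstruction error={err:.4f}")
```

The error should shrink as `k` grows and vanish (up to floating-point rounding) once all 50 components are kept; with data like this, most of the drop happens within the first 3 components.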

I will try to answer this question using an example. Suppose you have two points, $(1,1)$ and $(2,2)$. These points lie on the straight line $y=x$. Another representation of these two points is $\sqrt{2}$ and $\sqrt{8}$, which are their distances from the origin or, equivalently, their projections onto the unit vector $\frac{1}{\sqrt{2}}(1,1)$. In other words, the two points can be described by a single direction plus one number each. Actually, any two points define just one direction. To make the idea of PCA more intuitive, suppose that instead of two points we have more, such as $(1,1), (2,2), (3,3), \ldots, (10,10)$. All ten points lie along the same unit vector.
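The projection arithmetic in this example can be checked with a short NumPy sketch. The variable names are illustrative only.

```python
import numpy as np

u = np.array([1.0, 1.0]) / np.sqrt(2.0)   # unit vector along the line y = x
points = np.array([[k, k] for k in range(1, 11)], dtype=float)

coords = points @ u             # one number per point: its length along u
rebuilt = np.outer(coords, u)   # recover the 2-D points from the 1-D coordinates

print(coords[:2])                    # projections of (1,1) and (2,2)
print(np.allclose(rebuilt, points))  # whether nothing was lost
```

Because every point sits exactly on the line, the single coordinate along `u` is enough to rebuild each point; the first two coordinates are the $\sqrt{2}$ and $\sqrt{8}$ mentioned above.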
Similarly, suppose we have points such as $(1,1,2), (2,2,2), \ldots, (10,10,2)$. Our data is three-dimensional now; however, the value in the third dimension is always equal to 2. Again, the data varies only in the direction of the unit vector mentioned above, which in three dimensions is $\frac{1}{\sqrt{2}}(1,1,0)$.
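As a rough check of the three-dimensional case, the sketch below centres the ten points and takes an SVD, which is one common way to compute principal components (an assumption about method, since the answer does not name one).

```python
import numpy as np

X = np.array([[k, k, 2] for k in range(1, 11)], dtype=float)
Xc = X - X.mean(axis=0)          # centring removes the constant third coordinate

_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)  # share of total variance per component
first_direction = Vt[0]          # sign is arbitrary: may be +u or -u

print(explained)
print(first_direction)
```

Only the first component should carry any variance, and its direction should match $\frac{1}{\sqrt{2}}(1,1,0)$ up to sign. The constant value 2 contributes no variation; it lives in the mean that was subtracted before the decomposition.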
So, to answer your question: high-dimensional data cannot be represented by a lower-dimensional space. Only the variation in the high-dimensional data can be represented by lower-dimensional data, as in the example given.
The link provided by Frank Drost above is also useful. Also, this is a helpful link to understand PCA: http://mengnote.blogspot.com/2013/05/an-intuitive-explanation-of-pca.html

>p? Or how it is that the first few principal components can represent a higher dimensional dataset 'well enough'? Or something else?
– gung - Reinstate Monica Jan 24 '17 at 02:46