Assuming you already know why you would use PCA, first suppose we have a matrix X that contains our data (n samples, each with d features):
$$
X =
\begin{bmatrix}
x_1^1 & x_2^1 & \dots & x_d^1 \\
x_1^2 & x_2^2 & \dots & x_d^2 \\
\vdots & \vdots & \ddots & \vdots \\
x_1^n & x_2^n & \dots & x_d^n \\
\end{bmatrix}
$$
First we compute the vector of all feature means:
$$
E[X] = \mu = [\mu_1,...,\mu_d]
$$
and then the covariance matrix of the features:
$$
Cov(X) = E[(X-\mu)(X-\mu)^T]
$$
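As a minimal sketch of these two steps (using NumPy; the toy data matrix and the names `X`, `mu`, `cov` are my own, not from the text):

```python
import numpy as np

# X: n samples (rows) x d features (columns) -- a small made-up example
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.0],
              [1.9, 2.2, 0.8]])

# mean of every feature: mu = E[X], a vector of length d
mu = X.mean(axis=0)

# covariance matrix: Cov(X) = E[(X - mu)(X - mu)^T], a d x d matrix
centered = X - mu
cov = centered.T @ centered / (X.shape[0] - 1)   # same result as np.cov(X, rowvar=False)
```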
Now that we have the covariance matrix, we compute its eigenvectors and eigenvalues. For this d×d matrix there are d eigenvalues; sort them from highest to lowest. Here is the PCA trick: we can ignore the eigenvalues that fall below a threshold, because the eigenvectors belonging to them represent the directions with the lowest variance in the data, and those directions are not very useful for separating objects.
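Continuing the sketch above (the names `eigvals`, `eigvecs`, `threshold`, and `W`, as well as the 1% cutoff, are illustrative assumptions), this step might look like:

```python
# eigen-decomposition of the symmetric covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh is for symmetric matrices, returns ascending order

# sort eigenvalues (and the matching eigenvectors) from highest to lowest
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# the "PCA trick": drop directions whose eigenvalue (variance) is below a threshold
threshold = 0.01 * eigvals.sum()          # e.g. keep components explaining > 1% of total variance
keep = eigvals > threshold
W = eigvecs[:, keep]                      # d x k projection matrix
```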

For example, here PC1 is more useful for separating objects than PC2, because the data is spread out more along that axis. Now, if you have 4 samples with 3 features, you can apply PCA to see which direction contributes least to separating the objects and drop it. With PCA you lose some accuracy in exchange for better performance and simpler computation.
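To finish the toy example (4 samples, 3 features, matching the sketch above), projecting the centered data onto the kept eigenvectors gives the reduced representation; the dropped dimensions are exactly the ones that barely help to separate the objects:

```python
# project centered data onto the principal components: n x k instead of n x d
X_reduced = centered @ W

print("explained variance ratio:", eigvals / eigvals.sum())
print("reduced data shape:", X_reduced.shape)   # e.g. (4, 2) if one component was dropped
```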