I'm looking for advice on whether the following method is a sound and standard way of calculating the PCA of some data. The example I give is deliberately small.

Given the matrix $A = [4, 6, 10; 3, 10, 13; -2, -6, -8]$, I compute its covariance matrix:

$$\operatorname{cov}(A) = \begin{bmatrix} 10.3& 24.6 & 35.0\\ 24.6& 69.3 & 94.0\\ 35.0& 94.0 &129.0 \end{bmatrix}$$

I then compute the eigenvalues and eigenvectors of this matrix. To calculate the PCA, I do the following:

1) Take the square roots of the eigenvalues, which gives the singular values

2) Standardise the input matrix $A$ with: $(A - mean(A)) / sd(A)$

3) Finally, to calculate the scores, multiply $A$ (after standardisation) by the eigenvectors

Would this give me the PCA scores? Is this correct, or am I missing something?

Phorce

1 Answer

Suppose you have a matrix $X$ of size $m \times n$, where $m$ is the number of observations and $n$ is the number of variables. The covariance matrix is then computed as follows:

    means=mean(X);                           % row vector of column means
    centered=X-repmat(means,m,1);            % subtract each column's mean
    covariance=(centered'*centered)/(m-1);   % sample covariance, n-by-n

This gives the covariance matrix.
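Written as a single formula, the computation above is the usual sample covariance. The symbols $\mathbf{1}$ (a column of $m$ ones) and $\bar{x}$ (the $1 \times n$ row vector of column means) are introduced only for this note:

$$C = \frac{1}{m-1}\,\left(X - \mathbf{1}\bar{x}\right)^{T}\left(X - \mathbf{1}\bar{x}\right)$$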

Now compute the eigenvalue decomposition, sort the eigenvectors by decreasing eigenvalue, and keep some of them, for example the first $k$ components:

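    [V,D]=eig(covariance);            % columns of V are the eigenvectors
    [e,i]=sort(diag(D),'descend');    % eigenvalues, largest first
    sorted=V(:,i);                    % reorder eigenvectors to match
    reduced=sorted(:,1:k);            % keep the first k components
    PCA=X*reduced;                    % project the data onto them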
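As a rough sanity check, the steps above can be put together in one short script. This is only a sketch: it uses the matrix from the question, an arbitrary `k = 1`, and the variable names `cov_diff`, `scores_raw`, `scores_centered` and `offset` are made up for illustration. It compares the hand-built covariance with the built-in `cov`, and shows that projecting the raw data differs from projecting the centered data only by a constant.

```matlab
% Sketch: check the covariance and projection steps on a small matrix.
X = [4, 6, 10; 3, 10, 13; -2, -6, -8];   % matrix from the question
[m, n] = size(X);
k = 1;                                   % number of components kept (arbitrary)

means = mean(X);                         % 1-by-n row of column means
centered = X - repmat(means, m, 1);      % subtract each column's mean
covariance = (centered' * centered) / (m - 1);

% The hand-built matrix should agree with the built-in cov.
cov_diff = max(max(abs(covariance - cov(X))));

[V, D] = eig(covariance);
[e, idx] = sort(diag(D), 'descend');     % eigenvalues, largest first
reduced = V(:, idx(1:k));                % first k eigenvectors

scores_raw = X * reduced;                % projection of the raw data
scores_centered = centered * reduced;    % projection of the centered data

% The two projections differ by the same constant in every row,
% namely means * reduced.
offset = scores_raw - scores_centered;
```

Scores computed from the centered data have zero mean, which is the convention most PCA routines follow; projecting the raw data shifts every score by `means * reduced`.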

Example:

>> A = [4, 6, 10; 3, 10, 13; -2, -6, -8]

A =

     4     6    10
     3    10    13
    -2    -6    -8

>> [m,n]=size(A)

m =

     3


n =

     3
>> means=mean(A);
>> centered=A-repmat(means,3,1);
>> covariance=(centered'*centered)/(3-1);
>> covariance

covariance =

   10.3333   24.6667   35.0000
   24.6667   69.3333   94.0000
   35.0000   94.0000  129.0000

Now compute the eigenvalue decomposition:

>> [V,D]=eig(covariance);
>> [e,i]=sort(diag(D),'descend');
>> sorted=V(:,i)

sorted =

    0.2126    0.7883    0.5774
    0.5764   -0.5783    0.5774
    0.7890    0.2100   -0.5774

>> e

e =

  207.1022
    1.5644
   -0.0000

Percentage of the total variance explained by each component:

>> (e./sum(e))*100

ans =

   99.2503
    0.7497
   -0.0000
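The last eigenvalue is zero up to rounding (hence the `-0.0000`), because the third column of `A` is the sum of the first two. One common way to pick the number of components is to keep the smallest `k` whose cumulative share of variance reaches some cutoff. The sketch below assumes a hypothetical cutoff of 99% and copies the eigenvalues from the output above, with the tiny negative value set to 0; `explained`, `cumulative` and `threshold` are illustrative names.

```matlab
% Sketch: choose k from the cumulative explained variance.
e = [207.1022; 1.5644; 0];               % eigenvalues from the output above
explained = 100 * e / sum(e);            % percentage per component
cumulative = cumsum(explained);          % running total, in percent

threshold = 99;                          % hypothetical cutoff, in percent
k = find(cumulative >= threshold, 1);    % smallest k reaching the cutoff
```

With these numbers the first component alone passes the cutoff, which matches keeping a single component.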

Let us keep only the first component:

>> S=sorted(:,1);
>> PCA=A*S

PCA =

   12.1991
   16.6592
  -10.1958
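Regarding steps 1 and 2 of the question, two relationships can be checked numerically. This is a sketch only, and the names `eig_from_svd`, `svd_diff`, `Z`, `R` and `corr_diff` are made up for illustration. The square roots of the covariance eigenvalues are the singular values of the centered data divided by `sqrt(m-1)`, and standardizing the columns before forming the covariance gives the correlation matrix, so PCA on standardized data is PCA on the correlation matrix rather than on the covariance matrix above.

```matlab
% Sketch: relate the eigenvalues to an SVD, and standardization to corrcoef.
A = [4, 6, 10; 3, 10, 13; -2, -6, -8];
m = size(A, 1);
centered = A - repmat(mean(A), m, 1);
covariance = (centered' * centered) / (m - 1);
covariance = (covariance + covariance') / 2;   % enforce exact symmetry

% Eigenvalues of the covariance vs. squared singular values of the
% centered data, scaled by 1/(m-1).
e = sort(eig(covariance), 'descend');
s = svd(centered);                             % returned in descending order
eig_from_svd = (s .^ 2) / (m - 1);
svd_diff = max(abs(eig_from_svd - e));

% Covariance of the standardized data vs. the correlation matrix.
Z = centered ./ repmat(std(A), m, 1);          % divide each column by its sd
R = (Z' * Z) / (m - 1);
corr_diff = max(max(abs(R - corrcoef(A))));
```

Both differences should be at the level of floating-point rounding. Whether to standardize depends on whether the variables share comparable units; the eigenvectors of `R` generally differ from those of `covariance`.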
dato datuashvili