I was trying to replicate sklearn's PCA API in numpy, following the post "PCA in numpy and sklearn produces different results". I noticed that:
- the eigenvalues are the same as the PCA object's explained_variance_ attribute, and in the same order;
- the eigenvectors are not the same.

Here is my code:
import numpy as np
from sklearn.decomposition import PCA
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
X = datasets.load_iris()['data']
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=4)
pca.fit(X_scaled)
print('Explained Variance = ', pca.explained_variance_)
print('Principal Components = ', pca.components_)
This gives me:
Explained Variance = [2.93808505 0.9201649 0.14774182 0.02085386]
Principal Components = [[ 0.52106591 -0.26934744 0.5804131 0.56485654]
[ 0.37741762 0.92329566 0.02449161 0.06694199]
[-0.71956635 0.24438178 0.14212637 0.63427274]
[-0.26128628 0.12350962 0.80144925 -0.52359713]]
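As a quick sanity check on what pca.components_ holds (this is just my own check, assuming the fitted pca object above and the default whiten=False): its rows are unit-length direction vectors, and projecting the mean-centered data onto them reproduces pca.transform:

# sanity check, relies on X_scaled and pca from the snippet above
print(np.linalg.norm(pca.components_, axis=1))              # each row has norm ~1.0
manual_scores = (X_scaled - pca.mean_) @ pca.components_.T  # project centered data onto the rows
print(np.allclose(manual_scores, pca.transform(X_scaled)))  # True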
Using NumPy:
cov = np.cov(X_scaled.T)               # np.cov expects variables as rows, hence the transpose
eig_val, eig_vec = np.linalg.eig(cov)  # eigenvectors are returned as the columns of eig_vec
print('Eigenvalues = ', eig_val)
print('Eigenvectors = ', eig_vec)
This gives me:
Eigenvalues = [2.93808505 0.9201649 0.14774182 0.02085386]
Eigenvectors = [[ 0.52106591 -0.37741762 -0.71956635 0.26128628]
[-0.26934744 -0.92329566 0.24438178 -0.12350962]
[ 0.5804131 -0.02449161 0.14212637 -0.80144925]
[ 0.56485654 -0.06694199 0.63427274 0.52359713]]
Notice that the eigenvalues are exactly the same as pca.explained_variance_, i.e. unlike what the post "PCA in numpy and sklearn produces different results" suggests, we do get the eigenvalues in decreasing order from numpy (at least in this example). The eigenvectors, however, are not the same as pca.components_.

Why is this, and how do I replicate the exact result of sklearn's PCA API manually?
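For reference, here is a small check that makes the comparison explicit (it assumes pca, eig_val and eig_vec from the snippets above are still in scope; since np.linalg.eig does not guarantee any ordering of the eigenvalues, I sort the eigenpairs by decreasing eigenvalue before comparing):

# comparison check, relies on pca, eig_val and eig_vec from the snippets above
order = np.argsort(eig_val)[::-1]       # indices of eigenvalues, largest first
eig_val_sorted = eig_val[order]
eig_vec_sorted = eig_vec[:, order]      # np.linalg.eig returns eigenvectors as columns
print(np.allclose(eig_val_sorted, pca.explained_variance_))  # True: eigenvalues agree
print(np.allclose(eig_vec_sorted, pca.components_))          # False: eigenvector matrices differ as printed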