I was trying to replicate sklearn's PCA API in numpy, following the post "PCA in numpy and sklearn produces different results". I noticed that:
- the eigenvalues are the same as the PCA object's explained_variance_ attribute, and in the same order;
- the eigenvectors are not the same.

Here is my code:
import numpy as np
from sklearn.decomposition import PCA
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
X = datasets.load_iris()['data']
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=4)
pca.fit(X_scaled)
print('Explained Variance = ', pca.explained_variance_)
print('Principal Components = ', pca.components_)
This gives me:
Explained Variance = [2.93808505 0.9201649 0.14774182 0.02085386]
Principal Components = [[ 0.52106591 -0.26934744 0.5804131 0.56485654]
[ 0.37741762 0.92329566 0.02449161 0.06694199]
[-0.71956635 0.24438178 0.14212637 0.63427274]
[-0.26128628 0.12350962 0.80144925 -0.52359713]]
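As a quick sanity check on what pca.components_ holds (this is just my own check, assuming the fitted pca object above and the default whiten=False): its rows are unit-length direction vectors, and projecting the mean-centered data onto them reproduces pca.transform:

# sanity check, relies on X_scaled and pca from the snippet above
print(np.linalg.norm(pca.components_, axis=1))              # each row has norm ~1.0
manual_scores = (X_scaled - pca.mean_) @ pca.components_.T  # project centered data onto the rows
print(np.allclose(manual_scores, pca.transform(X_scaled)))  # True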
Using NumPy:
cov = np.cov(X_scaled.T)               # np.cov expects variables as rows, hence the transpose
eig_val, eig_vec = np.linalg.eig(cov)  # eigenvectors are returned as the columns of eig_vec
print('Eigenvalues = ', eig_val)
print('Eigenvectors = ', eig_vec)
This gives me:
Eigenvalues = [2.93808505 0.9201649 0.14774182 0.02085386]
Eigenvectors = [[ 0.52106591 -0.37741762 -0.71956635 0.26128628]
[-0.26934744 -0.92329566 0.24438178 -0.12350962]
[ 0.5804131 -0.02449161 0.14212637 -0.80144925]
[ 0.56485654 -0.06694199 0.63427274 0.52359713]]
Notice that the eigenvalues are exactly the same as pca.explained_variance_, i.e. unlike what the post "PCA in numpy and sklearn produces different results" suggests, we do get the eigenvalues in decreasing order from numpy (at least in this example). The eigenvectors, however, are not the same as pca.components_.

Why is this, and how do I replicate the exact result of sklearn's PCA API manually?
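For reference, here is a small check that makes the comparison explicit (it assumes pca, eig_val and eig_vec from the snippets above are still in scope; since np.linalg.eig does not guarantee any ordering of the eigenvalues, I sort the eigenpairs by decreasing eigenvalue before comparing):

# comparison check, relies on pca, eig_val and eig_vec from the snippets above
order = np.argsort(eig_val)[::-1]       # indices of eigenvalues, largest first
eig_val_sorted = eig_val[order]
eig_vec_sorted = eig_vec[:, order]      # np.linalg.eig returns eigenvectors as columns
print(np.allclose(eig_val_sorted, pca.explained_variance_))  # True: eigenvalues agree
print(np.allclose(eig_vec_sorted, pca.components_))          # False: eigenvector matrices differ as printed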