
I have a data set and I performed the following operation:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)

pca_test = PCA(n_components=10)
pca_test.fit(X_scaled) #------------Using scaled data
cvr = np.cumsum(pca_test.explained_variance_ratio_)

pca_test = PCA(n_components=10)
pca_test.fit(X_train) #------------Using non scaled data
cvr2 = np.cumsum(pca_test.explained_variance_ratio_)

print(np.round(cvr, 2))
[0.31 0.52 0.69 0.81 0.9  0.96 0.98 0.99 1.   1.  ]

print(np.round(cvr2, 2))
[0.97 0.99 0.99 1.   1.   1.   1.   1.   1.   1.  ]

What does the large explained variance in the two methods tell me about the data, or rather about what I am doing with the data?

Essentially, when I scale the data I no longer get the same variance reduction: I need far more components to explain the same share of the variance. So does that mean that, in my case, applying standardization is not the best idea?
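
For reference, a minimal toy example (synthetic data, not my actual data set) reproduces the same pattern when one feature is on a much larger scale than the rest:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy data: one feature on a much larger scale than the others.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 0] *= 100  # blow up the scale of the first feature

# Unscaled: the large-scale feature carries almost all of the total variance,
# so the first principal component alone "explains" nearly everything.
print(np.round(np.cumsum(PCA().fit(X).explained_variance_ratio_), 2))

# Scaled: every feature has unit variance, so no single feature can dominate
# and the explained variance spreads over more components.
X_scaled = StandardScaler().fit_transform(X)
print(np.round(np.cumsum(PCA().fit(X_scaled).explained_variance_ratio_), 2))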

user1243255
