Related to this question: Maximum number of principal components in PCA. Is sklearn wrong?
If `n_samples < n_features`, PCA should only return `n_samples - 1` directions. However, sklearn always returns `n_components = min(n_samples, n_features)` components. In the comments to the question above it was claimed that the last component should be trivially zero. However, per sklearn's implementation, this is only true for the training dataset; for any new sample, the last component will not be zero:
```python
from sklearn.decomposition import PCA
import numpy as np

pca = PCA()
train = np.random.rand(5, 10)
pca.fit(train)

# On the training data, the last component is numerically zero:
print(pca.transform(train)[:, -1])
# [ 6.70056333e-17 -9.24789628e-18  1.18730019e-15 -3.71242110e-16 -4.70382051e-16]

# On new samples, it is not:
print(pca.transform(np.random.rand(5, 10))[:, -1])
# [-0.12061904 -0.33477243 -0.29965447  0.65033472 -0.05476772]
```
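For what it's worth, the reason the last component vanishes on the training set can be checked directly: centering removes one degree of freedom, so the centered training matrix has rank at most `n_samples - 1`, and the smallest singular value sklearn reports is numerically zero. A small sketch (a fixed seed is used here only for reproducibility):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
train = rng.rand(5, 10)

pca = PCA()
pca.fit(train)

# The centered training matrix has rank at most n_samples - 1 = 4,
# so the 5th (last) singular value is numerically zero.
print(pca.singular_values_)

# The rank of the centered data confirms this.
print(np.linalg.matrix_rank(train - train.mean(axis=0)))  # 4
```

Any new sample generally has a component along the corresponding last direction, which is why its projection is non-zero.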
My question is now: what is the last direction generated by sklearn's PCA when `n_samples < n_features`?