I want to find the main trend of my data using PCA.
There was a question explaining how PCA works, on this page, and another question showing how to implement the method, on this page. So the core code can be written as follows:
import numpy as np
import numpy.linalg as la

def pca(data, n_components):
    x = np.array(data)
    # center the data
    x -= np.mean(x, axis=0)
    U, s, Vt = la.svd(x, full_matrices=False)
    # the columns of V are the principal axes
    V = Vt.T
    S = np.diag(s)
    k = n_components
    # dimensionality reduction: project the data onto the new axes
    US_k = U[:, :k].dot(S[:k, :k])
    # return one row per principal component
    return np.array([US_k[:, i] for i in range(k)])
The function above returns the dimensionality-reduced data, which reflects the main trend of the original data.
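As a side note on the SVD used above: the decomposition is only determined up to a simultaneous sign flip of a column of U and the matching row of Vt, so the projected scores U @ diag(s) are only defined up to sign. A minimal sketch with made-up random data:

```python
import numpy as np
import numpy.linalg as la

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 5))
x -= np.mean(x, axis=0)

U, s, Vt = la.svd(x, full_matrices=False)

# Flip the sign of column 0 of U together with row 0 of Vt:
# the reconstruction U @ diag(s) @ Vt is unchanged, so both sign
# choices are equally valid decompositions of the same data.
U2 = U.copy()
Vt2 = Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1

assert np.allclose(U @ np.diag(s) @ Vt, U2 @ np.diag(s) @ Vt2)
```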
But, as we all know, scikit-learn provides a PCA class that is easy to use. The core code can be written as follows:
from sklearn.decomposition import PCA

x = np.array(data)
pca = PCA(n_components=6)
pca.fit(x)
# dimensionality reduction: project the data onto the new axes
y = np.dot(data, np.array(pca.components_).T)
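One detail worth noting: `PCA.transform` subtracts the fitted mean before projecting, whereas the `np.dot(data, pca.components_.T)` line above projects the raw, uncentered data. A minimal check with made-up random data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.standard_normal((20, 6))

pca = PCA(n_components=3)
pca.fit(data)

# PCA.transform is equivalent to centering with the fitted mean
# and then projecting onto the principal axes.
projected = np.dot(data - pca.mean_, pca.components_.T)
assert np.allclose(projected, pca.transform(data))
```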
But when I verify the two functions above with real data, they sometimes produce exactly opposite trends. For example:
array = np.array([
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
]).astype('float32').T
# pca function as defined above
import matplotlib.pyplot as plt

data = array
pc = 1
Y = pca(data, n_components=6)
plt.plot(range(len(Y[pc, :])), Y[pc, :])
plt.show()
x = data
_pca = PCA(n_components=6)
_pca.fit(x)
y = np.dot(data, np.array(_pca.components_).T)
plt.plot(range(len(y[:, pc])), y[:, pc])
plt.show()
pca function graph: [plot omitted]
scikit-learn PCA graph: [plot omitted]
I am not sure why I get exactly opposite trends. I suspect it has to do with the direction of the principal axes. Which trend is the right one, and how do I correct it?
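For reference, one way to make the sign deterministic is the convention scikit-learn applies internally via `sklearn.utils.extmath.svd_flip`: flip each column of U (and the matching row of Vt) so that the entry with the largest absolute value in that column is positive. A sketch with made-up random data:

```python
import numpy as np
import numpy.linalg as la
from sklearn.utils.extmath import svd_flip

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 6))
x -= np.mean(x, axis=0)

U, s, Vt = la.svd(x, full_matrices=False)
# Flip signs so each column of U has its largest-magnitude entry positive;
# the matching rows of Vt are flipped too, keeping the reconstruction intact.
U, Vt = svd_flip(U, Vt)

scores = U @ np.diag(s)
assert np.allclose(U @ np.diag(s) @ Vt, x)
```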