
I want to find the main trend of my data using PCA.

There is already a question that explains how PCA works (in this page), and another that shows how to implement it (in this page). So the core code can be written as follows:

import numpy as np
import numpy.linalg as la


def pca(data, n_components):
    x = np.array(data, dtype=float)
    x -= np.mean(x, axis=0)                    # center each column
    U, s, Vt = la.svd(x, full_matrices=False)
    V = Vt.T                                   # columns of V are the principal axes
    S = np.diag(s)
    k = n_components
    # project the data onto the first k principal axes (dimensionality reduction)
    US_k = U[:, :k].dot(S[:k, :k])
    # one row per component, one column per sample
    return np.array([US_k[:, i] for i in range(k)])

The function above returns the dimensionality-reduced data, and this reduced data reflects the main trend of the original data.
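
For example, on a small made-up matrix (the numbers below are only there to illustrate the shape conventions, not real data), the function returns one row per requested component:

toy = np.array([[1.0, 2.0, 3.0],
                [2.0, 4.0, 6.0],
                [3.0, 6.0, 9.0],
                [4.0, 8.0, 12.0]])   # 4 samples, 3 features (made up)
scores = pca(toy, n_components=2)
print(scores.shape)                  # (2, 4): one row per component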

scikit-learn also provides a PCA class that makes this easy. The core code can be written as follows:

from sklearn.decomposition import PCA

x = np.array(data)
pca = PCA(n_components=6)
pca.fit(x)
# project the data onto the principal axes (dimensionality reduction)
y = np.dot(data, np.array(pca.components_).T)
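
(As far as I understand, pca.transform(x) is the more usual way to get this projection; it also subtracts the fitted mean before projecting, which my manual dot product above does not. I kept the manual version because that is what I originally tested.)

# alternative I am aware of: scikit-learn's own projection, which
# subtracts the fitted mean_ before projecting
y_transform = pca.transform(x)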

When I used some data to verify the two functions above, they sometimes produced exactly opposite trends. Here is an example:

import numpy as np
import numpy.linalg as la
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

array = np.array([
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
    [2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
]).astype('float32').T


def pca(data, n_components):
    x = np.array(data)
    x -= np.mean(x, axis=0)
    U, s, Vt = la.svd(x, full_matrices=False)
    V = Vt.T
    S = np.diag(s)
    k = n_components
    US_k = U[:, :k].dot(S[:k, :k])
    return np.array([US_k[:, i] for i in range(k)])


data = array
pc = 1  # index of the principal component to plot

# projection from my pca function
Y = pca(data, n_components=6)
plt.plot(range(len(Y[pc, :])), Y[pc, :])
plt.show()

# projection from scikit-learn's PCA
x = data
_pca = PCA(n_components=6)
_pca.fit(x)
y = np.dot(data, np.array(_pca.components_).T)
plt.plot(range(len(y[:, pc])), y[:, pc])
plt.show()
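
A check I can think of (just my own sketch) is to see whether the two results differ only by a sign; I center y first because my manual dot product above does not subtract the mean:

# centre the scikit-learn projection so it is comparable to Y
y_centered = y - y.mean(axis=0)
same = np.allclose(Y[pc, :], y_centered[:, pc], atol=1e-4)
flipped = np.allclose(Y[pc, :], -y_centered[:, pc], atol=1e-4)
print(same, flipped)   # I expect exactly one of these to be True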

pca function graph:

[plot from the pca function]

scikit-learn PCA graph:

[plot from scikit-learn's PCA]

I am not sure why I get the exact opposite trend. I think it has to do with the direction of the principal axes, but I am not sure which trend is the right one. How do I correct it?
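
One idea I have considered, though I am not sure it is the right approach, is to impose a fixed sign convention on each component, e.g. flip a component whenever its largest-magnitude score is negative (I believe scikit-learn applies a similar convention internally, but I have not verified that):

def fix_signs(scores):
    # scores: 2-D array with one column per component (my own sketch,
    # based on the assumption that the sign of each component is arbitrary)
    s = np.array(scores, dtype=float)
    for j in range(s.shape[1]):
        i = np.argmax(np.abs(s[:, j]))
        if s[i, j] < 0:
            s[:, j] *= -1        # flip so the largest-magnitude score is positive
    return s

Y_fixed = fix_signs(Y.T).T                   # my pca(): components are rows
y_fixed = fix_signs(y - y.mean(axis=0))      # scikit-learn projection, centered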
