
I have a test CSV file and have written code using scikit-learn to run PCA on it. I also use another tool in Excel (XLSTAT) to compare the results. XLSTAT automatically determines the number of components; however, as far as I understand, with scikit-learn I have to specify how many components I want. For example, XLSTAT shows 5 components:

Factor scores:
F1 F2 F3 F4 F5
A1 -1.293 -0.663 -0.462 -0.713 0.010
A2 -0.297 0.293 -1.429 0.397 0.056
A3 2.328 0.069 0.987 -0.108 0.062
A4 -0.556 -2.273 0.538 0.344 -0.032
A5 1.823 0.775 -0.597 -0.052 -0.085
A6 -2.005 1.799 0.963 0.133 -0.011

In the following code, I specified 2 components:

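from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# x holds the feature columns loaded from the test CSV
x = StandardScaler().fit_transform(x)  # standardize each feature
pca = PCA(n_components=2)  # keep only the first two components
principalComponents = pca.fit_transform(x)
print(principalComponents)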

[[-1.29292842  0.66325508]
 [-0.29706395 -0.29346337]
 [ 2.32751305 -0.06850045]
 [-0.5558091   2.27288988]
 [ 1.82312052 -0.77527304]
 [-2.0048321  -1.7989081 ]]

As you can see, the first columns in XLSTAT and scikit-learn are the same. However, the second columns have opposite signs. For example, for the first row (A1), F1 and F2 are:

XLSTAT => -1.293 -0.663
scikit =>  [-1.29292842 0.66325508] 

Treating F1 and F2 as the X and Y coordinates of a scatter point, why do the Y values in XLSTAT and scikit-learn have opposite signs?

mahmood
    Does this answer your question? [Does the sign of scores or of loadings in PCA or FA have a meaning? May I reverse the sign?](https://stats.stackexchange.com/questions/88880/does-the-sign-of-scores-or-of-loadings-in-pca-or-fa-have-a-meaning-may-i-revers) – tchainzzz Dec 26 '20 at 03:27

1 Answer


PCA gives you the eigenvectors of the empirical covariance matrix of your data, and the sign of any particular eigenvector doesn't matter, which has been addressed here. As a brief recap, let $\Sigma$ be the empirical covariance matrix of your data, and suppose we have some nonzero vector $v$ such that there exists some $\lambda \in \mathbb{R}$ where $\Sigma v = \lambda v$ (def. of eigenvector/eigenvalue). Let $w = -v$; then $w$ is also an eigenvector of $\Sigma$ with the same eigenvalue, as $\Sigma w = -\Sigma v = -\lambda v = \lambda w$. From a non-rigorous standpoint, all you've done is "flip" the eigenvector, so the corresponding principal component is essentially the same: the scores along it are simply negated, which is exactly what you see in your second column. For a more in-depth explanation, see here.
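As a quick numerical check of the recap above, here is a minimal sketch in plain Python. The 2x2 matrix `sigma`, its eigenpair, and the helper `matvec` are made up for illustration (they are not the question's data); the only point is that a vector and its negation both satisfy the eigenvector equation with the same eigenvalue.

```python
# Illustrative symmetric matrix (an assumption, not the question's data).
sigma = [[2.0, 1.0],
         [1.0, 2.0]]


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


lam = 3.0              # an eigenvalue of sigma
v = [1.0, 1.0]         # an eigenvector for lam
w = [-c for c in v]    # the same vector with its sign flipped

sigma_v = matvec(sigma, v)
sigma_w = matvec(sigma, w)

# Both v and w satisfy sigma @ x == lam * x with the SAME lam.
v_is_eigvec = sigma_v == [lam * c for c in v]
w_is_eigvec = sigma_w == [lam * c for c in w]
print(v_is_eigvec, w_is_eigvec)  # True True
```

Because both checks pass, an eigensolver is free to return either `v` or `-v`, and nothing in the PCA definition prefers one over the other.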

Thus, mathematically, both of the implementations above are equivalent and valid. I'm not sure how XLSTAT and scikit implement PCA internally, but different implementations can certainly lead to this behavior.
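If you want the two outputs to line up for plotting or comparison, one option is to flip columns after the fact. The sketch below is only an illustration: `align_signs` is a hypothetical helper (not part of scikit-learn or XLSTAT), and the numbers are the F1 and F2 values copied from the question. It flips any column whose dot product with the reference column is negative.

```python
# F1 and F2 scores copied from the question (XLSTAT values are rounded).
xlstat = [[-1.293, -0.663], [-0.297, 0.293], [2.328, 0.069],
          [-0.556, -2.273], [1.823, 0.775], [-2.005, 1.799]]
sklearn_scores = [[-1.29292842, 0.66325508], [-0.29706395, -0.29346337],
                  [2.32751305, -0.06850045], [-0.5558091, 2.27288988],
                  [1.82312052, -0.77527304], [-2.0048321, -1.7989081]]


def align_signs(reference, scores):
    """Hypothetical helper: flip each column of `scores` whose dot
    product with the matching column of `reference` is negative."""
    n_cols = len(reference[0])
    flipped = []
    for j in range(n_cols):
        dot = sum(r[j] * s[j] for r, s in zip(reference, scores))
        flipped.append(dot < 0)
    aligned = [[-s[j] if flipped[j] else s[j] for j in range(n_cols)]
               for s in scores]
    return aligned, flipped


aligned, flipped = align_signs(xlstat, sklearn_scores)

# Largest remaining gap between the two tools after alignment.
max_diff = max(abs(a - r)
               for row_a, row_r in zip(aligned, xlstat)
               for a, r in zip(row_a, row_r))
print(flipped)   # [False, True]: only the second component was flipped
print(max_diff)  # small; what remains is XLSTAT's 3-decimal rounding
```

After alignment the two score matrices agree up to rounding, which supports the point that the results differ only by a sign convention.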

tchainzzz