I have a test CSV file and have written code with scikit-learn to run PCA on it. I also use an Excel add-in (XLSTAT) to compare the results. XLSTAT determines the number of factors automatically, whereas, as I understand it, scikit-learn requires me to specify how many components I want. For example, XLSTAT reports 5 factors:
Factor scores:
        F1      F2      F3      F4      F5
A1  -1.293  -0.663  -0.462  -0.713   0.010
A2  -0.297   0.293  -1.429   0.397   0.056
A3   2.328   0.069   0.987  -0.108   0.062
A4  -0.556  -2.273   0.538   0.344  -0.032
A5   1.823   0.775  -0.597  -0.052  -0.085
A6  -2.005   1.799   0.963   0.133  -0.011
In the following code, I specified 2 components:
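from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# x: feature matrix loaded from the CSV (loading code not shown)
x = StandardScaler().fit_transform(x)  # standardize each feature
pca = PCA(n_components=2)              # keep the first two components
principalComponents = pca.fit_transform(x)
print(principalComponents)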
[[-1.29292842  0.66325508]
 [-0.29706395 -0.29346337]
 [ 2.32751305 -0.06850045]
 [-0.5558091   2.27288988]
 [ 1.82312052 -0.77527304]
 [-2.0048321  -1.7989081 ]]
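On the point about having to choose the number of components: as far as I can tell, `n_components` is optional in scikit-learn, and leaving it unset keeps min(n_samples, n_features) components. Below is a minimal sketch of that, not my actual script. The random `x_demo` array is a hypothetical stand-in for my CSV (6 samples, 5 features), and the names `pca_all` and `scores_all` are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the CSV: 6 samples, 5 features of random data.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(6, 5))

# Same preprocessing as in the question: standardize each feature.
x_demo = StandardScaler().fit_transform(x_demo)

# With n_components left unset, PCA keeps min(n_samples, n_features) components.
pca_all = PCA()
scores_all = pca_all.fit_transform(x_demo)

print(scores_all.shape)                    # one column per retained component
print(pca_all.explained_variance_ratio_)   # share of variance per component
```

If that holds for my real data as well, the scikit-learn output should have one column per XLSTAT factor, which would make the comparison easier.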
As you can see, the first columns from XLSTAT and scikit-learn match. However, the second columns have opposite signs. For example, taking F1 and F2 for the first row (A1):
XLSTAT => -1.293 -0.663
scikit => [-1.29292842 0.66325508]
If I treat F1 and F2 as the X and Y coordinates of a scatter plot, why does the Y value have opposite signs in XLSTAT and scikit-learn?
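To make the comparison concrete, here is a small check on the numbers quoted in this post. It is only a sketch under the assumption that the two tools differ by nothing more than a per-column sign: it finds, for each column, the sign that best aligns the scikit-learn scores with the XLSTAT ones and then tests whether the values agree to XLSTAT's three printed decimals. The helper name `column_signs` is made up for this example.

```python
import numpy as np

# First two factor-score columns as reported by XLSTAT (copied from this post).
xlstat = np.array([
    [-1.293, -0.663],
    [-0.297,  0.293],
    [ 2.328,  0.069],
    [-0.556, -2.273],
    [ 1.823,  0.775],
    [-2.005,  1.799],
])

# Scores printed by scikit-learn's PCA (copied from this post).
sklearn_scores = np.array([
    [-1.29292842,  0.66325508],
    [-0.29706395, -0.29346337],
    [ 2.32751305, -0.06850045],
    [-0.5558091,   2.27288988],
    [ 1.82312052, -0.77527304],
    [-2.0048321,  -1.7989081 ],
])

def column_signs(a, b):
    """Return +1 or -1 per column: the sign that best aligns b with a."""
    return np.sign(np.sum(a * b, axis=0))

signs = column_signs(xlstat, sklearn_scores)
aligned = sklearn_scores * signs

print(signs)                                     # per-column sign relating the two outputs
print(np.allclose(aligned, xlstat, atol=1e-3))   # do they match after the flip?
```

On the values above, the first column keeps its sign and the second is flipped, after which the two sets of scores agree to within rounding. So the whole second component appears mirrored, not just individual points.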