I want to use Canonical Correlation Analysis (CCA) to identify relationships between two sets of variables, X and Y. Given one sample of X and one sample of Y, CCA should give me a score (the highest correlation) between them.
I tried to implement it with scikit-learn like this:
from sklearn.cross_decomposition import CCA

X = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [2, 0, 0]]
Y = [[0, 0, 1], [1, 0, 0], [2, 2, 2], [3, 5, 4]]

cca = CCA(n_components=2)
cca.fit(X, Y)

# Score every sample of X against every sample of Y
for x in X:
    print("----------")
    print(x)
    for y in Y:
        print(str(y) + " : " + str(cca.score(x, y)))
The output for the first element in X:
[1, 0, 0]
[0, 0, 1] : 0.35461498401
[1, 0, 0] : -0.0502507710089
[2, 2, 2] : 0.0
[3, 5, 4] : -22.2417510911
But the result is not what I expected: no sample of Y stands out as having the highest correlation with [1, 0, 0] from X. It turns out that score returns something else. According to the documentation, it "Returns the coefficient of determination R^2 of the prediction".
How can I use CCA to find the pairs (X#n, Y#m) with the highest correlation? Is this possible with scikit-learn, or do I have to use another library?
Thanks in advance.