1

I want to use Canonical Correlation Analysis (CCA) to identify relationships between two sets of variables X and Y. The CCA should give a score (highest correlation) between two samples of X and Y.

I tried to implement it with scikit-learn like this:

from sklearn.cross_decomposition import CCA

X = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [2, 0, 0]]
Y = [[0, 0, 1], [1, 0, 0], [2, 2, 2], [3, 5, 4]]

cca = CCA(n_components=2)
cca.fit(X, Y)

# Score every sample of X against every sample of Y
for x in X:
    print("----------")
    print(x)
    for y in Y:
        print(str(y) + " : " + str(cca.score(x, y)))

The output for the first element in X:

[1, 0, 0]
[0, 0, 1] : 0.35461498401
[1, 0, 0] : -0.0502507710089
[2, 2, 2] : 0.0
[3, 5, 4] : -22.2417510911

But the result is not what I expected: the scores do not show a highest correlation between [1, 0, 0] in X and any sample in Y. It seems that score returns something else; according to the documentation, it "Returns the coefficient of determination R^2 of the prediction".

How can I use CCA to find the pairs (X#n, Y#m) with the highest correlation? Is this possible with scikit-learn, or do I have to use another library?

Thanks in advance.

user3563297

2 Answers

-1

I have also tried to use scikit-learn to do CCA with my data, but I ended up using MATLAB as my main language. MATLAB has a built-in function for CCA called canoncorr.

>> X
X =
     1     0     0
     1     1     0
     1     1     1
     2     0     0

>> Y
Y =
     0     0     1
     1     0     0
     2     2     2
     3     5     4

>> R = corrcoef(X,Y)
R =
    1.0000   -0.0279
   -0.0279    1.0000

Then, if you use the canoncorr function, you get:

[A,B,r,U,V,stats] = canoncorr(X,Y);

>> U
U =
   -0.7406   -1.2792   -0.2554
   -0.7406    0.8112    1.0215
    0.1058    0.7800   -1.2769
    1.3754   -0.3120    0.5108

>> V
V =
   -0.7406   -1.2792   -0.2554
   -0.7406    0.8112    1.0215
    0.1058    0.7800   -1.2769
    1.3754   -0.3120    0.5108

>> corrcoef(U,V)
ans =
     1     1
     1     1

So CCA works in this case: the canonical variates U and V are perfectly correlated. I hope this information is helpful for you. As for Python, I will try to verify it later.
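For a quick cross-check in Python that does not depend on scikit-learn, here is a minimal NumPy sketch that computes the canonical correlations from QR decompositions and an SVD. The helper name canonical_correlations is made up for illustration, and the sketch assumes that both centered matrices have full column rank. It is not necessarily what canoncorr does internally, only one standard way to obtain the same quantities.

```python
import numpy as np

X = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1], [2, 0, 0]], dtype=float)
Y = np.array([[0, 0, 1], [1, 0, 0], [2, 2, 2], [3, 5, 4]], dtype=float)


def canonical_correlations(X, Y):
    """Hypothetical helper: canonical correlations of two data matrices.

    Assumes the centered X and Y both have full column rank.
    """
    # Center each variable (column) so correlations are well defined
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles
    # between the two column spaces, i.e. the canonical correlations
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against rounding slightly outside [0, 1]
    return np.clip(s, 0.0, 1.0)


r = canonical_correlations(X, Y)
print(r)  # one value per canonical pair, largest first
```

For this particular data all three values come out at 1 up to rounding. With only 4 samples and 3 variables in each set, the centered columns of X and Y span the same 3-dimensional subspace, which is consistent with U and V being identical above.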

gung - Reinstate Monica
user48135
-1

CCA.fit probably only updates the internal state of the CCA object, that is, it fits the model without returning any scores. I assume you would need to call X_c, Y_c = cca.transform(X, Y) after fit() to get the desired result.

Nick Stauner