For my current project I am using sklearn.cross_decomposition.CCA. Several webpages (e.g. https://stats.idre.ucla.edu/r/dae/canonical-correlation-analysis/ or https://www.uaq.mx/statsoft/stcanan.html or https://pure.uvt.nl/ws/portalfiles/portal/596531/useofcaa_ab5.pdf) state that canonical loadings can be computed as the correlations between the variables and the canonical variates. However, I could not reproduce this using scipy.stats.pearsonr. Is this just false information, or am I doing something wrong?
Here's an example:
from sklearn.cross_decomposition import CCA
import numpy as np
from scipy.stats import pearsonr
# compute CCA
X = np.array([[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [3.,5.,4.]])
Y = np.array([[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]])
cca = CCA(n_components=1)
cca.fit(X, Y)
X_c, Y_c = cca.transform(X, Y)
# obtain loadings for X variable set
x_loadings = cca.x_loadings_
print(f"The first variable has a loading of {x_loadings[0]} on the first canonical variate")
# try to manually calculate loadings using pearson correlation.
r,_ = pearsonr(np.squeeze(X[:,0]),np.squeeze(X_c))
print(f"Correlation between first variable and first canonical variate: {r}")
This prints:
The first variable has a loading of [0.61454275] on the first canonical variate
Correlation between first variable and first canonical variate: 0.9610851804703184
As you can see, the two numbers are clearly different.