
I'm trying to implement Canonical Correlation Analysis using Eigen, the header-only C++ template library, to better my understanding of both the library and this particular statistical technique.

So far, the best reference I've come across for how this is implemented in practice is the documentation of the CCA R package (link). As I understand it, the steps presented there are as follows (I've put a rough Eigen sketch of them right after the list):

  1. Take input sample matrices $X,Y$ with $n$ observations (rows) each (I'm not sure about the case where the two have different numbers of rows), where the columns are the variables: $p$ of them in $X$ and $q$ in $Y$.
  2. Compute $S_{xx} := X^tX$ and $S_{yy} := Y^tY$
  3. Compute $P_x := \frac{1}{n}X(S_{xx})^{-1}X^t$ and $P_y := \frac{1}{n}Y(S_{yy})^{-1}Y^t$
  4. Compute the eigendecomposition of $P_xP_y$ to get the squared canonical correlations (eigenvalues) and the canonical variables $U$ (eigenvectors), then repeat for $P_yP_x$ to get the other canonical variables ($V$).

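Here is a rough, unoptimized sketch of steps 1–4 as I currently read them (toy random data, column-centering done by hand, no regularization, and assuming $S_{xx},S_{yy}$ are invertible; the variable names are just my own):

```cpp
// Rough sketch of steps 1-4 above. Assumptions on my part: columns are centered,
// S_xx and S_yy are invertible, and the regularization is ignored entirely.
#include <Eigen/Dense>
#include <iostream>

int main() {
    const int n = 100, p = 3, q = 2;

    // Step 1: toy inputs with n observations (rows) and p / q variable columns.
    Eigen::MatrixXd X = Eigen::MatrixXd::Random(n, p);
    Eigen::MatrixXd Y = Eigen::MatrixXd::Random(n, q);
    Eigen::RowVectorXd xMean = X.colwise().mean();
    Eigen::RowVectorXd yMean = Y.colwise().mean();
    X.rowwise() -= xMean;   // center the columns
    Y.rowwise() -= yMean;

    // Step 2: S_xx = X^t X and S_yy = Y^t Y.
    Eigen::MatrixXd Sxx = X.transpose() * X;
    Eigen::MatrixXd Syy = Y.transpose() * Y;

    // Step 3: P_x = (1/n) X S_xx^{-1} X^t and P_y = (1/n) Y S_yy^{-1} Y^t,
    // using an LDLT solve instead of forming the explicit inverse.
    Eigen::MatrixXd Px = X * Sxx.ldlt().solve(X.transpose()) / static_cast<double>(n);
    Eigen::MatrixXd Py = Y * Syy.ldlt().solve(Y.transpose()) / static_cast<double>(n);

    // Step 4: eigendecompositions of P_x P_y and P_y P_x. These products are not
    // symmetric in general, so the general (non-self-adjoint) EigenSolver is used.
    Eigen::MatrixXd PxPy = Px * Py;
    Eigen::MatrixXd PyPx = Py * Px;
    Eigen::EigenSolver<Eigen::MatrixXd> esU(PxPy);   // eigenvectors -> U
    Eigen::EigenSolver<Eigen::MatrixXd> esV(PyPx);   // eigenvectors -> V

    // With S_xx defined without a 1/n factor, the leading 1/n in P_x and P_y only
    // rescales the eigenvalues (by 1/n^2), so I may be misreading the normalization
    // in the paper; the eigenvectors are unaffected either way. At most min(p, q)
    // eigenvalues should be non-trivial.
    std::cout << "eigenvalues of PxPy:\n"
              << esU.eigenvalues().real().transpose() << std::endl;
    std::cout << "eigenvalues of PyPx:\n"
              << esV.eigenvalues().real().transpose() << std::endl;
    return 0;
}
```
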
So far so good. I know the Eigen library has eigensolvers, but is there a way to avoid doing two separate eigendecompositions in step 4? Once you do $P_xP_y$, maybe there's a more efficient way of getting the eigenvectors of $P_yP_x$?
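
The only relation I've spotted so far is that if $u$ is an eigenvector of $P_xP_y$, i.e. $P_xP_y\,u = \rho^2 u$, then

$$P_yP_x\,(P_y u) = P_y\,(P_xP_y\,u) = \rho^2\,(P_y u),$$

so it looks like $P_y u$ (suitably normalized) would already be an eigenvector of $P_yP_x$ for the same eigenvalue. I have no idea whether relying on that is numerically sensible, though.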

The paper also mentions what seems to be an extremely computationally intensive regularization approach, where we grid over values of two parameters $\lambda_1,\lambda_2$ to 'regularize' possibly ill-conditioned $S_{xx},S_{yy}$ matrices. It then does leave-one-out cross-validation, which effectively has to repeat the 4th step $n$ times for every grid point. If the grid is $100\times 100$, that's $10,000 \times n$ total times we must repeat step 4, which seems insane. Am I interpreting that correctly? Are there alternatives whose computation doesn't explode like this compared to the unregularized variant?
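
Spelling out the count as I read it (one eigendecomposition per left-out observation per grid point):

$$\underbrace{100}_{\lambda_1\text{ values}} \times \underbrace{100}_{\lambda_2\text{ values}} \times \underbrace{n}_{\text{LOO folds}} = 10^4\,n \ \text{repetitions of step 4}.$$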

Palace Chan
  • I think you would use the singular value decomposition of X and Y to generate these. The regularised version can be computed directly from the original SVD, no need to do a new eigendecomposition. See e.g. The Elements of Statistical Learning (shrinkage) and PCA by SVD – seanv507 Nov 16 '15 at 06:57
  • CCA can be implemented in different ways (with the same result). Here is how SPSS does it: http://stats.stackexchange.com/a/77309/3277, for your information. – ttnphns Nov 16 '15 at 08:07
  • @ttnphns thanks this is helpful! Looks quite different than using the orthogonal projectors in the paper I found, I'll have to check why these are equivalent approaches – Palace Chan Nov 17 '15 at 05:58
  • @seanv507 thanks for the reference I'll check that out tomorrow – Palace Chan Nov 17 '15 at 05:59
  • Page 66 of The Elements of Statistical Learning (equation 3.47) – seanv507 Nov 17 '15 at 07:44
  • @seanv507 Do you know, in equation 3.47, how $(X^TX + \lambda I)^{-1}$ which is $(V D^2 V^T + \lambda I)^{-1}$ suddenly simplifies to $(D^2 + \lambda I)^{-1}$? It's almost like they pull the $V^T$ on the left and the $V$ on the right inside, which I can't quite reconcile... – Palace Chan Nov 18 '15 at 04:39
  • You write $\lambda I$ as $V \lambda V^T$ using the orthogonality of $V$ and the commutativity of scalars with matrices. Then you find the inverse by solving $(VD^2V^T+\lambda I)M = \mathbb{I}$. See also sections 3.2.4 and 3.7 for canonical correlation analysis. – seanv507 Nov 18 '15 at 09:26

0 Answers