
I'm trying to implement Canonical Correlation Analysis using Eigen, the header-only C++ template library, to better my understanding of both the library and this particular statistical technique.

So far, the best reference I've come across for how this is implemented in practice is the documentation of the CCA R package (link). As I understand it, the steps presented there are as follows (I've put a rough Eigen sketch of them right after the list):

  1. Take input sample matrices $X,Y$ with $n$ observations (rows) each (I'm not sure about the case where the two have different numbers of rows), where the columns are the variables: $p$ of them in $X$ and $q$ in $Y$.
  2. Compute $S_{xx} := X^tX$ and $S_{yy} := Y^tY$
  3. Compute $P_x := \frac{1}{n}X(S_{xx})^{-1}X^t$ and $P_y := \frac{1}{n}Y(S_{yy})^{-1}Y^t$
  4. Compute the eigendecomposition of $P_xP_y$ to get the squared canonical correlations (eigenvalues) and the canonical variables $U$ (eigenvectors), then repeat for $P_yP_x$ to get the other canonical variables ($V$).

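Here is a rough, unoptimized sketch of steps 1–4 as I currently read them (toy random data, column-centering done by hand, no regularization, and assuming $S_{xx},S_{yy}$ are invertible; the variable names are just my own):

```cpp
// Rough sketch of steps 1-4 above. Assumptions on my part: columns are centered,
// S_xx and S_yy are invertible, and the regularization is ignored entirely.
#include <Eigen/Dense>
#include <iostream>

int main() {
    const int n = 100, p = 3, q = 2;

    // Step 1: toy inputs with n observations (rows) and p / q variable columns.
    Eigen::MatrixXd X = Eigen::MatrixXd::Random(n, p);
    Eigen::MatrixXd Y = Eigen::MatrixXd::Random(n, q);
    Eigen::RowVectorXd xMean = X.colwise().mean();
    Eigen::RowVectorXd yMean = Y.colwise().mean();
    X.rowwise() -= xMean;   // center the columns
    Y.rowwise() -= yMean;

    // Step 2: S_xx = X^t X and S_yy = Y^t Y.
    Eigen::MatrixXd Sxx = X.transpose() * X;
    Eigen::MatrixXd Syy = Y.transpose() * Y;

    // Step 3: P_x = (1/n) X S_xx^{-1} X^t and P_y = (1/n) Y S_yy^{-1} Y^t,
    // using an LDLT solve instead of forming the explicit inverse.
    Eigen::MatrixXd Px = X * Sxx.ldlt().solve(X.transpose()) / static_cast<double>(n);
    Eigen::MatrixXd Py = Y * Syy.ldlt().solve(Y.transpose()) / static_cast<double>(n);

    // Step 4: eigendecompositions of P_x P_y and P_y P_x. These products are not
    // symmetric in general, so the general (non-self-adjoint) EigenSolver is used.
    Eigen::MatrixXd PxPy = Px * Py;
    Eigen::MatrixXd PyPx = Py * Px;
    Eigen::EigenSolver<Eigen::MatrixXd> esU(PxPy);   // eigenvectors -> U
    Eigen::EigenSolver<Eigen::MatrixXd> esV(PyPx);   // eigenvectors -> V

    // With S_xx defined without a 1/n factor, the leading 1/n in P_x and P_y only
    // rescales the eigenvalues (by 1/n^2), so I may be misreading the normalization
    // in the paper; the eigenvectors are unaffected either way. At most min(p, q)
    // eigenvalues should be non-trivial.
    std::cout << "eigenvalues of PxPy:\n"
              << esU.eigenvalues().real().transpose() << std::endl;
    std::cout << "eigenvalues of PyPx:\n"
              << esV.eigenvalues().real().transpose() << std::endl;
    return 0;
}
```
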
So far so good. I know the Eigen library has eigensolvers, but is there a way to avoid doing two separate eigendecompositions in step 4? Once you do $P_xP_y$, maybe there's a more efficient way of getting the eigenvectors of $P_yP_x$?
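
The only relation I've spotted so far is that if $u$ is an eigenvector of $P_xP_y$, i.e. $P_xP_y\,u = \rho^2 u$, then

$$P_yP_x\,(P_y u) = P_y\,(P_xP_y\,u) = \rho^2\,(P_y u),$$

so it looks like $P_y u$ (suitably normalized) would already be an eigenvector of $P_yP_x$ for the same eigenvalue. I have no idea whether relying on that is numerically sensible, though.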

The paper also mentions what seems to be an extremely computationally intensive regularization approach, where we grid over values of two parameters $\lambda_1,\lambda_2$ to 'regularize' possibly ill-conditioned $S_{xx},S_{yy}$ matrices. It then does leave-one-out cross-validation, which effectively has to repeat the 4th step $n$ times for every grid point. If the grid is $100\times 100$, that's $10,000 \times n$ total times we must repeat step 4, which seems insane. Am I interpreting that correctly? Are there alternatives whose computation doesn't explode like this compared to the unregularized variant?
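
Spelling out the count as I read it (one eigendecomposition per left-out observation per grid point):

$$\underbrace{100}_{\lambda_1\text{ values}} \times \underbrace{100}_{\lambda_2\text{ values}} \times \underbrace{n}_{\text{LOO folds}} = 10^4\,n \ \text{repetitions of step 4}.$$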

Palace Chan
  • I think you would use the singular value decomposition of X and Y to generate these. The regularised version can be computed directly from the original SVD, no need to do a new eigendecomposition. See e.g. The Elements of Statistical Learning (shrinkage) and PCA by SVD – seanv507 Nov 16 '15 at 06:57
  • CCA can be implemented in different ways (with the same result). Here is how SPSS does it: http://stats.stackexchange.com/a/77309/3277, for your information. – ttnphns Nov 16 '15 at 08:07
  • @ttnphns thanks this is helpful! Looks quite different than using the orthogonal projectors in the paper I found, I'll have to check why these are equivalent approaches – Palace Chan Nov 17 '15 at 05:58
  • @seanv507 thanks for the reference I'll check that out tomorrow – Palace Chan Nov 17 '15 at 05:59
  • Page 66 of The Elements of Statistical Learning (equation 3.47) – seanv507 Nov 17 '15 at 07:44
  • @seanv507 Do you know, in equation 3.47, how $(X^TX + \lambda I)^{-1}$ which is $(V D^2 V^T + \lambda I)^{-1}$ suddenly simplifies to $(D^2 + \lambda I)^{-1}$? It's almost like they pull the $V^T$ on the left and the $V$ on the right inside, which I can't quite reconcile... – Palace Chan Nov 18 '15 at 04:39
  • You write $\lambda I$ as $V \lambda V^T$ using the orthogonality of $V$ and the commutativity of scalars with matrices. Then you find the inverse by solving $(VD^2V^T+\lambda I)M = \mathbb{I}$. See also sections 3.2.4 and 3.7 for canonical correlation analysis. – seanv507 Nov 18 '15 at 09:26

0 Answers