
I'm wondering if CCA is just a feature transformation method. Can I use it for predicting continuous variables like in regression methods?

What I'm doing is using CCA to transform my training and test matrices, and then using the transformed matrices as features for regression (e.g., linear regression and decision tree regression). But I'm not sure whether this helps, or whether CCA and regression are basically the same thing.

Steffen Moritz
Shudong
  • How many response variables do you have? – Michael M Nov 13 '16 at 18:00
  • As explained [here](http://stats.stackexchange.com/q/65692/3277), CCA can be seen as multivariate multiple linear regression. Unlike PCA, it is not just a "transformation" of data to summarize it, because, as in regression, there is external data to predict. – ttnphns Nov 13 '16 at 18:22

1 Answer


I am not an expert on this subject, but I guess you can follow a procedure similar to PLS regression.

I mean:

  1. Find the first pair of CCA components of X (the feature variables) and Y (the target variables); call them t and u, respectively (u is not used afterwards). Store the weight vector w (so that t = Xw) and the loading vector p = (1/t^T t)X^T t.

  2. Deflate X so that it has no correlation with t: X -> X - (1/t^T t)tt^T X

  3. Perform linear regression of Y on t. Take the residual (estimation error) of this linear regression as the new, deflated Y: Y -> Y - (1/t^T t)tt^T Y

  4. Take the deflated X and Y as new X and Y, and go back to 1.

  5. Repeat steps 1 to 4 as many times as you like. The final linear estimate is then Y_{estimate} = X_{test}B, where B = W(P^T W)^{-1}C^T. Here W = (w_1, w_2, ...), P = (p_1, p_2, ...) and C = (c_1, c_2, ...), where w_k and p_k are the k-th weight and loading vectors, and c_k = (1/t_k^T t_k)Y^T t_k is the coefficient vector from the regression of the (deflated) Y on t_k in step 3. (The weights alone only map X_{test} to the scores; the factor C^T maps the scores to Y.)

Both X and Y are assumed to be centered (their sample means are zero). In practice, subtract the training-set means of X before predicting, and add the training-set mean of Y back to the prediction.
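The centering bookkeeping looks like this; ordinary least squares stands in here for whatever centered fit produces the coefficient matrix, and the data are illustrative:

```python
import numpy as np

rng = np.random.RandomState(1)
X_train, Y_train = rng.randn(40, 5), rng.randn(40, 2)
X_test = rng.randn(10, 5)

# Center using *training* statistics only.
x_mean = X_train.mean(axis=0)
y_mean = Y_train.mean(axis=0)
Xc, Yc = X_train - x_mean, Y_train - y_mean

# Any linear fit on the centered data (least squares as a stand-in).
B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)

# At prediction time: subtract the training X mean, add back the Y mean.
Y_pred = (X_test - x_mean) @ B + y_mean
```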

Note that the "k-th CCA component" above is generally not the same as the one obtained by the usual CCA decomposition, where all components are obtained at once by solving a single eigenvalue problem, without any deflation. (I am not familiar enough with this point.)

For reference, the PLS regression procedure is perhaps most concisely illustrated by the scikit-learn implementation:

https://github.com/scikit-learn/scikit-learn/blob/7e85a6d1f/sklearn/cross_decomposition/_pls.py#L137