How to know if Canonical Correlation analysis is overfitting?

Question

I have X = (21,15) -> 21 observations, 15 variables; Y = (21,6) -> 21 observations, 6 variables. When I run CCA on X and Y, I get canonical correlation coefficients of 1, but I know that shouldn't happen for my data. How can I explain this overfitting? If the total number of variables is smaller than the number of observations, CCA works fine. Why does this happen? Is there a mathematical proof?

score 2 · Answer 1 · answered Aug 15 '18 at 22:50

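Yes, there's an interesting geometric interpretation that shows directly that if $n \le p + q$, some of the canonical correlations will be exactly 1. In short, and using your definitions of $X$ and $Y$, this has to do with the row space of the data matrix $Z = [X,Y]^T$: after centering, its $p+q$ rows live in a space of dimension at most $n-1$, so when $p+q > n - 1$ they must be linearly dependent. For data in general position, that means some linear combination of the $X$ variables coincides (up to sign) with a linear combination of the $Y$ variables, which gives a correlation of 1.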

Here $n$ is the number of observations, and $p, q$ are the numbers of variables in $X$ and $Y$ respectively.
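To spell out the dimension count, here is a short sketch. It assumes the columns of $X$ and $Y$ have been centered (written $X_c$, $Y_c$, notation introduced here for illustration) and that each block has full column rank, which is typical for real-valued data but is not stated in the question. Both column spaces then sit inside the $(n-1)$-dimensional subspace of vectors orthogonal to the constant vector, so

$$
\dim\big(\operatorname{col}(X_c) \cap \operatorname{col}(Y_c)\big)
= p + q - \dim\big(\operatorname{col}(X_c) + \operatorname{col}(Y_c)\big)
\ge p + q - (n - 1).
$$

Any nonzero vector $u$ in the intersection can be written as $u = X_c a = Y_c b$, so the pair $(a, b)$ attains a correlation of exactly 1. Under these assumptions, at least $p + q - (n-1)$ canonical correlations equal 1 whenever that number is positive. With your values, $15 + 6 - 20 = 1$, so at least one canonical correlation is forced to be 1 regardless of what the data look like.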

This is hard to visualize with your values for $n,p,q$, but I've created a small toy example in this link that explains this, with code and figures here.

I've answered a similar question before here.
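For a quick numerical check with the dimensions from the question, here is a minimal sketch (not the code from the linked example). It assumes NumPy is available, uses independent Gaussian noise for both blocks, and computes the canonical correlations through a QR decomposition followed by an SVD; the function name `canonical_correlations`, the seed, and the larger sample size of 500 are illustrative choices.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations of X and Y via QR + SVD.

    Assumes both centered blocks have full column rank.
    """
    # Center each column so inner products correspond to covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered blocks.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles
    # between the two column spaces, i.e. the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Clip to [0, 1] to absorb floating-point round-off.
    return np.clip(s, 0.0, 1.0)


rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
p, q = 15, 6

# n = 21: p + q = 21 > n - 1 = 20, so one correlation is forced to 1
# even though X and Y are independent noise.
rho_small = canonical_correlations(
    rng.standard_normal((21, p)), rng.standard_normal((21, q))
)

# n = 500: p + q is far below n - 1, so nothing is forced to 1.
rho_large = canonical_correlations(
    rng.standard_normal((500, p)), rng.standard_normal((500, q))
)

print("n = 21 :", np.round(rho_small, 4))
print("n = 500:", np.round(rho_large, 4))
```

Since the two blocks are generated independently, a leading correlation of 1 in the `n = 21` case comes purely from the dimension count and not from any real relationship between the variables.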
