They are not necessarily confined to the diagonal. The underlying assumption behind CCA is that $X$ and $Y$ share some low-dimensional latent factor, so that $X \approx Az_x + Cz$ and $Y \approx Bz_y + Dz$ with $z_x, z_y, z$ all independent. CCA approximates the shared latent factor $z$ from both ends by trying to find projections $w$ and $v$ that invert $C$ and $D$ (and map the column spaces of $A$ and $B$ to 0). But it's not guaranteed to succeed perfectly, nor are the data guaranteed to conform to that generative model.
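To make the setup concrete, here is a minimal NumPy sketch of that generative model; the dimensions (3 observed, 2 latent), the loading matrices, and the noise scale are all my own illustrative choices, not anything canonical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # illustrative sample size

# Independent latent factors: z is shared, z_x and z_y are view-specific
z   = rng.standard_normal((n, 2))
z_x = rng.standard_normal((n, 2))
z_y = rng.standard_normal((n, 2))

# Illustrative loading matrices (3 observed dimensions, 2 latent dimensions)
A, C = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
B, D = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))

# X ~= A z_x + C z and Y ~= B z_y + D z, with "~=" read as additive iid noise
X = z_x @ A.T + z @ C.T + 0.1 * rng.standard_normal((n, 3))
Y = z_y @ B.T + z @ D.T + 0.1 * rng.standard_normal((n, 3))
```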
Example
If you believe the model, then the distance between the samples and the diagonal would measure both the contamination by noise (implied by the $\approx$) and the variation due to $z_x$ or $z_y$. For example, suppose $A = B = 0$ and $C = D = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}$, and take the $\approx$ to mean that iid Gaussian noise gets added to the entries after everything else is finished. Then CCA might use the following projections (caveat: I haven't done the actual math to confirm these are correct, and they also are not normalized; see the numerical check after the list).
- $[0, 1, 1 ]$ (reconstructs the first latent coordinate)
- $[2, -1, -1]$ (reconstructs the second latent coordinate)
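As a quick sanity check of those projections in the noiseless case ($A = 0$ and no additive noise), a short NumPy verification; it suggests each projection recovers one latent coordinate, scaled by a factor of 2:

```python
import numpy as np

rng = np.random.default_rng(1)
C = np.array([[1, 1],
              [1, 0],
              [1, 0]])              # the C (= D) from the example
z = rng.standard_normal((1000, 2))  # shared latent factor
X = z @ C.T                         # noiseless, A = 0, so X is exactly C z

w1 = np.array([0, 1, 1])            # first projection above
w2 = np.array([2, -1, -1])          # second projection above

# Each projection recovers one latent coordinate, up to a factor of 2
print(np.allclose(X @ w1, 2 * z[:, 0]))  # True
print(np.allclose(X @ w2, 2 * z[:, 1]))  # True
```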
But even though I've given CCA the oracular truth about the projections it should be using, the noise means that those projections cannot exactly reconstruct the latent variates as they occurred during the data generation. Making $A$ and $B$ nonzero would distort the results further.
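Continuing the snippet above with the noise switched on (the noise scale is an arbitrary choice of mine), the reconstruction stays good but is no longer exact:

```python
# Same X, w1, z as in the check above, now with iid Gaussian noise added
X_noisy = X + 0.5 * rng.standard_normal(X.shape)
print(np.corrcoef(X_noisy @ w1, z[:, 0])[0, 1])  # high (roughly 0.94), but not 1
```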
Worst-case scenario
In the situation that $A = B = C = D$, the projections are not even reconstructing $z_i$ for case $i$; at best they recover $z_{x,i} + z_i$ and $z_{y,i} + z_i$. If $A = 500C$, this worsens to $500 z_{x,i} + z_i$. This is no longer a problem with estimation error so much as a fundamental identifiability issue: even infinite data won't help. If $A$ and $B$ are merely similar to $C$ and $D$, that will also make things really difficult short of an infinite amount of data.
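A small simulation of this failure mode, using scikit-learn's CCA with scalar latents and $A = C$, $B = D$ (all specifics are my own illustrative choices): the first canonical variate tracks $z_x + z$ almost perfectly, but correlates with the shared factor $z$ alone only at about $1/\sqrt{2}$.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)
n = 5000
z, z_x, z_y = (rng.standard_normal(n) for _ in range(3))

c = np.array([1.0, 1.0, 1.0])   # shared loading; A = C and B = D collapse to this
X = np.outer(z_x + z, c) + 0.1 * rng.standard_normal((n, 3))
Y = np.outer(z_y + z, c) + 0.1 * rng.standard_normal((n, 3))

u, v = CCA(n_components=1).fit_transform(X, Y)

# The canonical variate recovers z_x + z, not the shared factor z alone
print(abs(np.corrcoef(u[:, 0], z_x + z)[0, 1]))  # close to 1
print(abs(np.corrcoef(u[:, 0], z)[0, 1]))        # close to 1/sqrt(2) ~ 0.71
```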