CCA can be formulated in terms of Frobenius norm minimization: if your data are $X, Y$, then the optimization problem is
$$\min_{W^{(x)}, W^{(y)}}||XW^{(x)} - YW^{(y)}||_F$$ such that
- $W^{(x)T} \frac{X^T X}{n} W^{(x)} = W^{(y)T}\frac{Y^TY}{n}W^{(y)} = I$ (coordinates in the projected space are uncorrelated; no cheating by repeating really juicy latent factors you already knew)
- $w^{(x)T}_iX^TYw_j^{(y)} = 0$ if $i\neq j$ (as far as I can tell, this makes distinct canonical pairs uncorrelated *across* views too, so each new pair has to capture shared structure the earlier pairs missed; in any case it's in the paper I'm reading, see bottom, so I'll go with it).
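To make the constrained formulation above concrete, here's a minimal sketch of the standard whitening-plus-SVD solution to CCA. This is my own illustration (variable names like `Wx`, `inv_sqrt` are mine, not from the papers below); it checks numerically that both constraints hold at the solution.

```python
# Minimal CCA via whitening + SVD: find W_x, W_y minimizing
# ||X W_x - Y W_y||_F subject to the two constraints above.
import numpy as np

rng = np.random.default_rng(0)
n, dx, dy, k = 500, 5, 4, 2

# Two views sharing a latent signal Z, plus a little noise.
Z = rng.standard_normal((n, k))
X = Z @ rng.standard_normal((k, dx)) + 0.1 * rng.standard_normal((n, dx))
Y = Z @ rng.standard_normal((k, dy)) + 0.1 * rng.standard_normal((n, dy))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

def inv_sqrt(C):
    """Inverse symmetric square root of a covariance, used to whiten a view."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# SVD of the whitened cross-covariance gives the canonical directions.
U, s, Vt = np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy))
Wx = inv_sqrt(Cxx) @ U[:, :k]
Wy = inv_sqrt(Cyy) @ Vt[:k].T

# First constraint: projected coordinates are uncorrelated, unit variance.
assert np.allclose(Wx.T @ Cxx @ Wx, np.eye(k), atol=1e-8)
assert np.allclose(Wy.T @ Cyy @ Wy, np.eye(k), atol=1e-8)

# Second constraint: W_x^T Cxy W_y is diagonal -- the off-diagonal entries
# (the i != j cross terms) vanish, and the diagonal holds the canonical
# correlations.
cross = Wx.T @ Cxy @ Wy
assert np.allclose(cross, np.diag(np.diag(cross)), atol=1e-8)
```

With noise this small the top canonical correlations come out close to 1, which is the "really juicy latent factors" scenario the constraints guard against double-counting.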
Maybe your formulation could be made to resemble this. $A$ is like the projections $XW^{(x)}$ and $YW^{(y)}$: a shared low-dimensional latent variable. $W^{(x)}$ is like the pseudoinverse of $B$: $W^{(x)}$ projects data into the latent space, while $B$ turns latent variables back into data.
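To see why the pseudoinverse is the right analogy: if the CMF model held exactly, $X = AB^T$ with $B$ of full column rank, then right-multiplying by $B^{\dagger T}$ would recover $A$ exactly. A quick numeric check (all names here are mine, for illustration):

```python
# If X = A B^T exactly and B has full column rank, then
# X @ pinv(B).T = A B^T pinv(B).T = A, since pinv(B) @ B = I.
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 100, 6, 2
A = rng.standard_normal((n, k))   # latent factors
B = rng.standard_normal((d, k))   # loadings (full column rank w.p. 1)
X = A @ B.T                       # noiseless CMF model

A_recovered = X @ np.linalg.pinv(B).T   # "project into the latent space"
assert np.allclose(A_recovered, A)
```

With noise the recovery is no longer exact, which is where the difference between penalizing in the ambient space versus the latent space starts to matter.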
Trying to make this more formal, we can transform the CCA objective function into $$||XW^{(x)} - YW^{(y)}||_F^2 = 2||XW^{(x)} - A ||_F^2 + 2||YW^{(y)} - A||_F^2$$ if $A$ is chosen to be the midpoint of the two data projections, $A = \frac{1}{2}(XW^{(x)} + YW^{(y)})$. You can almost arrive at something like this from CMF if you omit the penalty on $A$ and penalize $||XB^{\dagger T} - A||$ instead of $||X-AB^T||$.
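The identity above is easy to verify numerically; here `U` and `V` stand in for the two projections $XW^{(x)}$ and $YW^{(y)}$:

```python
# Check: with A the average of the two projections,
# ||U - V||_F^2 = 2 ||U - A||_F^2 + 2 ||V - A||_F^2.
import numpy as np

rng = np.random.default_rng(2)
U = rng.standard_normal((50, 3))   # stands in for X W^(x)
V = rng.standard_normal((50, 3))   # stands in for Y W^(y)
A = (U + V) / 2

lhs = np.linalg.norm(U - V, "fro") ** 2
rhs = 2 * np.linalg.norm(U - A, "fro") ** 2 + 2 * np.linalg.norm(V - A, "fro") ** 2
assert np.isclose(lhs, rhs)
```

(The algebra is just $U - A = \frac{1}{2}(U - V)$ and $V - A = -\frac{1}{2}(U - V)$, so each squared term on the right is a quarter of the left-hand side.)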
I can expand more on the algebra behind that last claim if you want, but the qualitative difference is worth emphasizing. CMF penalizes the distance between the model and the data in the ambient (larger-dimensional, observed-data) space, which is natural. CCA penalizes projections of the data within the latent (lower-dimensional, hidden-parameter) space, which is clumsy and necessitates a bunch of extra constraints to keep the optimizer from driving everything to zero.
To understand how the two strategies relate without wishing away differences in the objective functions, I'm gonna need to bring in the cavalry (referring, of course, to Michael Jordan and Francis Bach).
https://www.di.ens.fr/~fbach/probacca.pdf
In Theorem 2, the cavalry show that maximum likelihood inference on a generative model resembling your CMF formulation yields parameters that are linear transformations of the canonical directions. Their model doesn't yield the exact canonical directions -- they appear inside MLEs for similar parameters -- and their model does not have the isotropic errors or the factor priors implied by your CMF formulation.
I hope that gives a useful window into the various perspectives competing here. For more on formulations of CCA, these are the other papers I drew on:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.702.5978&rep=rep1&type=pdf
http://research.cs.aalto.fi/pml/online-papers/klami13a.pdf