CCA can be formulated in terms of Frobenius norm minimization: if your data are $X, Y$, then the optimization problem is
$$\min_{W^{(x)}, W^{(y)}}||XW^{(x)} - YW^{(y)}||_F$$ such that
- $W^{(x)T} \frac{X^T X}{n} W^{(x)} = W^{(y)T}\frac{Y^TY}{n}W^{(y)} = I$ (coordinates in the projected space are uncorrelated; no cheating by repeating really juicy latent factors you already knew)
- $w^{(x)T}_iX^TYw_j^{(y)} = 0$ if $i\neq j$ (as far as I can tell, this makes distinct canonical pairs uncorrelated *across* views too, so each new pair has to capture shared structure the earlier pairs missed; in any case it's in the paper I'm reading, see bottom, so I'll go with it).
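To make the constrained formulation above concrete, here's a minimal sketch of the standard whitening-plus-SVD solution to CCA. This is my own illustration (variable names like `Wx`, `inv_sqrt` are mine, not from the papers below); it checks numerically that both constraints hold at the solution.

```python
# Minimal CCA via whitening + SVD: find W_x, W_y minimizing
# ||X W_x - Y W_y||_F subject to the two constraints above.
import numpy as np

rng = np.random.default_rng(0)
n, dx, dy, k = 500, 5, 4, 2

# Two views sharing a latent signal Z, plus a little noise.
Z = rng.standard_normal((n, k))
X = Z @ rng.standard_normal((k, dx)) + 0.1 * rng.standard_normal((n, dx))
Y = Z @ rng.standard_normal((k, dy)) + 0.1 * rng.standard_normal((n, dy))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

def inv_sqrt(C):
    """Inverse symmetric square root of a covariance, used to whiten a view."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# SVD of the whitened cross-covariance gives the canonical directions.
U, s, Vt = np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy))
Wx = inv_sqrt(Cxx) @ U[:, :k]
Wy = inv_sqrt(Cyy) @ Vt[:k].T

# First constraint: projected coordinates are uncorrelated, unit variance.
assert np.allclose(Wx.T @ Cxx @ Wx, np.eye(k), atol=1e-8)
assert np.allclose(Wy.T @ Cyy @ Wy, np.eye(k), atol=1e-8)

# Second constraint: W_x^T Cxy W_y is diagonal -- the off-diagonal entries
# (the i != j cross terms) vanish, and the diagonal holds the canonical
# correlations.
cross = Wx.T @ Cxy @ Wy
assert np.allclose(cross, np.diag(np.diag(cross)), atol=1e-8)
```

With noise this small the top canonical correlations come out close to 1, which is the "really juicy latent factors" scenario the constraints guard against double-counting.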
Maybe your formulation could be made to resemble this. $A$ is like the projections $XW^{(x)}$ and $YW^{(y)}$: a shared low-dimensional latent variable. $W^{(x)}$ is like the pseudoinverse of $B$: $W^{(x)}$ projects data into the latent space, while $B$ turns latent variables back into data.
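To see why the pseudoinverse is the right analogy: if the CMF model held exactly, $X = AB^T$ with $B$ of full column rank, then right-multiplying by $B^{\dagger T}$ would recover $A$ exactly. A quick numeric check (all names here are mine, for illustration):

```python
# If X = A B^T exactly and B has full column rank, then
# X @ pinv(B).T = A B^T pinv(B).T = A, since pinv(B) @ B = I.
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 100, 6, 2
A = rng.standard_normal((n, k))   # latent factors
B = rng.standard_normal((d, k))   # loadings (full column rank w.p. 1)
X = A @ B.T                       # noiseless CMF model

A_recovered = X @ np.linalg.pinv(B).T   # "project into the latent space"
assert np.allclose(A_recovered, A)
```

With noise the recovery is no longer exact, which is where the difference between penalizing in the ambient space versus the latent space starts to matter.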
Trying to make this more formal, we can transform the CCA objective function into $$||XW^{(x)} - YW^{(y)}||_F^2 = 2||XW^{(x)} - A ||_F^2 + 2||YW^{(y)} - A||_F^2$$ if $A$ is chosen to be the midpoint of the two data projections, $A = \frac{1}{2}(XW^{(x)} + YW^{(y)})$. You can almost arrive at something like this from CMF if you omit the penalty on $A$ and penalize $||XB^{\dagger T} - A||$ instead of $||X-AB^T||$.
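The identity above is easy to verify numerically; here `U` and `V` stand in for the two projections $XW^{(x)}$ and $YW^{(y)}$:

```python
# Check: with A the average of the two projections,
# ||U - V||_F^2 = 2 ||U - A||_F^2 + 2 ||V - A||_F^2.
import numpy as np

rng = np.random.default_rng(2)
U = rng.standard_normal((50, 3))   # stands in for X W^(x)
V = rng.standard_normal((50, 3))   # stands in for Y W^(y)
A = (U + V) / 2

lhs = np.linalg.norm(U - V, "fro") ** 2
rhs = 2 * np.linalg.norm(U - A, "fro") ** 2 + 2 * np.linalg.norm(V - A, "fro") ** 2
assert np.isclose(lhs, rhs)
```

(The algebra is just $U - A = \frac{1}{2}(U - V)$ and $V - A = -\frac{1}{2}(U - V)$, so each squared term on the right is a quarter of the left-hand side.)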
I can expand more on the algebra behind that last claim if you want, but the qualitative difference is worth emphasizing. CMF penalizes the distance between the model and the data in the ambient (larger-dimensional, observed-data) space, which is natural. CCA penalizes projections of the data within the latent (lower-dimensional, hidden-parameter) space, which is clumsy and necessitates a bunch of extra constraints to keep the optimizer from driving everything to zero.
To understand how the two strategies relate without wishing away differences in the objective functions, I'm gonna need to bring in the cavalry (referring, of course, to Michael Jordan and Francis Bach).
https://www.di.ens.fr/~fbach/probacca.pdf
In Theorem 2, the cavalry show that maximum likelihood inference on a generative model resembling your CMF formulation yields parameters that are linear transformations of the canonical directions. Their model doesn't yield the exact canonical directions -- they appear inside MLEs for similar parameters -- and their model does not have the isotropic errors or the factor priors implied by your CMF formulation.
I hope that gives a useful window into the various perspectives competing here. For more on formulations of CCA, these are the other papers I drew on:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.702.5978&rep=rep1&type=pdf
http://research.cs.aalto.fi/pml/online-papers/klami13a.pdf