
I want to perform regularized canonical correlation analysis on two matrices that each have more variables than observations (same subjects); one of them is very large (~18000 columns). The only R package that could handle these dimensions was PMA (I also tried mixOmics, RGCCA...).

The problem is that the CCA function returns only the canonical variates, the canonical correlations, and the canonical weights. A good interpretation of the results requires the canonical scores, but (oddly) CCA does not provide them. I am not very proficient in matrix algebra (or statistics in general), and I am not sure how to compute them.

Example from the package documentation:

### load the package that provides CCA
> library(PMA)
### create matrices
> u <- matrix(c(rep(1,25),rep(0,75)),ncol=1)
> v1 <- matrix(c(rep(1,50),rep(0,450)),ncol=1)
> v2 <- matrix(c(rep(0,50),rep(1,50),rep(0,900)),ncol=1)
> x <- u%*%t(v1) + matrix(rnorm(100*500),ncol=500)
> z <- u%*%t(v2) + matrix(rnorm(100*1000),ncol=1000)
### perform canonical correlation (3 canonical variates)
> out <- CCA(x,z,typex="standard",typez="standard",K=3)

> print(out,verbose=TRUE)
Call: CCA(x = x, z = z, typex = "standard", typez = "standard", K = 3)

Num non-zeros u's:  59 88 75 
Num non-zeros v's:  180 154 164 
Type of x:  standard 
Type of z:  standard 
Penalty for x: L1 bound is  0.3 
Penalty for z: L1 bound is  0.3 
Cor(Xu,Zv):  0.9578624 0.93371 0.9418701

Component  1 :

Row Feature Name Row Feature Weight
1                 1              0.112
2                 2              0.080
3                 3              0.124   
4                 4              0.165
5                 5              0.087
........
........            
Column Feature Name Column Feature Weight
1                    10                 0.006
2                    15                -0.027
3                    25                -0.025
4                    28                 0.030
5                    35                 0.035
........
........

The output is similar for Components 2 and 3.
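For what it's worth, here is a minimal sketch of what I understand the canonical scores to be: the (standardized) data matrix multiplied by the weight matrix, giving one value per observation and component. The helper `canonical_scores` is a hypothetical name, not something PMA provides, and the weights below are made up so the snippet runs without the package. It assumes the columns were centered and scaled before fitting, which I believe is the default behaviour of CCA.

```r
# Hypothetical helper (not part of PMA): canonical scores from data and weights.
canonical_scores <- function(x, w, standardize = TRUE) {
  # Assumption: the weights were estimated on centered and scaled columns,
  # so the same scaling is applied before projecting.
  if (standardize) x <- scale(x, center = TRUE, scale = TRUE)
  x %*% w  # n x K matrix: one score per observation and component
}

# Toy data with made-up sparse weights (stand-ins for the fitted weights)
set.seed(1)
x <- matrix(rnorm(20 * 6), ncol = 6)
w <- cbind(c(0.8, 0.6, 0, 0, 0, 0),
           c(0, 0, 0, 0.6, 0.8, 0))
scores_x <- canonical_scores(x, w)
dim(scores_x)  # 20 rows (observations), 2 columns (components)
```

With the example above, the calls would presumably be `canonical_scores(x, out$u)` and `canonical_scores(z, out$v)`, assuming `out$u` and `out$v` hold the weights; the column-wise correlations between the two score matrices should then match the printed Cor(Xu,Zv) values if the assumption about standardization holds.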

My other question is about the significance of the results: have the variables with non-zero weights already been tested as "significantly different from 0"? The function never asks for a significance threshold.
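As far as I understand, the non-zero weights come from the L1 penalty (the "L1 bound is 0.3" in the output) rather than from a test, so significance would have to be assessed separately. One common approach is a permutation test: shuffle the rows of one matrix to break the subject pairing, recompute the statistic, and compare. The sketch below uses a hypothetical helper `perm_pvalue` and a stand-in statistic `toy_stat` that is not a canonical correlation; both names are mine and only illustrate the mechanics.

```r
# Hypothetical helper: permutation p-value for any association statistic.
perm_pvalue <- function(x, z, stat_fun, nperms = 199) {
  observed <- stat_fun(x, z)
  permuted <- replicate(nperms, {
    z_perm <- z[sample(nrow(z)), , drop = FALSE]  # break the subject pairing
    stat_fun(x, z_perm)
  })
  # Add 1 to numerator and denominator so the p-value is never exactly 0
  (1 + sum(permuted >= observed)) / (nperms + 1)
}

# Stand-in statistic for illustration only; NOT a canonical correlation
toy_stat <- function(x, z) abs(cor(rowMeans(x), rowMeans(z)))

# Toy data: both matrices share one per-subject signal
set.seed(3)
shared <- rnorm(50)
x <- shared + matrix(rnorm(50 * 5), ncol = 5)
z <- shared + matrix(rnorm(50 * 5), ncol = 5)
p_value <- perm_pvalue(x, z, toy_stat)
p_value
```

For the real analysis, `stat_fun` could wrap a `K=1` call to CCA and return the first canonical correlation. PMA also ships a `CCA.permute` function that, as I understand it, does this kind of permutation internally to choose the penalties.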

kjetil b halvorsen
Alice
  • I could direct you to my [answer](http://stats.stackexchange.com/a/77309/3277) with the CCA algorithm (standard, not regularized). There is a bit of matrix algebra, but everything is described. If I were an `R` user I might help you numerically. – ttnphns Mar 22 '14 at 12:53
  • I already read your answer before, but going through it more carefully I see that the computation of the loadings would be, according to your nomenclature: A1=S1^−1(S1R1S1)C1 ; do you agree that this is equivalent to the scores (for each variable on the variate, of course)? – Alice Mar 23 '14 at 16:26
  • Implementing this in R, I get a matrix with no zeros at all; since C1 has many zeros (only some variables are related to the canonical variate), I would expect the same zeros in the output A1 – Alice Mar 23 '14 at 18:14
  • I assumed we're talking about matrix multiplication... – Alice Mar 23 '14 at 18:25
  • Could you tell me what would a canonical loading be for the single value, I mean, the computation done on the single observation? Something like A(i,j)=Sd(V(i))*cor(V(i,j),V(?))*weight(V(i,j)) – Alice Mar 23 '14 at 18:43
  • Canonical loadings are easy to compute after the analysis (this is probably why many packages don't calculate them for you). Loadings and cross-loadings are the correlations between the variables and the canonical variates. – ttnphns Mar 23 '14 at 20:31
  • Loadings and scores are different notions. – ttnphns Nov 11 '17 at 16:29
  • I know this is old, but looking at the source of the functions, they could easily be modified to return the canonical scores; you'll need to dig into CCAAlgorithm to find where. – llrs Sep 19 '18 at 15:14
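
Following the comments above, loadings can be sketched as plain correlations between each original variable and each canonical variate. This also seems to explain why the loading matrix has no zeros even when the weights are sparse: a variable with zero weight can still correlate with the variate through other variables. The helper `canonical_loadings` and the weight vector below are hypothetical, chosen only to make the point on toy data.

```r
# Hypothetical helper (not part of PMA): loadings are the correlations between
# each original variable and each canonical variate (column of scores).
canonical_loadings <- function(x, scores) {
  cor(x, scores)  # p x K matrix
}

set.seed(2)
x <- matrix(rnorm(30 * 5), ncol = 5)
x[, 3] <- x[, 1] + rnorm(30, sd = 0.5)   # variable 3 correlates with variable 1
w <- matrix(c(1, 0, 0, 0, 0), ncol = 1)  # made-up sparse weight: only variable 1
scores <- scale(x) %*% w                 # canonical variate for this component
loadings <- canonical_loadings(x, scores)
round(loadings, 2)  # variable 3 has zero weight but a clearly non-zero loading
```

Cross-loadings would be the same computation with the other set's scores, for example `cor(x, scores_z)` where `scores_z` is a hypothetical name for the scores of the second matrix.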
