
I am reading material on canonical correlation that introduces a concept called "redundancy". I have been puzzling over it for a day but still cannot understand it. The following is a screen capture of the part discussing "redundancy".

[Screenshot: the book's passage on redundancy]

where $\bf{Y}$ is the dependent variable vector, $\bf{X}$ is the independent variable vector, $\bf{u}$ is the canonical linear combination of $\bf{X}$ and $\bf{t}$ is the canonical linear combination of $\bf{Y}$. $\bf{u}=\bf{X}\bf{b}$ and $\bf{t}=\bf{Y}\bf{a}$.
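A minimal numeric sketch of these definitions may help. It assumes $\bf{X}$ ($n \times p$) and $\bf{Y}$ ($n \times q$) are standardized, $p \ge q$, and the weights are scaled so that every $\bf{u}_k$ and $\bf{t}_k$ has unit variance. It computes the canonical pairs by whitening plus an SVD (one standard way to do CCA, not necessarily the book's), then $\bf{g}_k$ (the correlations of each $\bf{Y}$ variable with $\bf{t}_k$) and the redundancy $Rd(\bf{t}_k|\bf{u}_k) = r_k^2\,\bf{g}_k'\bf{g}_k/q$ as in the excerpt. The function names `standardize`, `inv_sqrt` and `cca_redundancy` are hypothetical helpers, and the data are simulated.

```python
import numpy as np


def standardize(M):
    """Center each column and scale it to unit (population) variance."""
    return (M - M.mean(axis=0)) / M.std(axis=0)


def inv_sqrt(R):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, Q = np.linalg.eigh(R)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T


def cca_redundancy(Xs, Ys):
    """CCA of standardized Xs (n x p) and Ys (n x q), assuming p >= q.

    Returns the canonical correlations r_k, the weight matrices B
    (u_k = Xs @ b_k) and A (t_k = Ys @ a_k), the loading matrix G
    (column k is g_k = corr(Y_j, t_k) for j = 1..q) and the redundancies
    Rd(t_k | u_k) = r_k^2 * g_k'g_k / q.
    """
    n, q = Ys.shape
    Rxx = Xs.T @ Xs / n
    Ryy = Ys.T @ Ys / n
    Rxy = Xs.T @ Ys / n
    Kx, Ky = inv_sqrt(Rxx), inv_sqrt(Ryy)
    # Singular values of the whitened cross-correlation matrix are the
    # canonical correlations; the singular vectors give the weights.
    U, r, Vt = np.linalg.svd(Kx @ Rxy @ Ky, full_matrices=False)
    B = Kx @ U        # scaled so each u_k has unit variance
    A = Ky @ Vt.T     # scaled so each t_k has unit variance
    G = Ryy @ A       # corr(Y_j, t_k), since Y_j and t_k both have variance 1
    rd = r**2 * np.sum(G**2, axis=0) / q
    return r, B, A, G, rd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 4))                      # p = 4 simulated predictors
    Y = X[:, :3] @ rng.standard_normal((3, 3)) + rng.standard_normal((500, 3))  # q = 3
    Xs, Ys = standardize(X), standardize(Y)
    r, B, A, G, rd = cca_redundancy(Xs, Ys)
    print("canonical correlations r_k:", r.round(3))
    print("redundancies Rd(t_k|u_k):", rd.round(3))
    print("sum over k of g_k'g_k:", np.sum(G**2))          # q, up to rounding error
    # Squared multiple correlation of each standardized Y_j regressed on all of X.
    coef, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
    r2 = 1 - np.sum((Ys - Xs @ coef) ** 2, axis=0) / len(Ys)
    print("sum of Rd:", rd.sum(), " mean R^2 of Y_j on X:", r2.mean())
```

The last two prints compare quantities that the comments below ask about: whether the $\bf{g}'\bf{g}$ terms add up to $q$, and how redundancy relates to an ordinary $R^2$.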

I have several questions.

  1. Why is this concept named redundancy? Is there an intuitive interpretation?
  2. Why is the first term of $Rd(\bf{t}|\bf{u})$ just the squared canonical correlation $r^2(\bf{t},\bf{u})$?
  3. Why is the second term $\frac{\bf{g}'\bf{g}}{q}$?

Here $\bf{g}$ is the vector of correlations between each variable in $\bf{Y}$ and $\bf{t}$.

[Screenshot: a further excerpt from the book]
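One way to read the two terms, sketched under these assumptions: $\bf{X}$ and $\bf{Y}$ are standardized, the weights are scaled so that $\bf{t}$ and $\bf{u}$ have unit variance, and $\bf{R}_{YY}$, $\bf{R}_{YX}$ denote the within-$\bf{Y}$ and $\bf{Y}$-by-$\bf{X}$ correlation matrices (notation introduced here, not taken from the book). The canonical equations give $\bf{R}_{YX}\bf{b} = r\,\bf{R}_{YY}\bf{a}$, so the correlations of the $\bf{Y}$ variables with $\bf{u}$ (the cross-loadings) are $r$ times $\bf{g}$, and $Rd(\bf{t}|\bf{u})$ is their mean square:

```latex
\begin{aligned}
\mathbf{g} &= \operatorname{corr}(\mathbf{Y},\mathbf{t}) = \mathbf{R}_{YY}\mathbf{a},
  \qquad \mathbf{a}'\mathbf{R}_{YY}\mathbf{a} = 1,\\
\operatorname{corr}(\mathbf{Y},\mathbf{u}) &= \mathbf{R}_{YX}\mathbf{b}
  = r(\mathbf{t},\mathbf{u})\,\mathbf{R}_{YY}\mathbf{a}
  = r(\mathbf{t},\mathbf{u})\,\mathbf{g},\\
\frac{1}{q}\sum_{j=1}^{q}\operatorname{corr}^2(Y_j,\mathbf{u})
  &= r^2(\mathbf{t},\mathbf{u})\cdot\frac{\mathbf{g}'\mathbf{g}}{q}
   = Rd(\mathbf{t}\mid\mathbf{u}).
\end{aligned}
```

Because each standardized $Y_j$ has variance 1, the total variance of $\bf{Y}$ is $q$, and $\operatorname{corr}^2(Y_j,\bf{u})$ is the share of $Y_j$'s variance reproduced by $\bf{u}$. Read this way, $Rd(\bf{t}|\bf{u})$ is the share of the total variance of $\bf{Y}$ that is predictable from $\bf{u}$: $r^2$ is the share of $\bf{t}$ captured by $\bf{u}$, and $\bf{g}'\bf{g}/q$ is the share of $\bf{Y}$ captured by $\bf{t}$.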

Ferdi
Tony
  • What's the name of the book? Is it available to download anywhere? – ttnphns May 24 '15 at 15:22
  • @ttnphns The book is Analyzing Multivariate Data by James M. Lattin. It does not seem to be available online; I am just taking pictures of it. I am still struggling to figure out the problems above. – Tony May 24 '15 at 21:24
  • @ttnphns I found another source that only says "redundancy of A given B is an index of the proportion of variance of A predictable from B". Does this mean "redundancy" is a measure of explained variance, like R-squared? What is the difference? – Tony May 24 '15 at 21:27
  • I spent 4 more hours on this problem and carefully examined how variance is explained by principal components in PCA, since the material mentions "the same notation as the variance in Y accounted for by a principal component". I am now inclined to say that this so-called measure $Rd$ is a heuristic index rather than one with a rigorous mathematical derivation. $\bf{Y}$ is the standardized data matrix of $q$ dependent variables, so the total variance of $\bf{Y}$ is $q$, given that each variable is normalized to variance $1$. – Tony May 25 '15 at 06:40
  • $\bf{g}'\bf{g}$ could be a heuristic analogy to the way PCA computes the variance explained by a principal component: the sum of the squared component loadings. The difference is that in PCA it is easy to prove that the total variance of all principal components equals the total variance of the original variables. I tried but failed to prove, in this canonical correlation case, that summing $\bf{g}'\bf{g}$ over all canonical variates $\bf{t}$ gives $q$, the total variance of $\bf{Y}$. – Tony May 25 '15 at 06:45
  • I checked the original paper, "A General Canonical Correlation Index", and there is no proof in it either; the formula is simply stated. – Tony May 25 '15 at 06:53
  • I wonder why you find these concepts difficult. Canonical correlation (and hence its square) is the Pearson correlation between the two derived variables called canonical variates: it is the cosine of the angle between them ([see](http://stats.stackexchange.com/a/65817/3277)). To assess the variance in set X explained by the x-variate we compute the so-called canonical loadings, and to assess the variance in set Y explained by the x-variate we compute the so-called canonical cross-loadings ([see](http://stats.stackexchange.com/a/77309/3277)). – ttnphns May 25 '15 at 07:57
  • ...computing loadings and cross-loadings is labeled "redundancy analysis". – ttnphns May 25 '15 at 07:59
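A sketch of the two facts raised in the comments, under these assumptions: all variables are standardized, $q \le p$, and $\bf{A}$ ($q \times q$) and $\bf{B}$ ($p \times p$) collect all canonical weight vectors scaled to unit variance, so $\bf{A}'\bf{R}_{YY}\bf{A} = \bf{I}$ and $\bf{B}'\bf{R}_{XX}\bf{B} = \bf{I}$ (this matrix notation is introduced here, not taken from the book). Both matrices are then invertible, so $\bf{A}\bf{A}' = \bf{R}_{YY}^{-1}$ and $\bf{B}\bf{B}' = \bf{R}_{XX}^{-1}$. The loadings are $\bf{g}_k = \bf{R}_{YY}\bf{a}_k$, the cross-loadings are $\bf{R}_{YX}\bf{b}_k = r_k\,\bf{g}_k$ for the $q$ canonical pairs, and $\bf{R}_{YX}\bf{b}_k = \bf{0}$ for the $p - q$ unpaired X-side variates:

```latex
\begin{aligned}
\sum_{k=1}^{q}\mathbf{g}_k'\mathbf{g}_k
  &= \operatorname{tr}\!\left(\mathbf{A}'\mathbf{R}_{YY}\mathbf{R}_{YY}\mathbf{A}\right)
   = \operatorname{tr}\!\left(\mathbf{R}_{YY}\mathbf{R}_{YY}\mathbf{A}\mathbf{A}'\right)
   = \operatorname{tr}\!\left(\mathbf{R}_{YY}\right) = q,\\[4pt]
\sum_{k=1}^{q} Rd(\mathbf{t}_k\mid\mathbf{u}_k)
  &= \frac{1}{q}\sum_{k=1}^{p}\left\lVert\mathbf{R}_{YX}\mathbf{b}_k\right\rVert^2
   = \frac{1}{q}\operatorname{tr}\!\left(\mathbf{R}_{YX}\mathbf{B}\mathbf{B}'\mathbf{R}_{XY}\right)
   = \frac{1}{q}\operatorname{tr}\!\left(\mathbf{R}_{YX}\mathbf{R}_{XX}^{-1}\mathbf{R}_{XY}\right)
   = \frac{1}{q}\sum_{j=1}^{q}R^2_{Y_j\cdot\mathbf{X}}.
\end{aligned}
```

Under these assumptions, summing $\bf{g}'\bf{g}$ over all canonical variates of $\bf{Y}$ gives $q$, as the squared loadings do in PCA, and summing the redundancies over all canonical pairs gives the average squared multiple correlation of the individual $\bf{Y}$ variables regressed on all of $\bf{X}$. Redundancy therefore splits that average $R^2$ across the canonical pairs.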
