Given two sets of variables and the objective of finding correlations between the variables in the two sets, are there any simple examples or explanations, suitable for a group of biologists who know only basic statistics, that illustrate the benefit of canonical correlation analysis (CCA) over Pearson's correlation between pairs of variables from the two sets?

kjetil b halvorsen
blueskyddd
  • Could help https://stats.stackexchange.com/q/65692/3277 – ttnphns Jan 05 '20 at 08:10
  • Thanks @ttnphns! While that's a great answer to `what` and `how` of CCA, I'm looking for the `why`, esp. to the objective in my question, something similar to Section 5.1 of this [article](https://www.cs.cmu.edu/~tom/10701_sp11/slides/CCA_tutorial.pdf). – blueskyddd Jan 07 '20 at 14:39
  • In your article, sec. 5.1, the example is not very instructive because it is degenerate: the canonical correlation is 1. Instead, I would recommend turning to my answer with the pics, specifically the pic with multiple regression. Multiple regression _is_ CCA with one of the two sets (X and Y), the Y set, consisting of just _one_ variable; the X set consists of two (X1, X2). – ttnphns Jan 07 '20 at 15:39
  • (cont.) You see that, _as long as we agree to dismiss individual variables' identities_ and instead agree to have their _linear constructs_ as their representatives or proxies, the correlation between Y and Y' (the latter being a linear construct of X1 and X2, since it lies in their plane) _is "better"_ than two separate correlations: Y with X1 and Y with X2. First, it is one value instead of two; second, it is at least as high as either of the two. Y' is the "best" proxy of the pair X1,X2 in the sense that it has the maximal possible correlation with Y. – ttnphns Jan 07 '20 at 15:39
  • (cont.) It is thus the maximal possible relation which can be found between set Y and set X. All of the above holds exactly in general CCA, where both sets X and Y consist of more than one variable. If you add Y2 to the Y set (so that Y1 and Y2 now form set Y), two constructs will be found, Vx (representing X) and Vy (representing Y), the correlation between which will be the maximal possible (see pic 3 in my post). – ttnphns Jan 07 '20 at 16:01
  • Thanks, @ttnphns! It helps a lot! – blueskyddd Jan 09 '20 at 15:25
  • Well it ain't pretty, but it is [understandable](https://www.statisticssolutions.com/canonical-correlation/). – Carl Feb 25 '20 at 05:28
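
The point made in the comments above (a linear construct of each set correlates at least as strongly as any single pair of variables) can be checked on simulated data. The following is only a minimal sketch under assumed conditions, not anything taken from the linked posts: the hidden signal `z`, the sample size, the number of variables per set, and the helper name `canonical_correlations` are all hypothetical choices made for illustration. It computes the canonical correlations with NumPy as the singular values of `Qx.T @ Qy`, where `Qx` and `Qy` are orthonormal bases of the centered sets.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical setup: one hidden signal z is shared weakly by every variable.
z = rng.normal(size=n)
X = 0.5 * z[:, None] + rng.normal(size=(n, 4))  # set X: 4 noisy variables
Y = 0.5 * z[:, None] + rng.normal(size=(n, 3))  # set Y: 3 noisy variables


def canonical_correlations(X, Y):
    """Canonical correlations via QR of the centered sets plus an SVD."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis for the columns of Xc
    Qy, _ = np.linalg.qr(Yc)  # orthonormal basis for the columns of Yc
    # Singular values of Qx' Qy are the canonical correlations, largest first.
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)


# All pairwise Pearson correlations between a variable in X and one in Y.
p = X.shape[1]
pairwise = np.corrcoef(X, Y, rowvar=False)[:p, p:]
max_pairwise = np.abs(pairwise).max()

rho = canonical_correlations(X, Y)

print("largest |pairwise Pearson r|:", round(float(max_pairwise), 3))
print("canonical correlations:      ", np.round(rho, 3))
```

Because each single variable is itself a (trivial) linear combination of its set, the first canonical correlation can never be smaller than the largest absolute pairwise correlation. In a setup like this one, where the shared signal is spread thinly over many variables, it would be expected to be noticeably larger, which is the practical benefit for data such as gene or metabolite panels.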

0 Answers