
Suppose I have a population of $N$ pairs $(X,Y)$, and I know the correlation of the population is $Z$. I now split the population into two unequal sets ($n_1 + n_2 = N$ and $n_1 \neq n_2$). If I calculate the correlation of one set, can I infer the correlation of the other set? That is, is there a relationship for correlation similar to $N Z = n_1 z_1 + n_2 z_2$? There is an equivalent result for variance: you can split a population into two sets and derive the variance of one subset from the variance of the remainder (given the variance of the population). But I've not seen anything similar for correlation.
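
For reference, the variance relationship alluded to above can be written out as follows. This is a sketch that assumes population (divide-by-$n$) variances; the symbols $\mu, \sigma^2$ for the whole population and $\mu_i, \sigma_i^2$ for the subsets are notation introduced here, not part of the original question. Note that the subset means enter the identity as well as the variances:

$$N\sigma^2 = n_1\left[\sigma_1^2 + (\mu_1-\mu)^2\right] + n_2\left[\sigma_2^2 + (\mu_2-\mu)^2\right], \qquad N\mu = n_1\mu_1 + n_2\mu_2.$$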

  • Correlations are complicated: first you have to [combine the covariance matrices](https://stats.stackexchange.com/questions/51622/) and then compute the correlations from that: this follows from the usual formulas and is unproblematic. The formulas for the combination demonstrate you cannot compute the correlation *only* from knowing the subpopulation sizes and the subpopulation correlations. – whuber Feb 21 '21 at 14:41
  • Thanks for the reply. Logically, if you know the correlation of the population, you also know the details of its components (i.e., variance of X, variance of Y, covariance of XY) for both the population and the sample n1. So you should be able to calculate the correlation for the sample n2. – Harold Cataquet Feb 22 '21 at 16:49
  • That is incorrect. As an example of the problem, consider the subpopulations of $(x,y)$ data $\{(0,1),(1,0)\}$ and $\{(100,101),(101,100)\},$ each of which has a correlation of $-1.$ The union of these two subpopulations has correlation almost $1.$ – whuber Feb 22 '21 at 17:27
  • Absolutely correct! Was hoping there was a shortcut. – Harold Cataquet Feb 23 '21 at 21:04
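
The points made in the comments can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes NumPy is available, and the helper names `moments`, `combine`, `split_off`, and `correlation` are hypothetical names introduced here for illustration. It combines the sums of squares and cross-products of two subsets (the "combine the covariance matrices" step), reproduces the counterexample with two subsets of correlation $-1$, and shows that the second subset can be recovered when the full moments (sizes, means, and covariance terms) are known, not just the correlations.

```python
import numpy as np

def moments(xy):
    """Return (n, mean vector, centered sum-of-squares-and-cross-products matrix)."""
    xy = np.asarray(xy, dtype=float)
    mean = xy.mean(axis=0)
    centered = xy - mean
    return xy.shape[0], mean, centered.T @ centered

def combine(m1, m2):
    """Pool the moments of two disjoint subsets into the moments of their union."""
    n1, mean1, s1 = m1
    n2, mean2, s2 = m2
    n = n1 + n2
    delta = mean2 - mean1
    mean = mean1 + delta * n2 / n
    # The between-subset term depends on the subset means, not on correlations.
    s = s1 + s2 + np.outer(delta, delta) * n1 * n2 / n
    return n, mean, s

def split_off(m_all, m1):
    """Recover the moments of the remaining subset from the union and one subset."""
    n, mean, s = m_all
    n1, mean1, s1 = m1
    n2 = n - n1
    mean2 = (n * mean - n1 * mean1) / n2
    delta = mean2 - mean1
    s2 = s - s1 - np.outer(delta, delta) * n1 * n2 / n
    return n2, mean2, s2

def correlation(m):
    """Pearson correlation from a centered cross-product matrix."""
    s = m[2]
    return s[0, 1] / np.sqrt(s[0, 0] * s[1, 1])

# The two subpopulations from the comment above.
a = [(0, 1), (1, 0)]
b = [(100, 101), (101, 100)]
m_a, m_b = moments(a), moments(b)
m_union = combine(m_a, m_b)

r_a = correlation(m_a)
r_b = correlation(m_b)
r_union = correlation(m_union)
r_b_recovered = correlation(split_off(m_union, m_a))

print(r_a, r_b, r_union, r_b_recovered)
```

Both subsets have correlation $-1$, yet the union comes out at $9999/10001$, because the between-subset term is driven by the difference in means. A weighted average of $z_1$ and $z_2$ cannot reproduce that, which is why the sizes and correlations alone are not enough.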

0 Answers