Pearson correlation and partitioning data

Asked Nov 09 '17 at 15:29

Active Nov 09 '17 at 19:52

I have encountered this problem and I am struggling with the intuition.

I have a data sample $(X_i,Y_i)_{i=1}^n$ with sample correlation $\rho$. Let $\rho_1$ be the sample correlation for sample $(X_i,Y_i)_{i=1}^{n/2}$ and $\rho_2$ be the sample correlation for sample $(X_i,Y_i)_{i=n/2+1}^{n}$.

In my data, I have $\rho = 0.9$, $\rho_1 = 0.4$ and $\rho_2 = 0.4$. My understanding is that the Pearson correlation indicates the strength of the linear relationship between $X$ and $Y$. So if the correlation is not strong in $(X_i,Y_i)_{i=1}^{n/2}$ or in $(X_i,Y_i)_{i=n/2+1}^{n}$, then how can it be strong in the union of the two data sets, $(X_i,Y_i)_{i=1}^n$?

Looking at the formula, I guess it's possible because the means of the two halves are different, but I can't seem to grasp the intuition.

edited Nov 09 '17 at 19:52

Xi'an

asked Nov 09 '17 at 15:29

jerom

Your comment about the means suggests you are on the right track. There is discussion of similar issues tagged [tag:simpsons-paradox] on this site. – mdewey Nov 09 '17 at 15:36
See the last figure in https://stats.stackexchange.com/a/13317/919 for an illustration. Although it is explained in terms of $R^2$, it's readily translated into sample correlations if you like. – whuber Nov 09 '17 at 19:57
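
To make the point about the means concrete, here is a minimal sketch in Python. It is a constructed example, not the asker's data: the patterns `u` and `v`, the shift `d`, and the helper `pearson` are all hypothetical names chosen for illustration. Each half is a small cloud whose sample correlation is 0.4, and the second half is the same cloud shifted by `d` along both axes. Once the halves are pooled, the distance between the two group means dominates the pooled variances and covariance, so the pooled correlation is much larger than either within-half correlation.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: u and v are orthogonal, zero-mean patterns, so
# y = r*u + sqrt(1 - r**2)*v has sample correlation r with x = u.
u = [1.0, 1.0, -1.0, -1.0]
v = [1.0, -1.0, 1.0, -1.0]
r_within = 0.4
x1 = u[:]
y1 = [r_within * a + math.sqrt(1 - r_within ** 2) * b for a, b in zip(u, v)]

# Second half: the same cloud, shifted by d along both axes.
# d**2 = 20 is the value that makes the pooled correlation 0.9 here.
d = math.sqrt(20.0)
x2 = [x + d for x in x1]
y2 = [y + d for y in y1]

rho1 = pearson(x1, y1)            # first half
rho2 = pearson(x2, y2)            # second half
rho = pearson(x1 + x2, y1 + y2)   # pooled sample
print(round(rho1, 3), round(rho2, 3), round(rho, 3))
```

In this construction the within-half variances and covariance are 1 and 0.4 (with $1/n$ normalisation), and pooling adds $d^2/4 = 5$ to each of them, giving $(0.4 + 5)/(1 + 5) = 0.9$. Shrinking `d` toward zero brings the pooled correlation back toward 0.4.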

0 Answers