3

I have two variables x1 and x2 which measure relatively similar things (r ~ 0.6), with x2 slightly larger than x1 on average. I then created a new variable x3 by subtracting the two: x3 = x1 - x2.

However, when I ran the Pearson correlations, x3 is strongly negatively correlated with x2 as expected (r ~ -0.6), but x3 is not very correlated with x1 (r ~ 0.1). How is this possible?

Nick Cox
Hank Lin
  • A scatter plot matrix should help. – Nick Cox Dec 11 '18 at 20:37
  • Possible duplicate of [When A and B are positively related variables, can they have opposite effect on their outcome variable C?](https://stats.stackexchange.com/questions/229052/when-a-and-b-are-positively-related-variables-can-they-have-opposite-effect-on) – sds Dec 11 '18 at 21:07
  • I have a vague memory of an even closer duplicate, but I cannot find it. – Sextus Empiricus Dec 13 '18 at 14:07

4 Answers

12

Here's a simple example. Suppose $ε_1$ and $ε_2$ are independent standard normal random variables. Define $X_1 = ε_1$, $X_2 = X_1 + ε_2$, and $X_3 = X_1 - X_2$. The correlation of $X_1$ with $X_2$ is then $\tfrac{1}{\sqrt{2}} \approx .71$. Likewise, the correlation of $X_2$ with $X_3$ is $-\tfrac{1}{\sqrt{2}}$. But the correlation of $X_1$ with $X_3$ is the correlation of $ε_1$ with $ε_1 - (ε_1 + ε_2) = -ε_2$, which is 0 since the $ε_i$s are independent.
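A quick simulation (a minimal Python/NumPy sketch, not part of the original answer) reproduces these three correlations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent standard normal noise terms
eps1 = rng.standard_normal(n)
eps2 = rng.standard_normal(n)

# The construction from the answer
x1 = eps1
x2 = x1 + eps2
x3 = x1 - x2  # equals -eps2

print(np.corrcoef(x1, x2)[0, 1])  # ~  0.71
print(np.corrcoef(x2, x3)[0, 1])  # ~ -0.71
print(np.corrcoef(x1, x3)[0, 1])  # ~  0
```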

Kodiologist
2

This is by construction of $x_3$. Because $x_2$ and $x_1$ are closely related in terms of their Pearson correlation, subtracting one from the other strips out much of what they share and so reduces the correlation. The easiest way to see this is the extreme scenario of perfect correlation, i.e., $x_2=x_1$: then $x_3=x_1-x_2=0$, which is fully deterministic and carries no correlation with $x_1$ at all.

You can make the argument more formal using the definition of the Pearson correlation by looking at the covariance between $x_3$ and $x_1$. You will see that this covariance is reduced; by how much depends on the correlation between $x_1$ and $x_2$, i.e., $r_{12}$, and on their standard deviations. All else being equal, the larger $r_{12}$, the smaller $r_{13}$, as the sketch below illustrates.
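A minimal sketch (assuming equal unit variances for both variables, the "all else being equal" case; the setup is illustrative, not taken from the question) of how $r_{13}$ shrinks as $r_{12}$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Equal unit variances for x1 and x2; only the correlation r12 varies.
for r12 in [0.0, 0.3, 0.6, 0.9, 0.99]:
    cov = [[1.0, r12], [r12, 1.0]]
    x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    x3 = x1 - x2
    r13 = np.corrcoef(x1, x3)[0, 1]
    # With equal variances the theoretical value is sqrt((1 - r12) / 2).
    print(f"r12={r12:.2f}  r13~{r13:.3f}  theory={np.sqrt((1 - r12) / 2):.3f}")
```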

Gkhan Cebs
2

You can rewrite your equation $x_3=x_1-x_2$ as $x_2=x_1-x_3$. Then no matter what you pick as $x_1$ and $x_3$, $x_2$ will be correlated with both $x_1$ and $x_3$, but there is no reason to expect $x_1$ and $x_3$ to be correlated with each other. For instance, take $x_1$ = number of letters in the title of the Best Picture Oscar winner, $x_3$ = number of named hurricanes, and $x_2$ = number of letters in the title of the Best Picture Oscar winner minus number of named hurricanes. Then $x_3=x_1-x_2$ holds, but that doesn't mean $x_3$ will be correlated with $x_1$.

Acccumulation
2

Let $Var(X_1) = \sigma_1^2$, $Var(X_2) = \sigma_2^2$, and $Cov(X_1,X_2)=\sigma_{12} = \rho\sigma_1\sigma_2$. Then, for $X_3 = X_1 - X_2$,

$Var(X_3)=\sigma_1^2+\sigma_2^2 - 2\sigma_{12}$

$Cov(X_1,X_3)=\sigma_1^2-\sigma_{12}$

$Cov(X_2,X_3) =\sigma_{12}-\sigma_2^2$

$Corr(X_1,X_3) =\frac{\sigma_1^2-\sigma_{12}}{\sqrt{\sigma_1^2(\sigma_1^2+\sigma_2^2 - 2\sigma_{12})}}$

$Corr(X_2,X_3) =\frac{-\sigma_2^2+\sigma_{12}}{\sqrt{\sigma_2^2(\sigma_1^2+\sigma_2^2 - 2\sigma_{12})}}$

So whether $|Corr(X_1,X_3)|$ is less than, equal to, or greater than $|Corr(X_2,X_3)|$ depends on $\sigma_1^2$ and $\sigma_2^2$ (for a given $\rho$).

This relation cannot be determined from the correlation coefficient alone.
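As a numeric illustration, plugging hypothetical values roughly matching the question ($\rho = 0.6$, $\sigma_1 = 1$, $\sigma_2 = 1.4$; these numbers are assumed, not taken from the asker's data) into the formulas above reproduces the asker's pattern of a weak $Corr(X_1,X_3)$ and a strong negative $Corr(X_2,X_3)$:

```python
import numpy as np

# Hypothetical values: r12 ~ 0.6, with x2 somewhat more variable than x1.
s1, s2, rho = 1.0, 1.4, 0.6
s12 = rho * s1 * s2                      # Cov(X1, X2)

var3 = s1**2 + s2**2 - 2 * s12           # Var(X3) = Var(X1 - X2)
corr13 = (s1**2 - s12) / np.sqrt(s1**2 * var3)
corr23 = (s12 - s2**2) / np.sqrt(s2**2 * var3)

print(round(corr13, 2))  # ~  0.14
print(round(corr23, 2))  # ~ -0.71
```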

user158565