3

I have two variables x1 and x2 which measure relatively similar things (r ~ 0.6), with x2 slightly larger than x1 on average. I then created a new variable x3 by subtracting the two: x3 = x1 - x2.

However, when I ran the Pearson correlations, x3 is strongly negatively correlated with x2 as expected (r ~ -0.6), but x3 is not very correlated with x1 (r ~ 0.1). How is this possible?

Nick Cox
Hank Lin
  • A scatter plot matrix should help. – Nick Cox Dec 11 '18 at 20:37
  • Possible duplicate of [When A and B are positively related variables, can they have opposite effect on their outcome variable C?](https://stats.stackexchange.com/questions/229052/when-a-and-b-are-positively-related-variables-can-they-have-opposite-effect-on) – sds Dec 11 '18 at 21:07
  • I have a vague memory of an even closer duplicate, but I cannot find it. – Sextus Empiricus Dec 13 '18 at 14:07

4 Answers

12

Here's a simple example. Suppose $ε_1$ and $ε_2$ are independent standard normal random variables. Define $X_1 = ε_1$, $X_2 = X_1 + ε_2$, and $X_3 = X_1 - X_2$. The correlation of $X_1$ with $X_2$ is then $\tfrac{1}{\sqrt{2}} \approx .71$. Likewise, the correlation of $X_2$ with $X_3$ is $-\tfrac{1}{\sqrt{2}}$. But the correlation of $X_1$ with $X_3$ is the correlation of $ε_1$ with $ε_1 - (ε_1 + ε_2) = -ε_2$, which is 0 since the $ε_i$s are independent.
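A quick simulation (a minimal Python/NumPy sketch, not part of the original answer) reproduces these three correlations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent standard normal noise terms
eps1 = rng.standard_normal(n)
eps2 = rng.standard_normal(n)

# The construction from the answer
x1 = eps1
x2 = x1 + eps2
x3 = x1 - x2  # equals -eps2

print(np.corrcoef(x1, x2)[0, 1])  # ~  0.71
print(np.corrcoef(x2, x3)[0, 1])  # ~ -0.71
print(np.corrcoef(x1, x3)[0, 1])  # ~  0
```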

Kodiologist
2

This is by construction of $x_3$. Because $x_2$ and $x_1$ are closely related in terms of their Pearson correlation, subtracting one from the other strips out much of what they share and so reduces the correlation. The easiest way to see this is the extreme scenario of perfect correlation, i.e., $x_2=x_1$: then $x_3=x_1-x_2=0$, which is fully deterministic and carries no correlation with $x_1$ at all.

You can make the argument more formal using the definition of the Pearson correlation by looking at the covariance between $x_3$ and $x_1$. You will see that this covariance is reduced; by how much depends on the correlation between $x_1$ and $x_2$, i.e., $r_{12}$, and on their standard deviations. All else being equal, the larger $r_{12}$, the smaller $r_{13}$, as the sketch below illustrates.
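A minimal sketch (assuming equal unit variances for both variables, the "all else being equal" case; the setup is illustrative, not taken from the question) of how $r_{13}$ shrinks as $r_{12}$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Equal unit variances for x1 and x2; only the correlation r12 varies.
for r12 in [0.0, 0.3, 0.6, 0.9, 0.99]:
    cov = [[1.0, r12], [r12, 1.0]]
    x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    x3 = x1 - x2
    r13 = np.corrcoef(x1, x3)[0, 1]
    # With equal variances the theoretical value is sqrt((1 - r12) / 2).
    print(f"r12={r12:.2f}  r13~{r13:.3f}  theory={np.sqrt((1 - r12) / 2):.3f}")
```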

Gkhan Cebs
2

You can rewrite your equation $x_3=x_1-x_2$ as $x_2=x_1-x_3$. Then no matter what you pick as $x_1$ and $x_3$, $x_2$ will be correlated with both $x_1$ and $x_3$, but there is no reason to expect $x_1$ and $x_3$ to be correlated with each other. For instance, take $x_1$ = number of letters in the title of the Best Picture Oscar winner, $x_3$ = number of named hurricanes, and $x_2$ = number of letters in the title of the Best Picture Oscar winner minus number of named hurricanes. Then $x_3=x_1-x_2$ holds, but that doesn't mean $x_3$ will be correlated with $x_1$.

Acccumulation
2

Let $Var(X_1) = \sigma_1^2$, $Var(X_2) = \sigma_2^2$, and $Cov(X_1,X_2)=\sigma_{12} = \rho\sigma_1\sigma_2$. Then, for $X_3 = X_1 - X_2$,

$Var(X_3)=\sigma_1^2+\sigma_2^2 - 2\sigma_{12}$

$Cov(X_1,X_3)=\sigma_1^2-\sigma_{12}$

$Cov(X_2,X_3) =\sigma_{12}-\sigma_2^2$

$Corr(X_1,X_3) =\frac{\sigma_1^2-\sigma_{12}}{\sqrt{\sigma_1^2(\sigma_1^2+\sigma_2^2 - 2\sigma_{12})}}$

$Corr(X_2,X_3) =\frac{-\sigma_2^2+\sigma_{12}}{\sqrt{\sigma_2^2(\sigma_1^2+\sigma_2^2 - 2\sigma_{12})}}$

So whether $|Corr(X_1,X_3)|$ is less than, equal to, or greater than $|Corr(X_2,X_3)|$ depends on $\sigma_1^2$ and $\sigma_2^2$ (for a given $\rho$).

This relation cannot be determined from the correlation coefficient alone.
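As a numeric illustration, plugging hypothetical values roughly matching the question ($\rho = 0.6$, $\sigma_1 = 1$, $\sigma_2 = 1.4$; these numbers are assumed, not taken from the asker's data) into the formulas above reproduces the asker's pattern of a weak $Corr(X_1,X_3)$ and a strong negative $Corr(X_2,X_3)$:

```python
import numpy as np

# Hypothetical values: r12 ~ 0.6, with x2 somewhat more variable than x1.
s1, s2, rho = 1.0, 1.4, 0.6
s12 = rho * s1 * s2                      # Cov(X1, X2)

var3 = s1**2 + s2**2 - 2 * s12           # Var(X3) = Var(X1 - X2)
corr13 = (s1**2 - s12) / np.sqrt(s1**2 * var3)
corr23 = (s12 - s2**2) / np.sqrt(s2**2 * var3)

print(round(corr13, 2))  # ~  0.14
print(round(corr23, 2))  # ~ -0.71
```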

user158565