
Let $X$, $Y$, and $Z$ be three random variables. Intuitively, I think that it is impossible to have $Cor(X, Y)=0.99$, $Cor(Y, Z)=0.99$ but $Cor(X, Z)=0$. My intuition is that $X$ and $Z$ are each nearly linearly related to $Y$, so they should be more or less linearly related to each other, which would make the last equality impossible.

I pose this question because of the question and the comments (including my own comments) here.

In general, as others have pointed out, I agree that for some $\rho>0$ it is possible to have $$Cor(X, Y)=\rho, \quad Cor(Y, Z)=\rho \quad \mbox{ and } \quad Cor(X, Z)=0. \qquad (1)$$

My questions are:

  1. Do you think that (1) is wrong when $\rho$ is close to 1, e.g., 0.99?
  2. If (1) is wrong when $\rho$ is close to 1, what is the maximum value of $\rho$ so that (1) can be correct?
TrungDung
  • Couldn't this be achieved in the following scenario: Let $X$ be a uniform random binary variable, let $Z$ be a uniform random binary variable. Thus $Cor(X,Z) = 0$. Then let $Y = 2X + Z$. Then $Cor(X,Y) = Cor(Y,Z) = 1$ and if you added in some noise you could get that from 1 to 0.99 I think. This at least makes sense to me intuitively. – ryan Nov 25 '20 at 22:07
  • See the similar Q: https://stats.stackexchange.com/questions/131065/non-transitivity-of-correlation-correlations-between-gender-and-brain-size-and/131069#131069 – kjetil b halvorsen Nov 26 '20 at 01:13
  • @ryan - you may be confusing correlation and covariance. Your example does not lead to $Cor(X,Y) = Cor(Y,Z) = 1$ – Henry Nov 30 '20 at 10:11
  • @Henry, yes I was mistaken. My friend pointed out that the correlation would not be 1 since you cannot determine $Y$ from $X$ alone. Same thing for $Z$. – ryan Dec 01 '20 at 02:29

4 Answers


The correlation matrix must be positive semi-definite, i.e., all its eigenvalues must be non-negative. The eigenvalues of the correlation matrix are the solutions of $$ \left| \begin{matrix} 1-\lambda & \rho & \rho \\ \rho & 1-\lambda & 0 \\ \rho & 0 & 1-\lambda \end{matrix} \right| =(1-\lambda)\big((1-\lambda)^2-2\rho^2\big) =0, $$ so the eigenvalues are $1$ and $1\pm\sqrt{2}\rho$. These are all non-negative for $$ -\frac1{\sqrt{2}} \le \rho \le \frac1{\sqrt{2}}. $$
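This is easy to check numerically. A minimal sketch (the helper `corr_eigenvalues` is just for illustration): the proposed matrix with $\rho = 0.99$ has a negative eigenvalue, while the boundary case $\rho = 1/\sqrt{2}$ has a zero eigenvalue.

```python
import numpy as np

def corr_eigenvalues(rho):
    """Eigenvalues of the 3x3 correlation matrix with
    Cor(X,Y) = Cor(Y,Z) = rho and Cor(X,Z) = 0 (Y listed first)."""
    R = np.array([[1.0, rho, rho],
                  [rho, 1.0, 0.0],
                  [rho, 0.0, 1.0]])
    return np.sort(np.linalg.eigvalsh(R))

print(corr_eigenvalues(0.99))            # smallest eigenvalue is 1 - sqrt(2)*0.99 < 0
print(corr_eigenvalues(1 / np.sqrt(2)))  # smallest eigenvalue is ~0: the boundary case
```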

Jarle Tufto

A more intuitive perspective (an example) to complement @Jarle Tufto's +1 answer:

What you are asking is whether a variance-covariance matrix like this one is possible:

$\bf{\Sigma} = \begin{matrix} & X & Y & Z \\ X & 1 & 0.9 & 0 \\ Y & 0.9 & 1 & 0.9 \\ Z & 0 & 0.9 & 1\\ \end{matrix}$

This matrix is not positive semidefinite; in fact, it is indefinite, since its determinant is negative. For example, a multivariate normal vector with this var-cov matrix cannot exist: its PDF involves the factor $\det(\Sigma)^{-1/2}$, which is not a real number when the determinant is negative. For this not to happen, the condition mentioned by @Jarle Tufto needs to be fulfilled.

$f_{\text{Gauss}}(x) = (2\pi)^{-k/2}\det(\Sigma)^{-1/2}\,e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}$
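A quick numerical sketch of the claim above, evaluating the determinant of that matrix:

```python
import numpy as np

# The proposed variance-covariance matrix from the answer above.
Sigma = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.9],
                  [0.0, 0.9, 1.0]])

det = np.linalg.det(Sigma)
print(det)  # -0.62: negative, so det(Sigma)^(-1/2) is not a real number
```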

PaulG

Linear regression of $Y$ on any set of predictors yields an $R^2$ value of at most 1.

In your problem setting:

  • Performing linear regression on $Y$ with $X$ gets $R = 0.99$.
  • Performing linear regression on $Y$ with $Z$ gets $R = 0.99$.
  • $X$ and $Z$ are not correlated, so they would both independently contribute to the $R^2$ value of a regression on $Y$.

Combining these, when you perform linear regression on $Y$ with both $X$ and $Z$, you get $R^2 = (0.99)^2 + (0.99)^2 > 1$, which is impossible. This should also provide some idea of the bounds on these correlation values.
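The additivity of $R^2$ contributions for uncorrelated regressors can be checked by simulation. A sketch (the construction $Y = X + Z$ is chosen so that each correlation is exactly $1/\sqrt{2}$, the boundary value from the accepted answer):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Independent (hence uncorrelated) regressors.
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = x + z  # Cor(X,Y) = Cor(Z,Y) = 1/sqrt(2), about 0.707

r_xy = np.corrcoef(x, y)[0, 1]
r_zy = np.corrcoef(z, y)[0, 1]
print(r_xy**2 + r_zy**2)  # close to 1: the squared correlations exhaust the R^2 budget
```

Pushing either correlation above $1/\sqrt{2} \approx 0.707$ while keeping $X$ and $Z$ uncorrelated would push the sum past 1, which is impossible.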

Springo

I posted this previously on Math StackExchange, but will reiterate here. If $\rho_{AB} = \text{Corr}(A, B)$, and similarly for $\rho_{BC}$ and $\rho_{AC}$, we have the inequality \begin{align*} \rho_{AC} \ge \max\{2(\rho_{AB} + \rho_{BC}) - 3, \; 2\rho_{AB}\rho_{BC} - 1\}. \end{align*} Proof. First, some notation: let $\sigma_{AB} = \text{Cov}(A,B)$ and $\sigma_A^2 = \text{Var}(A)$.

Let's first prove $\rho_{AC} \ge 2(\rho_{AB} + \rho_{BC}) - 3$. Recall the identity \begin{align*} 2 E[X^2] + 2E[Y^2] = E[(X+Y)^2] + E[(X-Y)^2] \end{align*} hence $2E[Y^2] \le E[(X+Y)^2] + E[(X-Y)^2]$. Set \begin{align*} X = \widetilde{B} - (\widetilde{A} + \widetilde{C})/2 \quad \text{and} \quad Y = (\widetilde{A} - \widetilde{C})/2 \end{align*} where $\widetilde{C} = (C - E[C])/\sigma_C$, the normalized random variable, and similarly for $\widetilde{A}, \widetilde{B}$. Upon substitution and simplification, we get \begin{align*} \frac{1}{2}(2 - 2\rho_{AC}) \le (2 - 2\rho_{AB}) + (2 - 2\rho_{BC}) \iff \rho_{AC} \ge 2(\rho_{AB} + \rho_{BC}) - 3 \end{align*} To prove $\rho_{AC} \ge 2\rho_{AB}\rho_{BC} - 1$, consider the random variable \begin{align*} W = 2 \frac{\sigma_{AB}}{\sigma_B^2}B - A \end{align*} We can verify $\sigma_W^2 = \sigma_A^2$, and hence $\sigma_{WC} \le \sigma_{W}\sigma_{C} = \sigma_ A \sigma_C$ by the Cauchy-Schwarz inequality. On the other hand, you may compute \begin{align*} \sigma_{WC} = 2 \frac{\sigma_{AB}}{\sigma_B^2}\sigma_{BC} - \sigma_{AC} \end{align*} Reorganizing all this, we prove $\rho_{AC} \ge 2\rho_{AB}\rho_{BC} - 1$.

In your specific example, with $\rho_{AB} = \rho_{BC} = 0.99$, no matter how $A, B, C$ are constructed, we must have $\rho_{AC} \ge 2(0.99)^2 - 1 = 0.9602$.
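A numerical sanity check of this bound (a sketch, not part of the proof): the correlation matrix with $\rho_{AC}$ sitting exactly at $2\rho^2 - 1$ is singular but positive semi-definite, so the bound is attained, while pushing $\rho_{AC}$ just below it breaks positive semi-definiteness.

```python
import numpy as np

rho = 0.99
bound = 2 * rho * rho - 1  # = 0.9602

# Correlation matrix of (A, B, C) with rho_AC exactly at the bound.
R = np.array([[1.0, rho, bound],
              [rho, 1.0, rho],
              [bound, rho, 1.0]])
print(np.sort(np.linalg.eigvalsh(R)))  # smallest eigenvalue ~ 0: PSD, bound attained

# rho_AC slightly below the bound is no longer a valid correlation matrix.
R[0, 2] = R[2, 0] = bound - 0.01
print(np.linalg.eigvalsh(R).min())  # negative
```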

Tom Chen