16

Given three vectors $a$, $b$, and $c$, is it possible that the correlations between $a$ and $b$, $a$ and $c$, and $b$ and $c$ are all negative? That is, can all of the following hold at once?

\begin{align} \text{corr}(a,b) < 0\\ \text{corr}(a,c) < 0 \\ \text{corr}(b,c) < 0\\ \end{align}

amoeba
Antti A
  • 3
    Negative correlations mean, geometrically, that the centered vectors mutually make obtuse angles. You should have no problem drawing a configuration of three vectors in the plane that have this property. – whuber Jan 24 '18 at 15:52
  • They cannot be completely negatively correlated ($\rho=-1$), but in general there can be some negative correlation, again bounds set by the other correlations. – karakfa Jan 24 '18 at 20:27
  • @karakfa An interesting question will be, what is the lowest possible correlation that all three pairs can simultaneously have? You might want to add this to your answer below. – amoeba Jan 24 '18 at 20:57
  • 2
    @whuber Your comment seems to contradict Heikki Pulkkinen's answer, which claims it's impossible for vectors in a plane. If you stand by it, you should turn your comment into an answer. – R.M. Jan 24 '18 at 20:58
  • @AnttiA It seems like many people answering seem to be thinking you're specifically interested in 3-vectors (that is, vectors in 3D space). If that's not the case, and you're interested in vectors of arbitrary dimensionality, you might want to edit the post/title to clarify. – R.M. Jan 24 '18 at 21:04
  • @amoeba, added the solution for your interesting follow-up question. – karakfa Jan 24 '18 at 21:09
  • @R.M: take a factor with $m$ levels of the same size. Their dummy variables will all have negative pairwise correlation that gets weaker for growing $m$. – Michael M Jan 24 '18 at 21:15
  • 2
    @R.M. There is no contradiction between whuber and Heikki. This question asks about data matrix $X$ of $n\times 3$ size. Normally we would talk about $n$ data points in 3 dimensions, but this Q is talking about three "vectors" in $n$ dimensions. Heikki says that all negative correlations cannot happen if $n=2$ (indeed, two points after centering are always perfectly correlated, so correlations must be $\pm 1$ and cannot be all $-1$). Whuber says that 3 vectors in $n$ dimensions can effectively lie in a 2-dimensional subspace (i.e. $X$ is rank 2) and suggests to imagine a Mercedes logo. – amoeba Jan 24 '18 at 21:18
  • 1
    Related: [Bound for the correlation of three random variables](https://stats.stackexchange.com/q/72790/). (cc, @amoeba) – gung - Reinstate Monica Jan 24 '18 at 21:19

4 Answers

19

It is possible if the vectors have length 3 or more. For example:

\begin{align} a &= (-1, 1, 1)\\ b &= (1, -9, -3)\\ c &= (2, 3, -1)\\ \end{align}

The correlations are \begin{align} \text{cor}(a,b) &\approx -0.803\\ \text{cor}(a,c) &\approx -0.277\\ \text{cor}(b,c) &\approx -0.350 \end{align}
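The three values above can be checked directly in base R. This is only a verification sketch; the name `c_vec` is an assumption introduced here so that the vector does not share a name with `base::c`.

```r
# Example vectors from the answer above
a <- c(-1, 1, 1)
b <- c(1, -9, -3)
c_vec <- c(2, 3, -1)  # hypothetical name, avoids masking base::c

# Pairwise Pearson correlation matrix of the three vectors
cors <- cor(cbind(a, b, c = c_vec))
print(round(cors, 3))

# TRUE if every off-diagonal correlation is negative
print(all(cors[upper.tri(cors)] < 0))
```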

We can prove that for vectors of size 2 this is not possible. The sign of the correlation is the sign of $n\sum_i a_i b_i - \big(\sum_i a_i\big)\big(\sum_i b_i\big)$, here with $n=2$: \begin{align} \text{cor}(a,b) &< 0\\[5pt] 2\Big(\sum_i a_i b_i\Big) - \Big(\sum_i a_i\Big)\Big(\sum_i b_i\Big) &< 0\\[5pt] 2(a_1 b_1 + a_2 b_2) - (a_1 + a_2)(b_1 + b_2) &< 0\\[5pt] 2(a_1 b_1 + a_2 b_2) - (a_1 b_1 + a_1 b_2 + a_2 b_1 + a_2 b_2) &< 0\\[5pt] a_1 b_1 + a_2 b_2 - a_1 b_2 - a_2 b_1 &< 0\\[5pt] a_1 (b_1-b_2) + a_2 (b_2-b_1) &< 0\\[5pt] (a_1-a_2)(b_1-b_2) &< 0 \end{align}

The formula makes sense: if $a_1$ is larger than $a_2$, $b_2$ has to be larger than $b_1$ to make the correlation negative.

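Similarly, for the correlations between $(a,c)$ and $(b,c)$ we get

\begin{align} (a_1-a_2)(c_1-c_2) &< 0\\ (b_1-b_2)(c_1-c_2) &< 0 \end{align}

Clearly, these three inequalities cannot all hold at the same time: the product of the three left-hand sides is $(a_1-a_2)^2(b_1-b_2)^2(c_1-c_2)^2 \ge 0$, whereas the product of three negative numbers is negative.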
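As a numerical sanity check of the size-2 argument, the sketch below draws many random triples of length-2 vectors and counts how often all three correlations come out negative. The uniform distribution, the seed, and the number of trials are arbitrary choices made here, not part of the answer.

```r
set.seed(1)

# For vectors of length 2 every sample correlation is +1 or -1,
# so we only need to look at the signs.
all_negative <- replicate(10000, {
  a <- runif(2)
  b <- runif(2)
  c_vec <- runif(2)  # hypothetical name, avoids masking base::c
  cor(a, b) < 0 && cor(a, c_vec) < 0 && cor(b, c_vec) < 0
})

# Number of trials in which all three correlations were negative
n_all_negative <- sum(all_negative)
print(n_all_negative)
```

By the argument above the count should be zero however many trials are run.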

Heikki Pulkkinen
  • 4
    Another example of something unexpected that only happens in dimension three or higher. – nth Jan 24 '18 at 19:21
  • 2
    With vectors of size $2$, correlations are usually $\pm1$ (straight line through two points), and you cannot have three correlations of $-1$ with three vectors of any size – Henry Jan 25 '18 at 15:14
9

Yes, they can.

Suppose you have a multivariate normal distribution $X\in \mathbb{R}^3$, $X\sim N(0,\Sigma)$. The only restriction on $\Sigma$ is that it has to be symmetric and positive semi-definite.

So take the following example: $\Sigma = \begin{pmatrix} 1 & -0.2 & -0.2 \\ -0.2 & 1 & -0.2 \\ -0.2 & -0.2 & 1 \end{pmatrix} $

Its eigenvalues are all positive (1.2, 1.2, 0.6), so $\Sigma$ is a valid covariance matrix. Drawing $n$ samples from this distribution gives three vectors of length $n$ whose pairwise correlations are, for large $n$ and with high probability, all negative.
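A minimal sketch of this construction in base R follows. The sampling step uses the Cholesky factor of $\Sigma$ rather than a dedicated multivariate-normal routine; that choice, the seed, and the sample size are assumptions made here for illustration.

```r
set.seed(42)

# Covariance matrix from the answer: 1 on the diagonal, -0.2 elsewhere
Sigma <- matrix(-0.2, nrow = 3, ncol = 3)
diag(Sigma) <- 1

# All eigenvalues positive means Sigma is positive definite
ev <- eigen(Sigma, symmetric = TRUE)$values
print(ev)

# Sample from N(0, Sigma): chol() returns upper-triangular R with
# t(R) %*% R = Sigma, so rows of Z %*% R have covariance Sigma
n <- 10000
Z <- matrix(rnorm(n * 3), nrow = n)
X <- Z %*% chol(Sigma)

# The three columns of X are three vectors of length n
sample_cor <- cor(X)
print(round(sample_cor, 2))
```

With this many samples the off-diagonal entries of `sample_cor` should sit close to -0.2.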

Kozolovska
7

Let's start with a correlation matrix for 3 variables:

$\Sigma = \begin{pmatrix} 1 & p & q \\ p & 1 & r \\ q & r & 1 \end{pmatrix} $

Non-negative definiteness constrains the pairwise correlations $p,q,r$; the constraint can be written as

$$ pqr \ge \frac{p^2+q^2+r^2-1}2 $$

For example, if $p=q=-1$, the value of $r$ is restricted by $2r \ge r^2+1$, which forces $r=1$. On the other hand, if $p=q=-\frac12$, the constraint becomes $2r^2 - r - 1 \le 0$, so $r$ can be anywhere in the range $\frac{1 \pm 3}4$, i.e. $-\frac12 \le r \le 1$.
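Treating the constraint as a quadratic in $r$, namely $r^2 - 2pqr + p^2 + q^2 - 1 \le 0$, gives the admissible interval $pq \pm \sqrt{(1-p^2)(1-q^2)}$. The sketch below computes it; the function names `r_range` and `is_valid` are hypothetical helpers introduced here, and both assume $|p|, |q|, |r| \le 1$.

```r
# Admissible interval for r given p and q, from
# r^2 - 2*p*q*r + (p^2 + q^2 - 1) <= 0
r_range <- function(p, q) {
  half_width <- sqrt((1 - p^2) * (1 - q^2))
  c(lower = p * q - half_width, upper = p * q + half_width)
}

# Direct check of the inequality p*q*r >= (p^2 + q^2 + r^2 - 1) / 2,
# with a small tolerance for floating-point error
is_valid <- function(p, q, r, tol = 1e-12) {
  p * q * r >= (p^2 + q^2 + r^2 - 1) / 2 - tol
}

print(r_range(-1, -1))      # interval collapses to a single point
print(r_range(-0.5, -0.5))
print(is_valid(-0.2, -0.2, -0.2))
```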

Answering the interesting follow up question by @amoeba: "what is the lowest possible correlation that all three pairs can simultaneously have?"

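Let $p=q=r=x < 0$. The constraint becomes $2x^3-3x^2+1 \ge 0$, and since $2x^3-3x^2+1 = (x-1)^2(2x+1)$, its smallest root is $-\frac12$. So the lowest correlation that all three pairs can share is $-\frac12$. Perhaps not surprising for some.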
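The root can be confirmed numerically, and the bound is attained by three centered vectors at 120 degrees to one another (the "Mercedes logo" configuration mentioned in the comments). The specific vectors below are an illustrative choice made here, not taken from the answer.

```r
# Roots of 2x^3 - 3x^2 + 1; polyroot() takes coefficients in increasing order
roots <- polyroot(c(1, 0, -3, 2))
min_root <- min(Re(roots))
print(min_root)

# Three vectors that each sum to zero, have equal length, and sum to the
# zero vector, so they lie in a plane at 120 degrees to each other
a <- c(2, -1, -1)
b <- c(-1, 2, -1)
c_vec <- c(-1, -1, 2)  # hypothetical name, avoids masking base::c

cors <- cor(cbind(a, b, c = c_vec))
print(cors)
```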

A stronger argument can be made if one of the correlations equals $-1$, say $r=-1$. The same inequality then gives $-2pq \ge p^2+q^2$, i.e. $(p+q)^2 \le 0$, from which we can deduce that $p=-q$. Therefore, if two correlations are $-1$, the third one must be $1$.

karakfa
  • 2
    See https://stats.stackexchange.com/questions/72790/bound-for-the-correlation-of-three-random-variables, *inter alia.* – whuber Jan 24 '18 at 23:13
2

A simple R function to explore this:

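# Estimate the probability that all three pairwise correlations are negative
# for three independent uniform vectors of length n (requires n >= 2).
f <- function(n, trials = 10000) {
  count <- 0
  for (i in seq_len(trials)) {
    a <- runif(n)
    b <- runif(n)
    c <- runif(n)
    # && short-circuits on scalars, which is what an if() condition expects
    if (cor(a, b) < 0 && cor(a, c) < 0 && cor(b, c) < 0) {
      count <- count + 1
    }
  }
  count / trials  # fraction of trials with all three correlations negative
}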

As a function of n, f(n) starts at 0 (for n = 2), becomes nonzero at n = 3 (with typical values around 0.06), then increases to around 0.11 by n = 15, after which it seems to stabilize.

So, not only is it possible to have all three correlations negative, it doesn't seem to be terribly uncommon (at least for uniform distributions).
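To reproduce the trend described above, the following self-contained sketch runs the same experiment for a few values of n. `f_vec` is a hypothetical variant of `f` written with `replicate()`; the seed, the chosen values of n, and the reduced number of trials are assumptions made to keep the run short.

```r
# Same experiment as f(), with the three vectors stored as matrix columns
f_vec <- function(n, trials = 10000) {
  mean(replicate(trials, {
    x <- matrix(runif(3 * n), nrow = n)  # columns play the roles of a, b, c
    r <- cor(x)
    all(r[upper.tri(r)] < 0)             # TRUE if all three correlations < 0
  }))
}

set.seed(123)
ns <- c(2, 3, 5, 10, 15)
est <- sapply(ns, f_vec, trials = 2000)
print(setNames(round(est, 3), ns))
```

The estimates are Monte Carlo averages, so they will vary somewhat with the seed and the number of trials.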

John Coleman