
Is there a general formula for the boundaries of a correlation coefficient given a set of other correlation coefficients? I have seen the formula for three random variables where two correlations are known. For example, given correlations R_ac and R_bc the boundaries of R_ab are given by:

R_ac*R_bc - sqrt[(1-R^2_ac)(1-R^2_bc)] <= R_ab <= R_ac*R_bc + sqrt[(1-R^2_ac)(1-R^2_bc)].
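As a quick sanity check, the three-variable bound is easy to compute numerically (a minimal sketch assuming NumPy; `r_ab_bounds` is just an illustrative name):

```python
import numpy as np

def r_ab_bounds(r_ac, r_bc):
    """Feasible range of R_ab given R_ac and R_bc in the 3x3 case."""
    slack = np.sqrt((1 - r_ac**2) * (1 - r_bc**2))
    return r_ac * r_bc - slack, r_ac * r_bc + slack

lo, hi = r_ab_bounds(0.8, 0.6)  # approximately (0.0, 0.96)
# any R_ab inside [lo, hi] makes the 3x3 correlation matrix
# positive semidefinite
```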

Is there a more general formula? Suppose I have a 4x4 correlation matrix:

---                ---
|  1                 |
|  R_ab 1            |
|  R_ac R_bc 1       |
|  R_ad R_bd R_cd 1  |
---                ---

Is there a boundary formula for R_ab if I know R_ac, R_ad, R_bc, R_bd, and R_cd?

For example, and this is a complete and total guess on my part, something like...

R_ab <= R_ac*R_cd*R_bd +/- [(1-R^2_ac)(1-R^2_cd)(1-R^2_bd)]^(1/3).

Cristian

  • Answers to the duplicate (although it asks only about three variables) address the general question of $n$ variables. – whuber Sep 28 '17 at 17:32
  • Can you direct me to the duplicate so I can read the answer? Thanks. – Cristian Sep 28 '17 at 17:38
  • Please follow the link at the top of the page following the text "This question already has an answer here:". In looking it over again I see that it does not ask precisely your question (although the answers do reveal the general technique). For the record, then, the thread I was referencing is at https://stats.stackexchange.com/questions/72790/bound-for-the-correlation-of-three-random-variables and I will reopen your question. – whuber Sep 28 '17 at 17:46
  • The answer is not immediately jumping out at me. Could you provide a little more detail. For example, if I can see how the proposed solution is used to derive the solution for when there are two given correlations and one is trying to find the boundaries of a third correlation it may enable me to see how to generalize the solution for 4x4 correlation matrix, 5x5 correlation matrix, and so forth. – Cristian Sep 28 '17 at 19:36
  • In the case of $n=3$ the formula is relatively simple: writing $a$ and $b$ for the other two correlations and $\rho$ for the third, you must have $ab-\sqrt{1-a^2-b^2+a^2b^2}\le\rho\le ab+\sqrt{1-a^2-b^2+a^2b^2}.$ There are dozens of conditions when $n=4$ with known correlations $a,b,c,d,e$. E.g., one is $\rho$ lies between the bounds $\frac{a b e+a c d-b c-d e}{a^2-1}\pm\sqrt{\frac{a^4-2 a^3 b d-2 a^3 c e+a^2 b^2+4 a^2 b c d e+a^2 c^2+a^2 d^2+a^2 e^2-2 a^2-2 a b^2 c e-2 a b c^2 d-2 a b d e^2+2 a b d-2 a c d^2 e+2 a c e+b^2 c^2+b^2 e^2-b^2+c^2 d^2-c^2+d^2 e^2-d^2-e^2+1}{\left(a^2-1\right)^2}}.$ – whuber Sep 28 '17 at 19:50
  • Thank you for the extra detail. I was able to follow what is going on. Unfortunately, the solution does not appear to reduce to a general formula that can be programmed. I was hoping to use the method as a way of explaining why a correlation matrix is singular, that is, to identify coefficients that fall outside the range of acceptable boundaries. However, correlation matrices can be very large. I have one with 179 variables that is singular, so this approach will not work. Fortunately, I can correct it by setting negative eigenvalues to zero and then reconstructing the correlation matrix. – Cristian Sep 29 '17 at 14:22
  • Analyzing a singular correlation matrix is almost a completely different problem from constraining the entries! Singularity arises from linear dependencies among the columns. That is precisely what an eigenanalysis (or SVD) can reveal: see https://stats.stackexchange.com/questions/16327. Incidentally, a correlation matrix cannot have negative eigenvalues. When software reports negative values, they have arisen through computational rounding errors, which is a whole other set of issues. – whuber Sep 29 '17 at 14:54

1 Answer


A minimal set of constraints is given by Sylvester's Criterion: the matrix is positive definite if and only if every square submatrix anchored at the upper left corner (every "leading principal minor") has positive determinant. (In the borderline positive semidefinite case, non-negativity of the leading principal minors alone is not quite sufficient; all principal minors must be non-negative.)

Note that this criterion can be applied in various ways, because you are free to apply any permutation (simultaneously) to the rows and columns. Since such a permutation yields the correlation matrix for the same variables, merely re-ordered, it is a valid correlation matrix if and only if the original matrix was valid.
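Such a check is easy to script (a sketch, assuming NumPy; the function name is mine). For large matrices, computing eigenvalues with `np.linalg.eigvalsh` is the more robust way to test validity:

```python
import numpy as np

def leading_minors_nonneg(R, tol=1e-12):
    """True if every leading principal minor of R is non-negative."""
    n = len(R)
    return all(np.linalg.det(R[:k, :k]) >= -tol for k in range(1, n + 1))

R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.0]])    # a valid correlation matrix
bad = np.array([[1.0, 0.9, -0.9],
                [0.9, 1.0, 0.9],
                [-0.9, 0.9, 1.0]])  # an impossible set of correlations
# leading_minors_nonneg(R) is True; leading_minors_nonneg(bad) is False
```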

As an example, consider any $n\times n$ correlation matrix $(\rho_{ij})$ with $n(n-1)/2-1$ given correlations $\rho_{ij}=\rho_{ji}$, $\rho_{ii}=1$, and one unknown correlation. Permute the rows and columns to place that unknown value $\rho$ in the $(n,n-1)$ and $(n-1,n)$ locations, as shown here in red:

$$\pmatrix{1 & \rho_{12} & \cdots & \cdots& \rho_{1n} \\ \rho_{21} & 1 & \cdots & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots & \vdots\\ \rho_{n-1,1} & \rho_{n-1,2} & \cdots & 1 & \color{red}{\rho} \\ \rho_{n1} & \rho_{n2} & \cdots & \color{red}{\rho} & 1 }$$

All the proper leading minors involve only known coefficients and therefore, presumably, already have non-negative determinants. It remains to evaluate the determinant of the entire matrix. This is a quadratic polynomial in $\rho$. (This becomes obvious when you recall that the determinant is a sum and difference of products of $n$ entries of the matrix at a time, where no two entries occupy the same row or column. Thus $\rho^2$ can appear, as can $\rho$ times other numbers, but no higher power of $\rho$ can be created.)

Therefore the non-negativity of this determinant (along with the obvious constraint $|\rho|\le 1$) determines either an interval of possible values of $\rho$ or two intervals of the form $[-1,\rho_{-}]$, $[\rho_{+},1]$. The endpoints are easy to compute using the Quadratic Formula.
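That endpoint computation can be scripted: sample the determinant at three values of $\rho$, recover the quadratic's coefficients, and take its roots (a numerical sketch assuming NumPy; `rho_interval` is an invented name):

```python
import numpy as np

def rho_interval(R, i, j):
    """Roots of det(R) = 0, viewing the unknown entry (i, j) as rho."""
    def det_at(rho):
        M = R.copy()
        M[i, j] = M[j, i] = rho
        return np.linalg.det(M)
    # the determinant is a*rho**2 + b*rho + c; three samples pin it down
    a = (det_at(1.0) + det_at(-1.0)) / 2 - det_at(0.0)
    b = (det_at(1.0) - det_at(-1.0)) / 2
    c = det_at(0.0)
    return np.sort(np.roots([a, b, c]).real)

# 4x4 example: every known correlation is 0.5, R[0, 1] is unknown
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)
lo, hi = rho_interval(R, 0, 1)  # approximately (-1/3, 1)
# det >= 0 holds between the roots here (the rho**2 coefficient is
# negative); in general check the sign pattern and intersect with [-1, 1]
```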

When more than one correlation is unknown, the answer may depend on the pattern of missing correlations. In any event, it requires solving a system of algebraic inequalities. Thus it seems useless to attempt a general formula: each situation must be resolved on its own. The constraints, being of degree up to $n$, can yield very messy solutions.

whuber