Why are the diagonal elements of the inverted correlation matrix related to the correlation with all other variables?

Question

For an inverted correlation matrix $C^{-1}$, I read that each diagonal element is related to the multiple correlation of measure $i$, taken as the criterion, with all the other measures in the set as predictors: $$ R_{i,12...p} = \sqrt{1 - \frac{1}{C^{-1}_{ii}}} $$

Why is that?

There is a related question, "Why does inversion of a covariance matrix yield partial correlations between random variables?", but it is about the off-diagonal elements.

mpr · Accepted Answer · 2016-08-26T16:39:06.717

Using the block-inverse formula, if we write the correlation matrix as $$M = \left[\begin{matrix}A & B\\B^t & D \end{matrix}\right] $$ then the bottom right block of the inverse correlation matrix will be $$(D-B^tA^{-1}B)^{-1} $$
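The block-inverse identity can be checked numerically. The following is a minimal sketch with NumPy, assuming simulated data; the seed, the sample size, and the names `schur_inv` and `block_formula_holds` are illustrative choices, not part of the original answer.

```python
import numpy as np

# Simulated data with correlated columns (illustrative assumption).
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 4))
X = Z + 0.7 * Z[:, [0]]              # every column shares a common component
M = np.corrcoef(X, rowvar=False)     # 4 x 4 sample correlation matrix

# Partition M into blocks A (k x k), B, and D.
k = 3
A, B, D = M[:k, :k], M[:k, k:], M[k:, k:]

# Bottom-right block of M^{-1} according to the block-inverse formula.
schur_inv = np.linalg.inv(D - B.T @ np.linalg.inv(A) @ B)

# The same block taken from the directly computed inverse.
bottom_right = np.linalg.inv(M)[k:, k:]

block_formula_holds = np.allclose(schur_inv, bottom_right)
print(block_formula_holds)
```

The same check works for any split point `k`, as long as $A$ is invertible.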

Now assume that we break the correlation matrix into blocks of size $n-1$ and $1$, so that $D$ is a $1\times1$ matrix containing the entry $M_{nn}=Cor(X_n,X_n)=1$. In this case, we get \begin{align*} M^{-1}_{nn}&=\frac{1}{1-B^tA^{-1}B}\\ 1-\frac{1}{M^{-1}_{nn}}&=B^tA^{-1}B. \end{align*}
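As a quick sanity check (a worked example added for illustration, with $r$ denoting an assumed correlation between two variables), take $n=2$, so that $A=1$ and $B=r$: $$M=\left[\begin{matrix}1 & r\\ r & 1\end{matrix}\right],\qquad M^{-1}=\frac{1}{1-r^2}\left[\begin{matrix}1 & -r\\ -r & 1\end{matrix}\right],\qquad 1-\frac{1}{M^{-1}_{22}}=1-(1-r^2)=r^2=B^tA^{-1}B.$$ With a single predictor the multiple correlation is $|r|$, which is what the formula gives.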

Next, assume WLOG (see note below) that the variables involved all have variance 1 and mean 0, so the correlation matrix is also the covariance matrix. Then $A$ is the covariance matrix for $X_{1..(n-1)}$, and $B$ is the vector of covariances between $X_{1..(n-1)}$ and $X_n$.

It follows that the regression coefficients for $X_n$ given $X_1..X_{n-1}$ are $\beta=A^{-1}B$. Therefore, letting $\hat X_n=X_{1..(n-1)}\beta$ denote the least-squares fit of $X_n$ given $X_1..X_{n-1}$, and using the symmetry of $A$, we get \begin{align*} 1-\frac{1}{M^{-1}_{nn}} =B^tA^{-1}B = (A^{-1}B)^tA(A^{-1}B) &= \beta^tA\beta\\ &= Var(\hat{X_n})\\ &= Cov(\hat{X_n},X_n). \end{align*} The last two steps use $Var(\hat{X_n})=\beta^tA\beta$ and $Cov(\hat{X_n},X_n)=\beta^tB=B^tA^{-1}B$.
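The chain of equalities above can be checked on standardized sample data. This is a sketch under stated assumptions: the data are simulated, the last column plays the role of $X_n$, and the variable names (`beta_lstsq`, `var_fit`, `cov_fit`, `quad_form`) are illustrative.

```python
import numpy as np

# Simulated correlated data, standardized to mean 0 and variance 1.
rng = np.random.default_rng(1)
Z = rng.normal(size=(2000, 4))
X = Z + 0.7 * Z[:, [0]]
X = (X - X.mean(axis=0)) / X.std(axis=0)

# For standardized columns the covariance matrix equals the correlation matrix.
M = X.T @ X / len(X)
A, B = M[:-1, :-1], M[:-1, -1]

# Regression coefficients from the block formula: beta = A^{-1} B.
beta = np.linalg.solve(A, B)

# The same coefficients from an ordinary least-squares fit.
beta_lstsq, *_ = np.linalg.lstsq(X[:, :-1], X[:, -1], rcond=None)

# Fitted values, then the three quantities the derivation claims are equal.
x_hat = X[:, :-1] @ beta
var_fit = x_hat.var()                  # Var(X_hat)
cov_fit = np.mean(x_hat * X[:, -1])    # Cov(X_hat, X_n); both have mean 0
quad_form = B @ beta                   # B^t A^{-1} B

print(np.allclose(beta, beta_lstsq), var_fit, cov_fit, quad_form)
```

No intercept is needed in the fit because every column has already been centered.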

Since $Var(X_n)=1$ by assumption, and $Cov(\hat{X_n},X_n)=Var(\hat{X_n})$ as shown above, it follows that $$R=Cor(\hat{X_n},X_n)=\frac{Cov(\hat{X_n},X_n)}{\sqrt{Var(\hat{X_n})}}=\sqrt{Var(\hat{X_n})}=\sqrt{1-\frac{1}{M^{-1}_{nn}}}$$
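To see the final formula at work, here is a small sketch that computes the multiple correlation of every variable from the diagonal of the inverse correlation matrix and compares one of them with a direct regression. The function name `multiple_correlations` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def multiple_correlations(C):
    """Multiple correlation of each variable with all the others,
    computed from the diagonal of the inverse correlation matrix C^{-1}."""
    d = np.diag(np.linalg.inv(C))
    # Clip guards against tiny negative values caused by rounding.
    return np.sqrt(np.clip(1.0 - 1.0 / d, 0.0, None))

# Simulated correlated data (illustrative assumption).
rng = np.random.default_rng(2)
Z = rng.normal(size=(1000, 4))
X = Z + 0.7 * Z[:, [0]]

C = np.corrcoef(X, rowvar=False)
R = multiple_correlations(C)

# Direct check for the last variable: regress it on the others (with an
# intercept) and correlate the fitted values with the observed values.
y = X[:, -1]
design = np.column_stack([np.ones(len(X)), X[:, :-1]])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
R_direct = np.corrcoef(design @ coef, y)[0, 1]

print(R[-1], R_direct)
```

The proof above treats the last variable, but reordering the variables shows that the same relation holds for every diagonal entry, which is why the function can return all of them at once.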

Note: as @MarkStone points out, WLOG means "without loss of generality." In this case, the assumption of mean 0 and variance 1 is without loss of generality because we can recenter and scale if necessary, and the rescaling parameters will carry through the calculations and yield the same ultimate result.
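The "without loss of generality" step rests on the correlation matrix being unchanged by recentering and positive rescaling. A minimal check, assuming simulated data and arbitrary illustrative values for `shift` and `scale`:

```python
import numpy as np

# Simulated correlated data (illustrative assumption).
rng = np.random.default_rng(3)
Z = rng.normal(size=(300, 3))
X = Z + 0.5 * Z[:, [0]]

# Shift and rescale each column by arbitrary amounts (scales are positive).
shift = np.array([10.0, -3.0, 0.5])
scale = np.array([2.0, 0.1, 50.0])
X_rescaled = X * scale + shift

# The correlation matrix, and hence its inverse, is unchanged.
same_corr = np.allclose(
    np.corrcoef(X, rowvar=False),
    np.corrcoef(X_rescaled, rowvar=False),
)
print(same_corr)
```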

Ahh, the old WLOG operator. You might want to explain that WLOG means "without loss of generality". It is sometimes used in dubious ways. In recognition of this abuse by professors and students alike, some 3 1/2 decades ago, I coined the term "generalized WLOG operator", which either stands for "without loss of generality" or "with loss of generality", and therefore is always valid. — Mark L. Stone, Aug 26 '16 at 16:36
LOL. Yes, good point, I'll add a note of that. Though in this case I don't think that assuming mean 0 and variance 1 should be very controversial. — mpr, Aug 26 '16 at 16:37
@mpr, thanks for the answer. As related, why there should be a negative sign for the off-diagonal elements for the partial correlation(see the linked question)? — ahala, Aug 26 '16 at 16:53
