
I want to implement the following formula, where $C$ is the variance-covariance matrix of the variables $x$, $y$, and $z$:

$$C = \begin{bmatrix}cov(x,x)&cov(x,y)&cov(x,z)\\cov(y,x)&cov(y,y)&cov(y,z)\\cov(z,x)&cov(z,y)&cov(z,z)\end{bmatrix}$$

$$S = (C)^{-1/2}$$

I understand the diagonal and inverse operations, but I am unclear on the meaning of raising a variance-covariance matrix to a negative half power. Thus, my questions are:

  1. What does it mean to raise a covariance matrix to a negative half power?
  2. What general ideas of linear algebra does this assume?
  3. Is there any nice formula to raise a covariance matrix to a negative half power?
Jun Jang
  • Can you please give a bit more context? The inverse of the covariance matrix is usually referred to as the [information matrix](https://en.wikipedia.org/wiki/Fisher_information) $I$. In addition, you are taking the square root of $I$. Usually this is associated with the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of a matrix, which is relevant if you want to solve a system or generate some random numbers... More context will probably allow for a more comprehensive answer. – usεr11852 Aug 22 '18 at 19:08
  • @usεr11852 This is related to finance. In the paper *Orthogonalized Equity Risk Premia and Systematic Risk Decomposition*, the authors use an orthogonal transformation method attributed to Löwdin (in quantum chemistry) to orthogonalize the original data into a set of uncorrelated data. – Jun Jang Aug 22 '18 at 19:26
  • OK, this makes more sense. They are doing the opposite of what is described [here](https://stats.stackexchange.com/questions/38856). They invert the covariance matrix to get the precision matrix and then take the square root of it so they can project the data into a space where they are uncorrelated. *Quantum chemistry*... yeah right... Applications of Linear Algebra, second semester. :) – usεr11852 Aug 22 '18 at 19:35
  • @usεr11852 Are you familiar with this topic? Can you please help me? I am doing an internship and have to present my summer project (in which I used the model from that paper), and I have to explain it in layman's terms T_T. I would definitely appreciate your help. – Jun Jang Aug 22 '18 at 19:38
  • I am no expert, but I can more or less see what this operation does. This is what I describe in my earlier comment. I will need to leave soon; I might try answering this tomorrow night (BST time). I think the first thing is to check how we can generate correlated numbers using a variance-covariance matrix, and then realise that they just do the same but with the inverse of it, so they "decorrelate" the numbers. – usεr11852 Aug 22 '18 at 20:01
  • @usεr11852 Okay, I am looking forward to hearing back from you soon! Thank you! – Jun Jang Aug 22 '18 at 20:03
  • No problem. Please see my answer below. – usεr11852 Aug 23 '18 at 18:36

2 Answers


What does it mean to raise a covariance matrix to a negative half power?

Usually this notation is used when a matrix is raised to a negative integer power. In this case, the notation is shorthand for taking an inverse and raising to a power, i.e. $A^{-2} = (A^2)^{-1} = (A^{-1})^2$. See this question.

This can be extended to fractional powers. The first question is: What does $A^{1/2}$ mean? Well, if $B$ is the square root of a matrix $A$, that means that $BB = A$.

This question proves that the square root of the inverse is equal to the inverse of the square root, so it makes sense to define $A^{-1/2} = (A^{1/2})^{-1} = (A^{-1})^{1/2}$.
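
As a quick numerical sanity check (a small sketch added here, assuming the expm package used in the other answer is installed), the two orderings agree on a toy positive-definite matrix:

# A small positive-definite (covariance-like) matrix
A <- matrix(c(4, 1, 1, 3), ncol = 2)
# Inverse of the square root vs. square root of the inverse
B1 <- solve(expm::sqrtm(A))
B2 <- expm::sqrtm(solve(A))
all.equal(B1, B2)               # equal up to numerical error
all.equal(B1 %*% B1, solve(A))  # B1 B1 = A^{-1}, as required of A^{-1/2}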

What general ideas of linear algebra does this assume?

If a matrix is positive semi-definite, it has a real square root (and a unique positive semi-definite square root). Fortunately for your purposes, any valid covariance matrix is positive semi-definite.
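
To make this concrete (a small base-R sketch added here), the eigenvalues of a sample covariance matrix are non-negative, and its eigendecomposition yields the symmetric positive semi-definite square root directly:

set.seed(1)
X <- matrix(rnorm(300), ncol = 3)   # 100 observations of 3 variables
C <- cov(X)
e <- eigen(C)
e$values                            # all non-negative for a valid covariance matrix
CRoot <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
all.equal(CRoot %*% CRoot, C)       # CRoot is the PSD square root of C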

Is there any nice formula to raise a covariance matrix to a negative half power?

Using the definition, you should combine methods for finding inverses and finding matrix square roots. (I.e. find the inverse, then find the square root.) These problems are well-known and you shouldn't have a problem finding algorithms.
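
For instance, a minimal sketch of that recipe (again assuming the expm package is available) on a small covariance matrix:

C <- matrix(c(4, 2, 1, 2, 3, 2, 1, 2, 3), ncol = 3, byrow = TRUE)
S <- expm::sqrtm(solve(C))    # invert first, then take the matrix square root
all.equal(S %*% S, solve(C))  # S satisfies S S = C^{-1}, i.e. S = C^{-1/2}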

mb7744
  • I wonder about your claim of uniqueness, because I noticed that although $\pmatrix{0&0\\0&0}$ is positive semi-definite, we have $$\pmatrix{0&1\\0&0}^2=\pmatrix{0&0\\0&0}=\pmatrix{0&0\\1&0}^2.$$ – whuber Aug 22 '18 at 15:35
  • Woops, I should have said a unique *positive semi-definite* square root. Thanks – mb7744 Aug 22 '18 at 16:03
  • I think perhaps you meant to specify that the square root is *symmetric.* – whuber Aug 22 '18 at 16:53
  • Positive definite matrices are symmetric, yes. – mb7744 Aug 22 '18 at 17:03
  • It means you want to find the matrix $S$ with the property that $S S = C^{-1}$. This is useful for example when a random vector $V=(x,y,z)^T$ has covariance matrix $C$ and you want to create another vector say $V^*=(x^*,y^*,z^*)^T$ with identity covariance matrix: $V^* = SV$. – papgeo Aug 22 '18 at 17:28
  • You have to be careful, because not everybody assumes p-d matrices must be symmetric: see https://en.wikipedia.org/wiki/Positive-definite_matrix#Extension_for_non-symmetric_matrices. – whuber Aug 22 '18 at 17:32
  • Sure. This is not material to the question asked. – mb7744 Aug 22 '18 at 20:53
  • Clarity in expression is *always* material. – whuber Aug 23 '18 at 19:09

What the operation $C^{-\frac{1}{2}}$ refers to is the decorrelation of the underlying sample into uncorrelated components; $C^{-\frac{1}{2}}$ is used as a whitening matrix. This is a natural operation when looking to analyse each column/source of the original data matrix $A$ (having a covariance matrix $C$) through an uncorrelated matrix $Z$. The most common way of implementing such whitening is through the Cholesky decomposition (where we use $C = LL^T$; see this thread for an example of "colouring" a sample), but here we use the slightly less common Mahalanobis whitening (where we use $C = C^{0.5} C^{0.5}$). The whole operation in R would go a bit like this:

set.seed(323)
N <- 10000
p <- 3
# Define the real C
( C <- base::matrix( data =c(4,2,1,2,3,2,1,2,3), ncol = 3, byrow= TRUE) ) 
# Generate the uncorrelated data (ground truth)
Z <- base::matrix( ncol = 3, rnorm(N*p) ) 
# Estimate the colouring matrix C^0.5
CSqrt <- expm::sqrtm(C)
# "Colour" the data / usually we use Cholesky (LL^T) but using C^0.5 valid too
A <- t( CSqrt %*% t(Z) ) 
# Get the sample estimated C 
( CEst <- round( digits = 2, cov( A )) )
# Estimate the whitening matrix C^-0.5
CEstInv <-  expm::sqrtm(solve(CEst))
# Whiten the data
ZEst <-  t(CEstInv %*% t(A) )
# Check that indeed we have whitened the data 
( round( digits = 1, cov(cbind(ZEst, Z) ) ) )

So, to succinctly answer the questions raised:

  1. It means that we can decorrelate the sample $A$ that is associated with that covariance matrix $C$ in such a way that we get uncorrelated components. This is commonly referred to as whitening.
  2. The general linear algebra idea it assumes is that a (covariance) matrix can be used as an operator to project data (to generate a correlated sample by "colouring"), and so can its inverse (to decorrelate/"whiten" a sample).
  3. Yes, the easiest way to raise a valid covariance matrix to any power (the negative half power is just a special case) is to use its eigendecomposition: $C = V \Lambda V^T$, with $V$ an orthogonal matrix holding the eigenvectors of $C$ and $\Lambda$ a diagonal matrix holding the eigenvalues. We can then raise the diagonal matrix $\Lambda$ to the desired power and get the relevant result, e.g. $C^{-\frac{1}{2}} = V \Lambda^{-\frac{1}{2}} V^T$.

A small code snippet showcasing point 3:

# Get the eigendecomposition of the covariance matrix
myEigDec <- eigen(cov(A))
# Use the eigendecomposition to get the inverse square root
myEigDec$vectors %*% diag( 1/ sqrt( myEigDec$values) ) %*% t(myEigDec$vectors)
# Use the eigendecomposition to get the "negative half power" (same as above)
myEigDec$vectors %*% diag( ( myEigDec$values)^(-0.5) ) %*% t(myEigDec$vectors)
# And to confirm by the R library expm
solve(expm::sqrtm(cov(A)))
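
As a small addition (not part of the original answer), the same eigendecomposition handles any power by simply changing the exponent on the eigenvalues; for example, the plain inverse $C^{-1}$:

# Use the eigendecomposition to get the plain inverse C^-1
myEigDec$vectors %*% diag( myEigDec$values^(-1) ) %*% t(myEigDec$vectors)
# And to confirm against the base R inverse
solve(cov(A))
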
usεr11852