
I found this equation here for calculating the covariance matrix of any number of variables using matrix algebra, for a given matrix $X$ with $N$ samples (rows): $$\frac{1}{N} (X - 1\bar{x}^T)^T (X - 1\bar{x}^T)$$ Here $1$ is an $N \times 1$ column of ones and $\bar{x}$ is the vector of column means. The following is SAS code I found at the link above.

ONES = J(N, 1, 1);
meanvec = (1/N)*t(X)*ONES;
mean_matrix = ONES*t(meanvec);
cov_matrix = (1/N) * t(X - mean_matrix) * (X - mean_matrix);

However, I don't have SAS on my workstation, so I converted this to R, which is nearly identical.

ONES <- matrix(1, nrow=N, ncol=1)
meanvec <- (1/N) * t(X) %*% ONES
mean_matrix <- ONES %*% t(meanvec)
cov_matrix <- (1/N) * t(X - mean_matrix) %*% (X - mean_matrix)

Now, here is where I run into problems. Let's take this sample matrix $X$:

X
     [,1] [,2] [,3]
[1,]   90   60   90
[2,]   90   90   30
[3,]   60   60   60
[4,]   60   60   90
[5,]   30   30   30

If I run the above code I get the following covariance matrix.

cov_matrix
     [,1] [,2] [,3]
[1,]  504  360  180
[2,]  360  360    0
[3,]  180    0  720

But when I run the cov function from the stats package I get

cov(X)
     [,1] [,2] [,3]
[1,]  630  450  225
[2,]  450  450    0
[3,]  225    0  900

which are the pairwise covariances between columns (verified with cov(X[,1], X[,1])). Sorry if I am missing some basic math concept here, but what is the difference? Why would two approaches that are both described as returning a covariance matrix return different 'kinds' of covariance matrices?

This is strictly a learning exercise for me, so I would appreciate any further information you could provide to help me understand these differences.

kjetil b halvorsen
cdeterman
  • Use $(N-1)$ in place of $N$ to obtain the so-called "unbiased" version – Russ Lenth Jul 16 '15 at 20:01
  • @rvl, Hah! So it is, that gets the numbers to match. Why would one want either 'unbiased' or 'biased' versions though? I was not aware there was such a distinction. – cdeterman Jul 16 '15 at 20:03
  • See (1) the help page for `cov`; (2) http://stats.stackexchange.com/questions/100041; and (3) http://stats.stackexchange.com/questions/3931 for intuition. For yet more information search [standard deviation correction](http://stats.stackexchange.com/search?q=standard+deviation+correction). – whuber Jul 16 '15 at 20:15
  • Thanks whuber, @rvl if you add your answer below I will accept it. – cdeterman Jul 16 '15 at 20:17
  • A covariance matrix *is* just a matrix of pairwise covariances, so I'm not sure about the distinction you're making. – dsaxton Jul 16 '15 at 20:19
  • @dsaxton, I was unaware of the $N-1$ component. The formula I had used wasn't for the "unbiased" version, which led to the discrepancy. – cdeterman Jul 16 '15 at 20:29

1 Answer


Answered in comments:

A covariance matrix is just a matrix of pairwise covariances, so I'm not sure about the distinction you're making.

– dsaxton

Use $N-1$ in place of $N$ to obtain the so-called "unbiased" version

– rvl
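
To see this suggestion at work, here is a minimal sketch that reruns the R code from the question with both divisors on the question's sample matrix. The names `centered`, `cov_biased`, and `cov_unbiased` are introduced here for illustration only and do not appear in the original post.

```r
# Sample data from the question: 5 observations (rows) of 3 variables (columns)
X <- matrix(c(90, 60, 90,
              90, 90, 30,
              60, 60, 60,
              60, 60, 90,
              30, 30, 30),
            ncol = 3, byrow = TRUE)
N <- nrow(X)

# Centre each column by subtracting its mean, as in the question's code
ONES <- matrix(1, nrow = N, ncol = 1)
meanvec <- (1 / N) * t(X) %*% ONES
centered <- X - ONES %*% t(meanvec)

# Same cross-product, two different divisors
cov_biased   <- (1 / N)       * t(centered) %*% centered  # divisor N
cov_unbiased <- (1 / (N - 1)) * t(centered) %*% centered  # divisor N - 1

# cov() from the stats package uses the N - 1 divisor
all.equal(cov_unbiased, cov(X))              # TRUE
all.equal(cov_biased, cov(X) * (N - 1) / N)  # TRUE
```

With the divisor $N$ the result reproduces the matrix in the question (504, 360, 180, ...); with $N-1$ it reproduces the output of `cov(X)` (630, 450, 225, ...).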

See (1) the help page for `cov`; (2) "How exactly did statisticians agree to using (n-1) as the unbiased estimator for population variance without simulation?"; and (3) "Intuitive explanation for dividing by $n-1$ when calculating standard deviation?" for intuition. For yet more information search "standard deviation correction".

– whuber
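
As a short worked check of the comments above, the two versions differ only by a constant factor. The labels $S_N$ and $S_{N-1}$ are notation assumed here for the two estimators; they are not used in the original post.

$$S_N = \frac{1}{N} (X - 1\bar{x}^T)^T (X - 1\bar{x}^T), \qquad S_{N-1} = \frac{1}{N-1} (X - 1\bar{x}^T)^T (X - 1\bar{x}^T) = \frac{N}{N-1}\, S_N$$

For the sample matrix in the question, $N = 5$, so every entry is scaled by $5/4$: $504 \times 1.25 = 630$, $360 \times 1.25 = 450$, $180 \times 1.25 = 225$, and $720 \times 1.25 = 900$, which are exactly the entries returned by `cov(X)`.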

kjetil b halvorsen