Sample Covariance Matrix Computation

Question

The covariance matrix has the property that it is positive semi-definite. Occasionally, however, the sample covariance matrix I calculate is not. What can be done in these cases?

Many thanks.

(My specific problem is that I have price series for 2 stocks covering 10 years. I wish to calculate the sample covariance matrix of the two stocks over a given window of data, e.g. 20 days, and guarantee that it is always positive semi-definite.)
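As a quick check of the setup described in the question, here is a minimal NumPy sketch that computes a 20-day rolling sample covariance matrix on two complete series (no missing values) and checks that every window's matrix has non-negative eigenvalues, up to rounding error. The price series are synthetic random walks and the function name `rolling_cov` is hypothetical; neither comes from the question.

```python
import numpy as np

def rolling_cov(series, window=20):
    """Return one sample covariance matrix per full window.

    `series` is an (n_days, n_assets) array with no missing values.
    """
    series = np.asarray(series, dtype=float)
    n = series.shape[0]
    # rowvar=False: rows are observations (days), columns are variables (stocks)
    return [np.cov(series[t - window:t], rowvar=False) for t in range(window, n + 1)]

# Hypothetical data: two synthetic random-walk price series, not real stock prices.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=(250, 2)), axis=0)

covs = rolling_cov(prices, window=20)
# Smallest eigenvalue over all windows; with complete data it is >= 0 up to rounding.
min_eig = min(np.linalg.eigvalsh(c).min() for c in covs)
print(len(covs), min_eig)
```

With complete data in every window, each matrix is an ordinary sample covariance, so no repair step is needed.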

Chris, the sample covariance matrix is always positive semi-definite. (It is the covariance of the empirical distribution, *QED*.) Are you perhaps computing a matrix of *pairwise* covariances when some of the data are missing? — whuber, Jan 23 '17 at 15:50
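The pairwise-deletion issue raised in this comment can be reproduced in a few lines. The sketch below computes each covariance entry only from the rows where both series are observed, similar to how pandas' `DataFrame.cov` excludes missing values pairwise. The function `pairwise_cov` and the toy data are hypothetical, chosen only for illustration. The pairwise estimate has a negative eigenvalue, whereas the ordinary sample covariance of the complete rows does not.

```python
import numpy as np

def pairwise_cov(data):
    """Covariance matrix where entry (i, j) uses only the rows in which both
    column i and column j are observed (NaN marks a missing value)."""
    data = np.asarray(data, dtype=float)
    p = data.shape[1]
    cov = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            both = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            xi, xj = data[both, i], data[both, j]
            cov[i, j] = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (both.sum() - 1)
    return cov

# Hypothetical toy data: column 0 is fully observed, column 1 only in the last two rows.
data = np.array([
    [0.0, np.nan],
    [0.0, np.nan],
    [0.0, np.nan],
    [0.0, np.nan],
    [1.0, 1.0],
    [-1.0, -1.0],
])

pw = pairwise_cov(data)          # [[0.4, 2.0], [2.0, 2.0]]
pw_eigs = np.linalg.eigvalsh(pw)
print(pw_eigs)                   # one eigenvalue is negative -> not PSD

# Listwise deletion: keep only complete rows, then use the ordinary sample covariance.
complete = data[~np.isnan(data).any(axis=1)]
complete_eigs = np.linalg.eigvalsh(np.cov(complete, rowvar=False))
print(complete_eigs)             # all >= 0 up to rounding
```

The variance of column 0 comes from all six rows, but its covariance with column 1 comes from only two, so the entries are not mutually consistent and the matrix is not the covariance of any single distribution.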

score 1 · Answer 1 · answered May 16 '19 at 22:14

In the comments you gave the extra information that some data are missing, and that the covariance matrix is computed using only the available (non-missing) pairs. Computed that way, there is no guarantee of positive definiteness. You can either:

Use (multiple) imputation to fill in the missing data before calculating the covariance matrix.
Compute the covariances as you do now, then compute the closest posdef (positive definite) matrix and use that as the estimated covariance matrix. See the question "Closest Non-negative matrix".
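For the second option, one common way to get the closest positive semi-definite matrix in the Frobenius norm is to symmetrise the estimate and clip its negative eigenvalues. Below is a minimal NumPy sketch under that assumption; `nearest_psd`, its `eps` parameter and the example matrix are hypothetical. Setting `eps` to a small positive value gives a strictly positive definite result.

```python
import numpy as np

def nearest_psd(a, eps=0.0):
    """Nearest symmetric matrix (Frobenius norm) whose eigenvalues are all >= eps.

    Symmetrises `a`, then clips its eigenvalues from below at `eps`.
    eps=0 gives a positive semi-definite matrix; eps > 0 a positive definite one.
    """
    a = np.asarray(a, dtype=float)
    sym = (a + a.T) / 2
    w, v = np.linalg.eigh(sym)           # sym = v @ diag(w) @ v.T
    w_clipped = np.clip(w, eps, None)    # raise eigenvalues below eps up to eps
    fixed = (v * w_clipped) @ v.T        # v * w scales column k of v by w[k]
    return (fixed + fixed.T) / 2         # remove tiny asymmetry from rounding

# Hypothetical pairwise estimate that is not positive semi-definite.
pw = np.array([[0.4, 2.0],
               [2.0, 2.0]])
fixed = nearest_psd(pw)
print(np.linalg.eigvalsh(fixed))         # no negative eigenvalues beyond rounding
print(np.diag(fixed))                    # note: the variances have changed
```

Clipping changes the diagonal, i.e. the estimated variances. If the variances must be kept, an alternative is to apply a nearest-correlation-matrix method to the correlation matrix and rescale by the original standard deviations.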
