
I have 65 samples of 21-dimensional data (pasted here) and I am constructing the covariance matrix from them. When computed in C++ I get the covariance matrix pasted here, and when computed in Matlab from the data (as shown below) I get the covariance matrix pasted here.

Matlab code for computing cov from data:

data = csvread('path/to/data');
matlab_cov = cov(data);

As you can see, the differences between the two covariance matrices are minute (~1e-07), which is probably due to numerical issues in the floating-point arithmetic.

However, when I compute the pseudo-inverse from the covariance matrix produced by Matlab and from the one produced by my C++ code, I get widely different results. I am computing them in the same way, i.e.:

data = csvread('path/to/data');
matlab_cov = cov(data);
my_cov = csvread('path/to/cov_file');
matlab_inv = pinv(matlab_cov);
my_inv = pinv(my_cov);

The difference is so huge that when I compute the Mahalanobis distance from a sample (pasted here) to the distribution of the 65 samples by:

$(65/64^2) \times (\text{sample}-\text{mean}) \, \Sigma^{-1} \, (\text{sample}-\text{mean})'$

using the different inverse covariance matrices ($\Sigma^{-1}$), I get widely different results, i.e.:

 (65/(64^2))*((sample-sample_mean)*my_inv*(sample-sample_mean)')
ans =

   1.0167e+05

(65/(64^2))*((sample-sample_mean)*matlab_inv*(sample-sample_mean)')
ans =

  109.9612

Is it normal for such small (~1e-7) differences in the covariance matrix to have such an effect on the computation of the pseudo-inverse? And if so, what can I do to mitigate this effect?

Failing this, are there any other distance metrics I can use that do not involve the inverse covariance? I use the Mahalanobis distance because, for $n$ samples, it is known to follow a beta distribution, which I use for hypothesis testing.

Many thanks in advance

EDIT: Adding the C++ code for calculating the covariance matrix below. The `vector<vector<double> >` represents the collection of rows from the file pasted.

// OpenCV matrix of single-precision floats (CV_32FC1), zero-initialised.
Mat covariance_matrix = Mat(21, 21, CV_32FC1, cv::Scalar(0));
for (int j = 0; j < 21; j++) {
    for (int k = 0; k < 21; k++) {
        // Accumulate the sum over all samples of (x_j - mean_j) * (x_k - mean_k).
        for (std::vector<vector<double> >::iterator it = data.begin(); it != data.end(); it++) {
            covariance_matrix.at<float>(j, k) += (it->at(j) - mean.at(j)) * (it->at(k) - mean[k]);
        }
        // Divide by n - 1 = 64 for the unbiased estimate.
        covariance_matrix.at<float>(j, k) /= 64;
    }
}
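
For reference, here is a sketch of the same computation in double precision with Eigen (this assumes the rows are already copied into an Eigen::MatrixXd with one sample per row; it is just an illustration, not what my current code does):

#include <Eigen/Dense>

// Sketch only: unbiased sample covariance in double precision.
// Assumes `samples` holds one 21-dimensional sample per row (65 x 21 here).
Eigen::MatrixXd sampleCovariance(const Eigen::MatrixXd& samples) {
    // Centre each column by subtracting its mean.
    Eigen::RowVectorXd colMeans = samples.colwise().mean();
    Eigen::MatrixXd centered = samples.rowwise() - colMeans;
    // Divide by n - 1 (= 64 for 65 samples), matching the loop above.
    return (centered.transpose() * centered) / double(samples.rows() - 1);
}
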
Aly
  • Inverting matrices... That's a dangerous thing! Usually it's preferable to find alternatives to that (e.g. pseudoinverse) – Ander Biguri Feb 27 '13 at 14:54
  • @AnderBiguri indeed I am using the pseudoinverse (see code above I call pinv) – Aly Feb 27 '13 at 14:59
  • @AnderBiguri also, are you aware of another suitable distance measure not using the inverse? – Aly Feb 27 '13 at 15:00
  • It is indeed normal for rounding error of $10^{-7}$ to have *profound* influences on matrix operations when the [condition number](http://en.wikipedia.org/wiki/Condition_number) of the matrix is huge--in this case it is approximately $10^6$. But why are you using single precision floats and not double precision? – whuber Feb 27 '13 at 15:07
  • @whuber Am I using floats? How do I specifically use doubles in Matlab? In C++ I compute it all with doubles, but I am new to Matlab and have just been using it as a verification tool – Aly Feb 27 '13 at 15:08
  • How do you compute your precision matrix in C++? You should use a good C++ library like Eigen or Armadillo or some BLAS variant. It will almost surely be faster than a "simple" user implementation. – usεr11852 Feb 27 '13 at 15:11
  • @user11852 I have added the C++ code, I should use something from Eigen, but for now I have a simple user implementation – Aly Feb 27 '13 at 15:14
  • @whuber How can one lower the condition number of the covariance matrix? – Aly Feb 27 '13 at 15:14
  • @Aly The condition number comes with the values of the matrix; it is a number that evaluates how well/ill-conditioned your matrix is (how sensitive it is to small changes). Different data would have a different condition number, but it is nothing you can change without changing the data. Also sorry, I didn't precisely check your code for the pseudoinverse – Ander Biguri Feb 27 '13 at 15:20
  • @AnderBiguri My 21-dimensional vectors are normalized to 1, so all values are <=1. This produces a covariance matrix with small numbers. If this was scaled by 100, for example, could this improve the condition number? – Aly Feb 27 '13 at 15:22
  • @Aly: the matrices you are looking to invert are not "valid" covariance matrices because they are not positive definite; numerically they even have some eigenvalues that are negative (but close to zero). I would probably just add some very small constant along the diagonal; it is a form of Tikhonov correction really ($X + \lambda I$). Also, don't use floats, use doubles to store your covariance matrix. (And besides, you already use OpenCV; you might as well use Eigen or Armadillo.) – usεr11852 Feb 27 '13 at 15:28
  • You can lower the condition number (1) possibly by not rounding or rounding less aggressively. (2) By zeroing out really small singular values (via singular value decomposition, or SVD). BTW, perhaps you're not *calculating* with floats--but by recording your data only to seven decimal places in your files, you are effectively rounding them to single-precision values. Rescaling will do nothing to the condition number. – whuber Feb 27 '13 at 15:28
  • @user11852 I find no negative eigenvalues--you might be looking at rounding error yourself. – whuber Feb 27 '13 at 15:29
  • @whuber: I didn't calculate anything; copy-pasting the matlab_cov_mat into Matlab and calculating the eigenvalues gives you a negative 21st eigenvalue ( -0.000000016313723 ). Neither of the covariance matrices provided has a Cholesky decomposition, so they are not positive-definite. – usεr11852 Feb 27 '13 at 15:33
  • @whuber I am storing as floats (as can be seen in the C++ code) I will switch to Eigen to use doubles. – Aly Feb 27 '13 at 15:33
  • @user11852 I will attempt either SVD to make the cov matrix positive definite and also the Tikhonov Correction and see if they improve things – Aly Feb 27 '13 at 15:35
  • @Aly: in Eigen declare your matrix as "MatrixXd" to use doubles. Also, given you make the covariance matrix positive definite with some form of correction, there is no reason for you to use the generalized inverse. "Normal" inverse routines are faster. (There is already this functionality in Eigen; as soon as you get your covariance matrix K, you can get the inverse by using "K.inverse()".) – usεr11852 Feb 27 '13 at 15:41
  • @user11852 This is after I have made the covariance matrix positive definite? On that note, do you have any references about how to use SVD or the Tikhonov Correction to make the cov matrix positive definite? – Aly Feb 27 '13 at 15:46
  • @Aly: Wikipedia, really (the entry on Tikhonov regularization). The method that whuber mentioned using the SVD will give you a non-negative definite matrix if you set small eigenvalues to zero; you will still need to add some small constant to all your eigenvalues to make the matrix positive definite. Practically both methods do the same thing. I just resorted to not using the SVD and directly affecting the sample eigenvalues by adding $\lambda$ to all of them. I haven't come across any references; both methods are quite intuitive, I believe. – usεr11852 Feb 27 '13 at 15:57
  • @user11852 Isn't using the SVD approach more stable though? For example, if you have to guess some $\lambda$, then for a particular matrix $C + \lambda I$ may still have negative eigenvalues? – Aly Feb 27 '13 at 15:59
  • Why would you not try arbitrary precision? – Stéphane Laurent Feb 27 '13 at 16:03
  • @StéphaneLaurent Is this available in the Eigen Library? – Aly Feb 27 '13 at 16:21
  • @user11852 Please can you make your comments an answer? I am still experimenting, but if promising I will accept. Also, if others make their suggestions answers I will upvote them, as they have been very helpful to my understanding of the problem – Aly Feb 27 '13 at 16:22
  • I commented in your other [thread](http://stats.stackexchange.com/questions/50949/why-use-the-mahalanobis-distance) that having **variables that sum to 1**, as your data set does, encourages instability and contains a redundant variable. Please try dropping one column. You do not even need the pinv: the covariance matrix is no longer singular. – Cam.Davidson.Pilon Feb 27 '13 at 17:58
  • @Cam.Davidson.Pilon When you say the covariance matrix is no longer singular, why is this? Surely this will not arise by me just dropping one column from my vector? – Aly Feb 27 '13 at 18:16
  • @Aly It is true. To quote from my previous comment: consider a 2-d case, where data is of the form (x, y) with y = 1-x. The variables are perfectly correlated, hence of course the covariance matrix will be ill-formed, as it looks like [ 1, 1; 1, 1]. Basically you have a redundant variable (back to my example, if you know x, you definitely know y). Try dropping a single variable. – Cam.Davidson.Pilon Feb 27 '13 at 18:20
  • With 21 dimensions, you should try to get many more samples. This is barely enough to define a covariance matrix at all, and nowhere near enough to make the results statistically meaningful. So either drop some columns, or get more samples. – Has QUIT--Anony-Mousse Feb 27 '13 at 21:38

3 Answers


The matrices you are looking to invert are not "valid" covariance matrices because they are not positive definite; numerically they even have some eigenvalues that are negative (but close to zero). This is most probably due to machine zeros; for example, the last eigenvalue of your "matlab_covariance" matrix is -0.000000016313723. To correct this and make the matrix positive definite you can do two things:

  1. Just add some very small constant along the diagonal; a form of Tikhonov correction really ($K+\lambda I$ with $\lambda \rightarrow 0$). A minimal sketch of this follows the list.
  2. (Based on what whuber proposed) Use the SVD, set the "problematic" eigenvalues to a fixed small value (not zero), reconstruct your covariance matrix and then invert that. Clearly, if you set some of those eigenvalues to exactly zero you will end up with a positive semi-definite matrix, which will still not be invertible.
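
A minimal sketch of option 1 with Eigen (doubles; the default $\lambda$ here is only an illustrative value, not a recommendation):

#include <Eigen/Dense>

// Sketch: Tikhonov-style correction K + lambda * I.
// lambda is an arbitrary small constant chosen purely for illustration.
Eigen::MatrixXd tikhonovCorrect(const Eigen::MatrixXd& K, double lambda = 1e-10) {
    return K + lambda * Eigen::MatrixXd::Identity(K.rows(), K.cols());
}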

A singular (merely positive semi-definite) matrix does not have an inverse, but it does have a pseudo-inverse (all matrices with real or complex entries have a pseudo-inverse). Nevertheless, the Moore–Penrose pseudo-inverse is more computationally expensive than a true inverse, and if the inverse exists it is equal to the pseudo-inverse. So just go for the inverse :)

Both methods practically try to handle the eigenvalues that evaluate to zero (or below zero). The first method is a bit hand-wavy but probably much faster to implement. For something slightly more stable you might want to compute the SVD and then set $\lambda$ equal to the absolute value of the smallest eigenvalue (so you get non-negative definiteness) plus something very small (so you get positive definiteness). Just be careful not to enforce positivity on a matrix that is obviously negative definite (or already positive definite). Both methods will alter the condition number of your matrix.
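
A sketch of that eigenvalue-based choice of $\lambda$ (Eigen again; `eps` is an arbitrary small constant):

#include <Eigen/Dense>

// Sketch: shift the diagonal by |smallest eigenvalue| + eps so the corrected
// matrix becomes positive definite (eps is arbitrary and small).
Eigen::MatrixXd shiftToPositiveDefinite(const Eigen::MatrixXd& K, double eps = 1e-10) {
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(K);
    double minEig = es.eigenvalues().minCoeff();           // real, since K is symmetric
    double lambda = (minEig < 0.0 ? -minEig : 0.0) + eps;  // only shift as much as needed
    return K + lambda * Eigen::MatrixXd::Identity(K.rows(), K.cols());
}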

In statistical terms, by adding that $\lambda$ along the diagonal of your covariance matrix you add noise to your measurements. (The diagonal of the covariance matrix is the variance of each point, and by adding something to those values you just say "the variance at the points I have readings for is actually a bit bigger than what I thought originally".)

A fast test for the positive-definiteness of a matrix is whether or not its Cholesky decomposition exists.
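
In Eigen, for instance, such a check could look like this (a sketch):

#include <Eigen/Dense>

// Sketch: a symmetric matrix is positive definite iff its Cholesky
// factorization K = L * L^T succeeds.
bool isPositiveDefinite(const Eigen::MatrixXd& K) {
    Eigen::LLT<Eigen::MatrixXd> llt(K);
    return llt.info() == Eigen::Success;  // NumericalIssue otherwise
}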

Also, a couple of computational notes:

  1. Don't use floats, use doubles to store your covariance matrix.
  2. Use numerical linear algebra libraries in C++ (like Eigen or Armadillo) to get inverses of matrices, matrix products, etc. It's faster, safer and more concise.

EDIT: Given you have a Cholesky decomposition of your matrix $K$ such that $K = LL^T$ (you have to compute it anyway to check that you have a positive definite matrix), you should be able to immediately solve the system $Kx=b$. You just solve $Ly = b$ for $y$ by forward substitution, and then $L^T x = y$ for $x$ by back substitution. (In Eigen just use the `.solve(x)` method of your Cholesky object.) Thanks to bnaul and Zen for pointing out that I focused so much on getting $K$ to be positive definite that I forgot why we cared about that in the first place :)
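
A sketch of that route for the squared distance itself (Eigen, with doubles; the argument names are just placeholders):

#include <Eigen/Dense>

// Sketch: squared Mahalanobis distance without forming K^{-1} explicitly.
// Solves K x = (sample - mean) through the Cholesky factor, then dots.
double mahalanobisSquared(const Eigen::VectorXd& sample,
                          const Eigen::VectorXd& mean,
                          const Eigen::MatrixXd& K) {
    Eigen::VectorXd d = sample - mean;
    Eigen::LLT<Eigen::MatrixXd> llt(K);   // requires K positive definite
    Eigen::VectorXd x = llt.solve(d);     // forward then back substitution
    return d.dot(x);                      // equals d' * K^{-1} * d
}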

usεr11852
  • +1. Using *Mathematica* and applying it to the *data* (rather than the posted covariance matrix, which may have been presented with too little precision) I find *no* negative eigenvalues. That is as it should be: when a covariance matrix is computed exactly, it's guaranteed positive semi-definite, so *any* negative eigenvalues must be attributed to imprecision in the calculations. Any decent generalized inverse procedure ought to "recognize" those tiny negative values as zeros and treat them accordingly. – whuber Feb 27 '13 at 17:52
  • Thanks guys for the effort; as stated, I have upvoted and will try these out and either comment or accept accordingly. – Aly Feb 27 '13 at 18:11
  • Sorry, I'm a bit confused, how does solving the Cholesky give us the Mahalanobis distance? – Aly Feb 28 '13 at 14:13
  • Check the link in bnaul's original post. But use Cholesky rather than $LU$ (that's what they mean by LDL*). – usεr11852 Feb 28 '13 at 14:28

The posted answers and comments all make good points about the dangers of inverting nearly singular matrices. However, as far as I can tell, no one has mentioned that computing Mahalanobis distance doesn't actually require inverting the sample covariance. See this StackOverflow question for a description of how to do so using the $LU$ decomposition.

The principle is the same as solving a linear system: when trying to solve for $x$ such that $Ax=b$, there are much more efficient and numerically stable methods than taking $x=A^{-1}b$.
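
As a rough sketch of what this looks like in code (Eigen here; `diff` stands for sample minus mean and `S` for the sample covariance, and the LU solve assumes `S` is invertible, see the comments below about regularization):

#include <Eigen/Dense>

// Sketch: squared Mahalanobis distance via an LU solve instead of pinv/inv.
double mahalanobisSquaredLU(const Eigen::VectorXd& diff, const Eigen::MatrixXd& S) {
    // Solve S x = diff with partial-pivoting LU, then form diff' * S^{-1} * diff.
    Eigen::VectorXd x = S.partialPivLu().solve(diff);
    return diff.dot(x);
}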

Edit: it probably goes without saying, but this method produces the exact distance value, whereas adding $\lambda I$ to $S$ and inverting yields only an approximation.

bnaul
  • You're right, @bnaul. However, without some kind of regularization, the `LU` decomposition will not work either. I will add a comment about this in my answer. – Zen Feb 28 '13 at 04:24
  • @bnaul: why do the LU when you already do the Cholesky to check positive definiteness? Assuming you have a valid covariance matrix $K=LL^T$, solving $Ly = b$ for $y$ by forward substitution and then $L^Tx = y$ for $x$ by back substitution will be faster. Good point though; I focused so much on getting positive-definiteness that I forgot why I cared about it originally! :D – usεr11852 Feb 28 '13 at 08:46

(Years later) a tiny example: if $A$ is rank-deficient with rank $r < n$, then $n - r$ eigenvalues of $A^T A$ will be 0 to within machine precision -- and about half of these "zeros" may be $< 0$:

#!/usr/bin/env python2
""" many eigenvalues of A'A are tiny but < 0 """
# e.g. A 1 x 10: [-1.4e-15 -6.3e-17 -4e-17 -2.7e-19 -8.8e-21  1e-18 1.5e-17 5.3e-17 1.4e-15  7.7]

from __future__ import division
import numpy as np
from numpy.linalg import eigvalsh  # -> lapack_lite
# from scipy.linalg import eigvalsh  # ~ same
from scipy import __version__

np.set_printoptions( threshold=20, edgeitems=10, linewidth=140,
        formatter = dict( float = lambda x: "%.2g" % x ))  # float arrays %.2g
print "versions: numpy %s  scipy %s \n" % (
        np.__version__, __version__  )

np.random.seed( 3 )

rank = 1
n = 10
A = np.random.normal( size=(rank, n) )
print "A: \n", A
AA = A.T.dot(A)
evals = eigvalsh( AA )
print "eigenvalues of A'A:", evals
denis