
I have a data matrix $X$ with shape $p\times n$.

It might not matter, but I interpret $X$ as $n$ column vectors, each containing $p$ features.

Then I compute $Q = X X^{T} / n$. This makes $Q$ positive semi-definite, and positive definite whenever $X$ has full row rank.

I interpret $Q$ as the covariance matrix of the data, whose observations are the columns of $X$. (Normally the mean should be subtracted from $X$ before computing the covariance this way, but in my case the mean is already zero, so the formula simplifies.)

Then I compute $Q^{-1}$, the inverse of $Q$, which should also be positive definite.

I want to compute the Mahalanobis distance of each column $x$ of $X$ as follows:

$\sqrt{x^T Q^{-1} x}$

However, for some columns $x^T Q^{-1} x$ comes out negative. This should be impossible in theory, so I suspect the problem is caused by floating-point rounding errors in the software.

I use the Math.NET Numerics library in C#. How do I avoid this problem?

I tried replacing $Q$ and $Q^{-1}$ with their transposes. In theory the result should be identical, and in practice it differs only slightly; the difference was not big enough, and I still get a negative number for the same column.

What else can I do?
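For reference, here is a minimal sketch of the computation as described. The data values and variable names are mine, not from the actual program; the fragile step is the explicit `Inverse()` call:

```csharp
using System;
using MathNet.Numerics.LinearAlgebra;

class MahalanobisDemo
{
    static void Main()
    {
        // Hypothetical zero-mean data: p = 3 features, n = 5 observations.
        var X = Matrix<double>.Build.DenseOfArray(new double[,]
        {
            {  1.0, -1.0,  2.0, -2.0,  0.0 },
            {  0.5,  0.5, -1.0,  0.5, -0.5 },
            {  2.0, -2.0,  1.0, -1.0,  0.0 }
        });
        int n = X.ColumnCount;

        // Q = X X^T / n  (the mean is assumed to be zero already).
        var Q = X * X.Transpose() / n;

        // Explicit inversion -- this is the numerically fragile step.
        var Qinv = Q.Inverse();

        for (int j = 0; j < n; j++)
        {
            var x = X.Column(j);
            // x^T Q^{-1} x; can come out slightly negative due to rounding.
            double d2 = x.DotProduct(Qinv * x);
            Console.WriteLine(d2);
        }
    }
}
```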

Glen_b
O.Rerla
  • Although this doesn't change your question, please note that $Q$ is not the covariance matrix. The covariance matrix is obtained by first removing the means of the columns. One approach you can take to your problem is described at https://stats.stackexchange.com/a/159322/919. (The question of generating random variates from such a degenerate distribution has much in common with the question of computing Mahalanobis distances: basically, you have to project $x$ into the range of $X$ before you attempt to compute the distances.) – whuber Sep 05 '17 at 20:42
  • I'm voting to move this question to stackoverflow – Taylor Sep 05 '17 at 23:19
  • I don't know if that would help, but how about instead of inverting $Q$, you solve a linear system ($Qy = x$), where $y$ is your $Q^{-1}x$. Then you just compute $x^T y$. This tends to be more numerically stable. – sega_sai Sep 05 '17 at 23:30
  • Rank-deficiency can make the covariance matrix PSD. Due to numerical instability, it can even become non-positive. So check its eigenvalues. – Firebug Sep 06 '17 at 12:33
  • See [Why is covariance matrix not positive-definite when number of observations is less than number of dimensions?](https://stats.stackexchange.com/questions/198488/why-is-covariance-matrix-not-positive-definite-when-number-of-observations-is-le?rq=1) (no answers yet) – Firebug Sep 06 '17 at 12:34
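The linear-solve suggestion in the comments can be sketched with Math.NET's Cholesky factorization. The data and variable names below are illustrative assumptions; this approach additionally diagnoses a rank-deficient $Q$, because the factorization fails if $Q$ is not numerically positive definite:

```csharp
using System;
using MathNet.Numerics.LinearAlgebra;

class CholeskyMahalanobis
{
    static void Main()
    {
        // Hypothetical zero-mean data, as in the question: p features x n observations.
        var X = Matrix<double>.Build.DenseOfArray(new double[,]
        {
            {  1.0, -1.0,  2.0, -2.0,  0.0 },
            {  0.5,  0.5, -1.0,  0.5, -0.5 },
            {  2.0, -2.0,  1.0, -1.0,  0.0 }
        });
        int n = X.ColumnCount;
        var Q = X * X.Transpose() / n;

        // Factor Q = L L^T once; this throws if Q is not positive definite,
        // which would indicate a rank-deficient (singular) covariance matrix.
        var chol = Q.Cholesky();

        for (int j = 0; j < n; j++)
        {
            var x = X.Column(j);
            // Solve Q y = x instead of forming Q^{-1} explicitly.
            var y = chol.Solve(x);
            // x^T y = x^T Q^{-1} x >= 0 up to rounding; clamp tiny negatives.
            double d2 = Math.Max(0.0, x.DotProduct(y));
            Console.WriteLine(Math.Sqrt(d2));
        }
    }
}
```

Solving the triangular systems from the single factorization is both faster and more numerically stable than multiplying by an explicitly computed inverse; the `Math.Max` clamp guards against residual rounding error of order machine epsilon.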

0 Answers