
Suppose I have $n$ data points $x_1,\dots,x_n$, each of which is $p$-dimensional. Let $\Sigma$ be the (non-singular) population covariance of these samples. With respect to $\Sigma$, what is the most efficient known way to compute the vector of squared Mahalanobis distances (from $\vec 0$) of the $n$ data points?

That is, we would like to compute the vector $(x_1^T\Sigma^{-1}x_1,\dots,x_n^T\Sigma^{-1}x_n)$.

Computing the inverse $\Sigma^{-1}$ explicitly seems quite slow for large matrices. Is there a faster way?

Lepidopterist

1 Answer

  1. Let $x$ be one of your data points.

  2. Compute the Cholesky decomposition $\Sigma=LL^\top$.

  3. Define $y=L^{-1}x$.

  4. Compute $y$ by forward-substitution in $Ly=x$.

  5. The squared Mahalanobis distance to the origin is the squared Euclidean norm of $y$:

$$ \begin{align} x^\top\Sigma^{-1}x &= x^\top(LL^\top)^{-1}x \\ &= x^\top(L^\top)^{-1}L^{-1}x \\ &= x^\top(L^{-1})^\top L^{-1}x \\ &= (L^{-1}x)^\top(L^{-1}x) \\ &= \|y\|^2. \end{align} $$
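The steps above can be sketched in Python with NumPy/SciPy (an assumption on my part, since the OP works in MATLAB): factor $\Sigma$ once, then do one triangular solve per batch of points instead of ever forming $\Sigma^{-1}$.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
n, p = 5, 3
X = rng.standard_normal((n, p))            # n data points as rows

# Build an arbitrary non-singular covariance matrix for illustration.
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)

L = cholesky(Sigma, lower=True)            # Sigma = L L^T
Y = solve_triangular(L, X.T, lower=True)   # forward-substitution: L Y = X^T
d2 = np.sum(Y**2, axis=0)                  # squared Mahalanobis distances ||y_i||^2

# Sanity check against the direct (slower, less stable) inversion:
d2_direct = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
assert np.allclose(d2, d2_direct)
```

Note that the Cholesky factorization is computed once ($O(p^3)$) and each point then costs only one $O(p^2)$ triangular solve, whereas forming $\Sigma^{-1}$ gains nothing asymptotically and is typically less accurate.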

Zen
  • I had an issue with the accuracy of this proposed algorithm and created a new question: http://stats.stackexchange.com/questions/147654/cholesky-factorization-and-forward-substitution-less-accurate-than-inversion – Lepidopterist Apr 22 '15 at 01:01
  • Without wanting to take away anything from how insightful this answer is (I up-voted earlier); especially because the OP (@Lepidopteris) uses MATLAB, I would argue that for implementation purposes *only* one should use `x' * (Sigma \ x)`, where `Sigma \ x` effectively takes care of `inv(Sigma) * x` (please never use this latter command). – usεr11852 Apr 22 '15 at 02:06
  • Why do you say for implementation purposes one should use `x' * (Sigma \ x)`? Why is `chol` not a good option in MATLAB? – Lepidopterist Apr 22 '15 at 02:47
  • @Lepidopterist: I have answered this comment in the comments of the new question. – usεr11852 May 07 '15 at 01:56