Conceptually I understand what's going on: some researchers use this as a heuristic for multivariate outlier detection, and I also get that we are measuring the distance of a data point from the mean in a particular direction (the direction of x). However, I find the Mahalanobis distance equation itself a pretty difficult thing to understand. As pointed out to me, there is already a great topic on this: Bottom to top explanation of the Mahalanobis distance?. I read through it multiple times, my intuition got quite a bit stronger, and I'm grateful for it. However, I wanted to revisit this topic with a practical attitude. So if you are still hopelessly lost about the computation of the equation (like me), then maybe this will help.
Given the Mahalanobis distance equation:
$$D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$$
Most of the motivation for all of this is outlier detection in high dimensions, but for the sake of simplicity, let's suppose a two-variable case. Let's say our mu (I guess technically there should be a hat) is:
$$ \mu = \begin{bmatrix} 0\\ 0 \end{bmatrix} $$
and sigma:
$$\Sigma = \begin{bmatrix} 25 & 0\\ 0 & 1 \end{bmatrix}$$
and our data point of interest has the $(x_1, x_2)$ coordinates $(10, 7)$.
If I'm not mistaken, that should be sufficient for a worked-out example of the equation. I'm kind of out on a limb here, but let me try the first step or two.
Since mu is zero, subtracting it from the observed x's doesn't change anything, so I think the first term, $(x - \mu)^T$, is just the transpose of $x$:
$$\begin{bmatrix} 10 & 7 \end{bmatrix}$$
Then I believe we take [10, 7] and multiply it by the inverse of sigma. As per the comments, I have fixed the inverse calculation. It should be:
$$\Sigma ^{-1}=\begin{bmatrix} 1/25 & 0\\ 0 & 1 \end{bmatrix}$$
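If you want to sanity-check that inverse numerically, a couple of lines of numpy will do it (for a diagonal covariance like this one, the inverse is just the reciprocal of each diagonal entry, which `np.linalg.inv` confirms):

```python
import numpy as np

Sigma = np.array([[25.0, 0.0],
                  [0.0, 1.0]])

# For a diagonal covariance, inverting just takes the reciprocal
# of each diagonal entry.
print(np.linalg.inv(Sigma))
# [[0.04 0.  ]
#  [0.   1.  ]]
```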
For my next feat, I will attempt to multiply the (x-μ) vector with the inverse of sigma. I think that's what the equation is imploring us to do anyway, correct me if I'm wrong.
Since mu is zero, we should just have:
$$\begin{bmatrix} 1/25 & 0\\ 0 & 1 \end{bmatrix}\begin{bmatrix} 10\\ 7 \end{bmatrix} = \begin{bmatrix} 10/25\\ 7 \end{bmatrix}$$
And then we take the [10,7] vector from earlier and...
$$\begin{bmatrix} 10 & 7 \end{bmatrix} \begin{bmatrix} 10/25\\ 7 \end{bmatrix} = 100/25 + 49 = 53$$
Lastly,
$$\sqrt{53} \approx 7.3$$
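Since I asked for Python at the end, here is a minimal numpy sketch of the whole calculation above; nothing here goes beyond the worked example, and `x`, `mu`, and `Sigma` are just the values from this post:

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[25.0, 0.0],
                  [0.0, 1.0]])
x = np.array([10.0, 7.0])

diff = x - mu                        # (x - mu); here just x, since mu = 0
Sigma_inv = np.linalg.inv(Sigma)     # inverse of the covariance matrix
d_squared = diff @ Sigma_inv @ diff  # (x - mu)^T Sigma^{-1} (x - mu) = 53.0
d = np.sqrt(d_squared)               # sqrt(53) is about 7.28

print(d_squared, d)                  # 53.0 7.2801...
```

For what it's worth, scipy also ships this as `scipy.spatial.distance.mahalanobis(x, mu, Sigma_inv)`, which should give the same 7.28; note that it takes the *inverse* covariance as its third argument, not the covariance itself.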
Summary
I believe I have come to the right answer. Seeing the inner workings of the equation and imagining visually the scatter plots in the other Mahalanobis distance post is fascinating. I will still leave the floor open if anyone can put my lengthy matrix algebra into a few lines of Python or pseudo-code. Notably, if I wanted a series of Mahalanobis distances for all my data points so I can see how they compare to a chi-squared distribution, I suppose we would go from having a 2x1 vector (like my example above) to a kx1 vector, where k is the number of variables? And would this procedure be repeated n (number of observations) times? A sketch of my best guess follows below.
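Here is how I would sketch that, assuming the data sit in an n x k array `X` with one observation per row; the 500-point sample below is made up purely for illustration:

```python
import numpy as np
from scipy.stats import chi2

mu = np.array([0.0, 0.0])
Sigma = np.array([[25.0, 0.0],
                  [0.0, 1.0]])

# Made-up data for illustration: n = 500 observations, k = 2 variables.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=500)

Sigma_inv = np.linalg.inv(Sigma)
diff = X - mu                       # shape (n, k): each row is one (x - mu)

# Squared Mahalanobis distance for every row at once:
# d2[i] = diff[i] @ Sigma_inv @ diff[i]
d2 = np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)

# Under multivariate normality, d2 follows a chi-squared distribution
# with k degrees of freedom, so a common outlier rule flags points
# beyond a high quantile of that distribution.
k = X.shape[1]
cutoff = chi2.ppf(0.975, df=k)
print((d2 > cutoff).sum(), "points flagged out of", len(X))
```

So, if I'm reading my own question right: yes, each (x - mu) is a kx1 vector, and conceptually the quadratic form is evaluated n times, once per observation; the `einsum` line just does all n of them in one shot.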