Conceptually I understand what's going on: some researchers use this as a heuristic for multivariate outlier detection, and I also get that we are measuring the distance of a data point from the mean in a particular direction (the direction of x). However, I find the Mahalanobis distance equation itself a pretty difficult thing to understand. As pointed out to me, there is already a great topic on this: Bottom to top explanation of the Mahalanobis distance?. I read through it multiple times, my intuition got quite a bit stronger, and I'm grateful for it. However, I wanted to revisit this topic with a practical attitude. So if you are still hopelessly lost about the computation of the equation (like me), then maybe this will help.
Given the Mahalanobis distance equation:
$$D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$$
Most of the motivation for all of this is outlier detection in high dimensions, but for the sake of simplicity, let's suppose a two-variable case. Let's say our mu (I guess technically there should be a hat) is:
$$ \mu = \begin{bmatrix} 0\\ 0 \end{bmatrix} $$
and sigma:
$$\Sigma = \begin{bmatrix} 25 & 0\\ 0 & 1 \end{bmatrix}$$
and our data point of interest has the $(x_1, x_2)$ coordinates $(10, 7)$.
If I'm not mistaken, that should be sufficient for a worked-out example of the equation. I'm kind of out on a limb here, but let me try the first step or two.
Since mu is zero, subtracting it from the observed x's doesn't change anything, so I think the first term, $(x - \mu)^T$, is just the transpose of $x$:
$$\begin{bmatrix} 10 & 7 \end{bmatrix}$$
Then I believe we take [10, 7] and multiply it by the inverse of sigma. As per the comments, I have fixed the inverse calculation. It should be:
$$\Sigma ^{-1}=\begin{bmatrix} 1/25 & 0\\ 0 & 1 \end{bmatrix}$$
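If you want to sanity-check that inverse numerically, a couple of lines of numpy will do it (for a diagonal covariance like this one, the inverse is just the reciprocal of each diagonal entry, which `np.linalg.inv` confirms):

```python
import numpy as np

Sigma = np.array([[25.0, 0.0],
                  [0.0, 1.0]])

# For a diagonal covariance, inverting just takes the reciprocal
# of each diagonal entry.
print(np.linalg.inv(Sigma))
# [[0.04 0.  ]
#  [0.   1.  ]]
```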
For my next feat, I will attempt to multiply the (x-μ) vector with the inverse of sigma. I think that's what the equation is imploring us to do anyway, correct me if I'm wrong.
Since mu is zero, we should just have:
$$\begin{bmatrix} 1/25 & 0\\ 0 & 1 \end{bmatrix}\begin{bmatrix} 10\\ 7 \end{bmatrix} = \begin{bmatrix} 10/25\\ 7 \end{bmatrix}$$
And then we take the [10,7] vector from earlier and...
$$\begin{bmatrix} 10 & 7 \end{bmatrix} \begin{bmatrix} 10/25\\ 7 \end{bmatrix} = 100/25 + 49 = 53$$
Lastly,
$$\sqrt{53} \approx 7.3$$
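Since I asked for Python at the end, here is a minimal numpy sketch of the whole calculation above; nothing here goes beyond the worked example, and `x`, `mu`, and `Sigma` are just the values from this post:

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[25.0, 0.0],
                  [0.0, 1.0]])
x = np.array([10.0, 7.0])

diff = x - mu                        # (x - mu); here just x, since mu = 0
Sigma_inv = np.linalg.inv(Sigma)     # inverse of the covariance matrix
d_squared = diff @ Sigma_inv @ diff  # (x - mu)^T Sigma^{-1} (x - mu) = 53.0
d = np.sqrt(d_squared)               # sqrt(53) is about 7.28

print(d_squared, d)                  # 53.0 7.2801...
```

For what it's worth, scipy also ships this as `scipy.spatial.distance.mahalanobis(x, mu, Sigma_inv)`, which should give the same 7.28; note that it takes the *inverse* covariance as its third argument, not the covariance itself.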
Summary
I believe I have come to the right answer. Seeing the inner workings of the equation and imagining visually the scatter plots in the other Mahalanobis distance post is fascinating. I will still leave the floor open if anyone can put my lengthy matrix algebra into a few lines of Python or pseudo-code. Notably, if I wanted a series of Mahalanobis distances for all my data points so I can see how they compare to a chi-squared distribution, I suppose we would go from having a 2x1 vector (like my example above) to a kx1 vector, where k is the number of variables? And would this procedure be repeated n (number of observations) times? A sketch of my best guess follows below.
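Here is how I would sketch that, assuming the data sit in an n x k array `X` with one observation per row; the 500-point sample below is made up purely for illustration:

```python
import numpy as np
from scipy.stats import chi2

mu = np.array([0.0, 0.0])
Sigma = np.array([[25.0, 0.0],
                  [0.0, 1.0]])

# Made-up data for illustration: n = 500 observations, k = 2 variables.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=500)

Sigma_inv = np.linalg.inv(Sigma)
diff = X - mu                       # shape (n, k): each row is one (x - mu)

# Squared Mahalanobis distance for every row at once:
# d2[i] = diff[i] @ Sigma_inv @ diff[i]
d2 = np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)

# Under multivariate normality, d2 follows a chi-squared distribution
# with k degrees of freedom, so a common outlier rule flags points
# beyond a high quantile of that distribution.
k = X.shape[1]
cutoff = chi2.ppf(0.975, df=k)
print((d2 > cutoff).sum(), "points flagged out of", len(X))
```

So, if I'm reading my own question right: yes, each (x - mu) is a kx1 vector, and conceptually the quadratic form is evaluated n times, once per observation; the `einsum` line just does all n of them in one shot.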