I'm trying to visualize the Mahalanobis distance $x^{t} \cdot \Sigma^{-1} \cdot x$ to get a better intuition. To do so, I took some data points from $N(0,1)$ and tried to plot $\Sigma^{-1} \cdot x$ to understand what I'm projecting $x^{t}$ onto. These are the data points:

[image: plot of the data points]

and these are the points resulting from $\Sigma^{-1} \cdot x$:

[image: plot of the points $\Sigma^{-1} \cdot x$]

Now, my intuition was that multiplying by $\Sigma^{-1}$ should remove the correlation between the data, so I was expecting to get a circular shape in the second plot. Obviously, I'm wrong. Can someone explain intuitively what it means to multiply $\Sigma^{-1}$ by an arbitrary point $x$?

kk96kk
  • Perhaps https://stats.stackexchange.com/questions/62092 helps? It addresses your original question about getting intuition for the Mahalanobis distance. – whuber Apr 22 '18 at 20:33

1 Answer

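$\Sigma^{-1}$ and $x$ are unit-wise incompatible: $\Sigma^{-1}x$ has units of $1/x$ rather than being dimensionless. You've gone too far in correcting for correlation and ended up going in the opposite direction: the covariance of $\Sigma^{-1}x$ is $\Sigma^{-1}\Sigma\Sigma^{-1} = \Sigma^{-1}$, not the identity.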
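
A quick numerical check of this overcorrection can be sketched with NumPy. The covariance matrix `Sigma`, the seed, and the sample size below are hypothetical values chosen only for illustration; they are not taken from the question's data. Under those assumptions, the sample covariance of $\Sigma^{-1}x$ comes out close to $\Sigma^{-1}$ rather than to the identity.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Hypothetical covariance matrix, picked for illustration only.
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

# Draw zero-mean correlated samples; each row is one point x.
X = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=200_000)

# Apply Sigma^{-1} to every point. Sigma_inv is symmetric, so
# (Sigma_inv @ x) for each row x is the same as X @ Sigma_inv.
Z = X @ Sigma_inv

cov_Z = np.cov(Z, rowvar=False)
print("sample covariance of Sigma^{-1} x:\n", cov_Z)
print("Sigma^{-1}:\n", Sigma_inv)
```

The transformed points are still correlated, with the sign of the correlation flipped relative to the original data, which is why the second plot is not circular.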

Consider the upper triangular Cholesky factor $R$ of $\Sigma$, such that $R^TR = \Sigma$. To keep things simple, let's assume $x$ has mean zero. Then $y = R^{-T}x$ will do what you want, producing a standardized multivariate normal, i.e., one whose covariance matrix is the identity:

$E(yy^T) = E(R^{-T}xx^TR^{-1}) = R^{-T}E(xx^T)R^{-1} = R^{-T}\Sigma R^{-1} = R^{-T}R^TR R^{-1} = I$

Then we have $y^Ty = x^TR^{-1}R^{-T}x = x^T \Sigma^{-1} x$, which is the squared Mahalanobis distance. That is, the Mahalanobis distance is the length of the standardized random variable $R^{-T}x$.
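
The whitening step and the distance identity can be sketched in NumPy as follows. This is a minimal illustration under assumptions: `Sigma`, the seed, and the sample size are hypothetical. Note that `np.linalg.cholesky` returns the lower-triangular factor `L` with `L @ L.T == Sigma`, so in the notation used here $R = L^T$ and $R^{-T} = L^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility

# Hypothetical covariance matrix, picked for illustration only.
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])

# Lower-triangular factor: L @ L.T == Sigma, i.e. L = R^T.
L = np.linalg.cholesky(Sigma)

# Zero-mean correlated samples; each row is one point x.
X = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=200_000)

# y = R^{-T} x = L^{-1} x for every point, computed with a linear solve
# instead of forming the inverse explicitly.
Y = np.linalg.solve(L, X.T).T

cov_Y = np.cov(Y, rowvar=False)
print("sample covariance of y:\n", cov_Y)  # close to the identity

# Squared Mahalanobis distance computed two ways.
d2_from_y = np.sum(Y**2, axis=1)                                   # y^T y
d2_direct = np.einsum("ij,jk,ik->i", X, np.linalg.inv(Sigma), X)   # x^T Sigma^{-1} x
print("max difference:", np.max(np.abs(d2_from_y - d2_direct)))
```

Plotting `Y` instead of `X @ np.linalg.inv(Sigma)` should give the roughly circular cloud the question was expecting.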

To help orient your thinking, note that in the one-dimensional case $R$ is the standard deviation of $x$, so dividing by $R$ standardizes $x$. What you did, in the one-dimensional case, was divide $x$ by the variance, which doesn't even preserve units.
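
As a worked check of the one-dimensional case, write $\sigma$ for the standard deviation of $x$ (a symbol introduced here only for this illustration), so that $R = \sigma$ and $\Sigma = \sigma^2$:

$\mathrm{Var}(x/\sigma) = \sigma^2/\sigma^2 = 1$, whereas $\mathrm{Var}(x/\sigma^2) = \sigma^2/\sigma^4 = 1/\sigma^2$

The first quantity is dimensionless with unit variance; the second has units of $1/x$ and a variance that depends on $\sigma$.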

Mark L. Stone
  • Yes, thank you! I think I get it now. Just one more question to make sure I have it right before accepting your answer: if I went the other way around and plotted points with no covariance, then to transform them to look like the first plot in the question I should multiply each point $x$ by the Cholesky factor $R$. Is that correct? Thanks in advance. – kk96kk Apr 23 '18 at 00:26
  • If $x$ is $N(0,I)$, and $R$ is the Cholesky factor of $\Sigma$, such that $R^TR = \Sigma$, then $y = R^Tx$ will be $N(0,\Sigma)$ because $E(yy^T) = E(R^Txx^TR) = R^TIR = \Sigma$. – Mark L. Stone Apr 23 '18 at 00:40
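
The reverse direction from the last comment, turning uncorrelated points into correlated ones, can be sketched the same way. Again `Sigma`, the seed, and the sample size are hypothetical values for illustration. Since `np.linalg.cholesky` returns the lower-triangular `L` with `L @ L.T == Sigma`, the multiplication $y = R^Tx$ becomes `L @ x`.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, chosen only for reproducibility

# Hypothetical target covariance matrix, picked for illustration only.
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])

# Lower-triangular factor: L @ L.T == Sigma, i.e. L = R^T.
L = np.linalg.cholesky(Sigma)

# Uncorrelated standard normal points; each row is one point x ~ N(0, I).
X = rng.standard_normal((200_000, 2))

# y = R^T x = L x for every point; with points stored as rows this is X @ L.T.
Y = X @ L.T

cov_Y = np.cov(Y, rowvar=False)
print("sample covariance of y:\n", cov_Y)  # close to Sigma
```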