
The Mahalanobis distance provides a value that can be used to detect outliers. My question: how do I calculate the direction of an outlier (as a vector)?

A simple answer would be to use the vector from the center of the distribution to the outlier, but this answer would not use the "normalization" property provided by the Mahalanobis distance...
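A minimal NumPy sketch of this "simple answer", assuming the center `mean` and covariance `cov` are already known (estimated elsewhere). The direction is the Euclidean unit vector from the center to the point, and the Mahalanobis distance is computed separately. The helper name and the numbers are hypothetical. Note that the unit vector does not depend on `cov` at all, which is exactly the limitation raised above.

```python
import numpy as np

def naive_direction_and_distance(x, mean, cov):
    """Return the Euclidean unit vector from `mean` to `x` and the Mahalanobis
    distance of `x` from `mean` (hypothetical helper for illustration)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    unit = diff / np.linalg.norm(diff)  # direction in the original coordinates; ignores cov
    # Solve cov @ z = diff rather than forming cov^{-1} explicitly.
    d_mahal = float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
    return unit, d_mahal

mean = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])  # illustrative: variance 4 along x, 1 along y
x = np.array([2.0, 2.0])
unit, d = naive_direction_and_distance(x, mean, cov)
# d is sqrt(2**2 / 4 + 2**2 / 1) = sqrt(5); the unit vector is (1, 1) / sqrt(2).
print(unit, d)
```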

Gideon Kogan
    Hello, I am not sure I understand what you mean by "the direction of an outlier". I would go with the direction of the observation, taking the center of your data as the origin, as you suggested. You could normalize it by the positive definite square root of the variance matrix of your data, but I wonder whether this is relevant... – Pohoua Aug 10 '20 at 12:52
    The direction, as always, is given by the vector going from the center to the outlier. What more might you be looking for?? – whuber Aug 10 '20 at 16:47
  • @whuber, it seems like the solution should involve mapping the outlier vector by multiplying it by the inverse covariance matrix and then measuring the angle. The problem is that multiplying by the inverse covariance matrix is similar to dividing by the variance rather than by the SD, which seems more appropriate in this case – Gideon Kogan Aug 11 '20 at 08:41
  • You ignore the square root in the formula, Gideon: this turns the variance into the equivalent of an SD. – whuber Aug 11 '20 at 13:28
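To make the square-root point concrete, here is a NumPy sketch (the covariance and point are hypothetical). Because $C^{-1} = C^{-1/2} C^{-1/2}$, the Mahalanobis distance equals the Euclidean length of $C^{-1/2}(x-\mu)$, which rescales each principal axis by its SD, whereas $C^{-1}(x-\mu)$ rescales by the variance and does not have that length. Computing the symmetric inverse square root from an eigendecomposition is an assumption of this sketch, not something stated in the thread.

```python
import numpy as np

def inverse_sqrt(cov):
    """Symmetric inverse square root C^{-1/2} of a positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

mu = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])  # illustrative: SD 2 along x, SD 1 along y
diff = np.array([2.0, 2.0]) - mu

by_variance = np.linalg.inv(cov) @ diff  # C^{-1}(x - mu): divides by the variance, gives (0.5, 2)
by_sd = inverse_sqrt(cov) @ diff         # C^{-1/2}(x - mu): divides by the SD, gives (1, 2)

d_mahal = np.sqrt(diff @ np.linalg.inv(cov) @ diff)
# Only the SD-scaled vector has Euclidean length equal to the Mahalanobis distance.
print(np.linalg.norm(by_sd), d_mahal, np.linalg.norm(by_variance))
```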

1 Answer


Option (1):

You can use an angle as the direction, $$\tan\theta = \frac{y_{\text{center}}-y_1}{x_{\text{center}}-x_1},$$ and the Mahalanobis distance itself as the magnitude of the vector.
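For two-dimensional data, Option (1) can be sketched as follows. Two assumptions are made: the angle is taken as the arctangent of the ratio, computed with `math.atan2` so the quadrant is not lost (the bare ratio gives the same value for points on opposite sides of the center), and the offset is measured as point minus center, so the angle points from the center toward the outlier. The function name is hypothetical.

```python
import math
import numpy as np

def polar_outlier_vector(point, center, cov):
    """2-D only: return (theta, magnitude), where theta is the angle of the
    center-to-point offset and magnitude is the Mahalanobis distance."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    theta = math.atan2(dy, dx)  # radians in (-pi, pi], quadrant-aware
    diff = np.array([dx, dy], dtype=float)
    magnitude = float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
    return theta, magnitude

cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])  # illustrative covariance
theta_a, r_a = polar_outlier_vector((2.0, 2.0), (0.0, 0.0), cov)
theta_b, r_b = polar_outlier_vector((-2.0, -2.0), (0.0, 0.0), cov)
# Both points have dy/dx = 1, but atan2 gives pi/4 and -3*pi/4 respectively.
```

As noted in the comments, this polar form only works in two dimensions.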

Option (2):

Calculate the angle between one of the eigenvectors and the point (outlier).

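A NumPy sketch of this option, following the recipe whuber gives in the comments (the arc cosine of the dot product of the unit vector to the point with a unit eigenvector), measured in the original metric. Assumptions: the eigenvectors come from `numpy.linalg.eigh`, whose signs are arbitrary, so the sketch reports the acute angle to each eigenvector *axis* by taking the absolute value of the cosine; the function name is hypothetical.

```python
import numpy as np

def angles_to_principal_axes(point, center, cov):
    """Acute angles (radians) between the center-to-point vector and each
    principal axis of `cov`, measured in the original (Euclidean) metric."""
    diff = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
    u = diff / np.linalg.norm(diff)         # unit vector toward the outlier
    eigvals, eigvecs = np.linalg.eigh(cov)  # columns: unit eigenvectors, ascending eigenvalues
    cosines = np.abs(eigvecs.T @ u)         # abs(): eigenvector signs are arbitrary
    return eigvals, np.arccos(np.clip(cosines, 0.0, 1.0))  # clip guards against rounding

cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])  # illustrative covariance
eigvals, angles = angles_to_principal_axes([2.0, 2.0], [0.0, 0.0], cov)
```

In d dimensions this returns one angle per principal axis; a single angle on its own does not fix the direction vector.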

Michael D
    Since there are at least two distances in play--the original Euclidean distance and the Mahalanobis distance--could you explain which distance should be used to compute the angle and why? And what exactly do the terms in your formula mean? They look like a *slope* in a 2D problem. The ratio is almost surely not an angle! – whuber Aug 10 '20 at 16:46
    Do you mean the $\arctan$ of that ratio? I could see the polar-style coordinate working for data in two dimensions, but what happens in three dimensions or in ten dimensions? – Dave Aug 10 '20 at 17:16
    Re the edit: to calculate the angle *in the original metric,* take the arc cosine of the dot product of the unit vector to the point with the unit (directed) eigenvector. To calculate the angle in the Mahalanobis metric, first standardize the point as described at https://stats.stackexchange.com/a/62147/919 and proceed with the preceding recipe. These formulas work in any number of dimensions -- but please note that the angle does not usually give full information about the *direction* requested in the question, which asks for a "vector." – whuber Aug 10 '20 at 21:05
  • @whuber, what else, in addition to the angle, can contribute to the direction? – Michael D Aug 11 '20 at 08:36
  • @whuber, I could not find, in the attached link, how to standardize the point in the original coordinates. It seems like it might be something similar to SVD, right? – Gideon Kogan Aug 11 '20 at 08:47
  • @Gideon The link I provided goes to a very detailed, visual explanation of the standardization and ends with an equivalent matrix formula, $\sqrt{(x-y)'C^{-1}(x-y)}.$ Although SVD would do it, this amounts to inverting the covariance matrix, which is much simpler. – whuber Aug 11 '20 at 13:27
  • @whuber, the square root is after the second multiplication when you hold a scalar rather than a vector. Yet, I don't see how you can use the covariance matrix directly to map the relative vector... – Gideon Kogan Aug 11 '20 at 16:37
  • If you follow the graphical explanation at the beginning of my link, it should be obvious: after redrawing the data, simply draw a vector from the new center to the redrawn outlier. The matrix formula is merely doing the equivalent of that. – whuber Aug 11 '20 at 16:54
  • It seems like this can be solved with zero-phase component analysis (ZCA) whitening. The process is described in detail here: https://cbrnr.github.io/2018/12/17/whitening-pca-zca/ – Gideon Kogan Aug 13 '20 at 06:17
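A minimal NumPy sketch of the ZCA-whitening route, assuming the mean and covariance are estimated from a sample (the data below are simulated and hypothetical). ZCA whitening multiplies by the symmetric inverse square root $W = C^{-1/2}$. Applied to the outlier, it gives a vector whose Euclidean length is the Mahalanobis distance and whose unit vector is one candidate for the "direction" asked about. The angle to the major principal axis is then measured in the whitened space, following the standardize-then-measure recipe from the comments.

```python
import numpy as np

def zca_matrix(cov):
    """ZCA whitening matrix W = C^{-1/2} (symmetric inverse square root)."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D sample; any data with an invertible covariance works.
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.5], [1.5, 1.0]], size=500)

mu = data.mean(axis=0)
cov = np.cov(data, rowvar=False)
W = zca_matrix(cov)

outlier = np.array([4.0, -1.0])
z = W @ (outlier - mu)   # outlier in whitened (Mahalanobis) coordinates
mahal = np.linalg.norm(z)  # equals sqrt((x - mu)' C^{-1} (x - mu))
direction = z / mahal    # unit direction in the whitened space

# W maps an eigenvector v of C to v / sqrt(lambda), so its direction is unchanged
# and the raw eigenvector can be used as the axis in the whitened space.
eigvals, eigvecs = np.linalg.eigh(cov)
v_major = eigvecs[:, -1]  # eigenvector with the largest eigenvalue
angle = np.arccos(np.clip(abs(direction @ v_major), 0.0, 1.0))  # acute angle to the axis
```

ZCA is one of several whitening matrices: any $W$ with $W^\top W = C^{-1}$ gives the same distance but rotates the direction, and ZCA is the choice that keeps the whitened data closest to the original coordinates.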