I am working on a project in which I am trying to replicate a randomized experiment from observational study data, using Mahalanobis distance matching to ensure that the control and treated groups are similar. Several websites describe the Mahalanobis distance as the distance between a point and a distribution. However, from what I understand from this academic paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943670/) and this website (http://mccormickml.com/2014/07/22/mahalanobis-distance/), the Mahalanobis distance can also be calculated between two points (each point being a vector of features).

That is what I am trying to do, since some sources say the Mahalanobis distance is a better measure than the Euclidean distance. However, the Mahalanobis distance requires the inverse of the covariance matrix of the data set. Some of the values in my covariance matrix are very small (close to 0), and when I try to compute the inverse of this covariance matrix in Matlab, I get a matrix of infinite values.
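For reference, the two-point form I am referring to is usually written as follows, where $x$ and $y$ are two feature vectors and $S$ is the covariance matrix of the data set (the symbols are my own notation, not taken from the linked sources):

$$
d_M(x, y) = \sqrt{(x - y)^{\top} S^{-1} (x - y)}
$$

If $S$ is the identity matrix, this reduces to the ordinary Euclidean distance $\lVert x - y \rVert$. The formula needs $S^{-1}$, which is where my computation breaks down.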

Does anyone know how to tackle this problem? Am I understanding something wrong?

kjetil b halvorsen
stats_nerd
  • Are you saying some of your _diagonal_ values of the matrix are very small? – ttnphns Oct 29 '18 at 07:53
  • @ttnphns The diagonal values of the covariance matrix are all 1, which is probably because I normalized all the values of my vectors beforehand (by subtracting the mean of each feature from the value and then dividing by the feature's standard deviation). But many of the off-diagonal values are very small – stats_nerd Oct 29 '18 at 07:58
  • So, you have a correlation matrix. And it is probably [singular](https://stats.stackexchange.com/a/70910/3277) because of multicollinearity. Check it: is the determinant of the matrix (very close to) zero? If yes, you should try to find out why it is singular. And the very first question here will be "do you have more variables than cases in the data?". – ttnphns Oct 29 '18 at 08:09
  • Yes, the determinant of the matrix is very close to zero (1.0647e-90). I have 42 variables and 300 subjects. I am not quite sure how to figure out why it is singular. Do you have any tips that I could follow to figure out why? Thank you – stats_nerd Oct 29 '18 at 08:55
  • Use a generalized inverse ... – kjetil b halvorsen Jan 28 '21 at 14:45
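
The suggestions in the comments above (check whether the matrix is singular, find out why, and fall back on a generalized inverse) can be sketched roughly as follows. This is only an illustration in Python with NumPy, not the asker's Matlab code; the data are synthetic, and the names `diagnose`, `mahalanobis_pinv`, and the tolerance `tol` are made up for the example. Note that a determinant can be tiny simply because many eigenvalues below 1 are multiplied together, so the rank, the condition number, and the smallest eigenvalues are usually more informative than the determinant alone.

```python
import numpy as np

def diagnose(S, tol=1e-10):
    """Return numerical rank, condition number, eigenvalues and eigenvectors of a symmetric matrix S."""
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    rank = int(np.sum(eigvals > tol * eigvals.max()))
    cond = np.linalg.cond(S)
    return rank, cond, eigvals, eigvecs

def mahalanobis_pinv(x, y, S, tol=1e-10):
    """Distance between two points using the Moore-Penrose pseudo-inverse of S."""
    diff = np.asarray(x) - np.asarray(y)
    S_pinv = np.linalg.pinv(S, rcond=tol)  # ignores near-zero singular values
    return float(np.sqrt(diff @ S_pinv @ diff))

# Synthetic data (an assumption for illustration): 300 cases, 5 variables,
# where the last variable is an exact linear combination of the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
X = np.column_stack([X, X[:, 0] + X[:, 1]])

# Standardize, as described in the question, so the covariance matrix
# of Z equals the correlation matrix of X.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
S = np.corrcoef(X, rowvar=False)

rank, cond, eigvals, eigvecs = diagnose(S)
print("rank:", rank, "of", S.shape[0])
print("condition number:", cond)

# The eigenvector of the smallest eigenvalue shows which variables
# take part in the linear dependency (large absolute loadings).
null_vec = eigvecs[:, 0]
print("loadings of the near-null direction:", np.round(null_vec, 3))

# Distance between the first two cases using the generalized inverse.
dist = mahalanobis_pinv(Z[0], Z[1], S)
print("Mahalanobis distance (pinv):", dist)
```

In Matlab the corresponding built-in functions are `rank`, `cond`, `eig`, and `pinv`. The pseudo-inverse simply discards the directions with (near-)zero variance, so whether that is acceptable for matching depends on why the matrix is singular; dropping or combining the redundant variables identified by the near-null eigenvector is the other common route.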
