Suppose you have a multivariate normal distribution $X \sim N(\mu,\Sigma)$. Is there a good way to calculate a measure of distance between two arbitrary components $X_{i}$ and $X_{j}$ of $X$? (I suppose the simplest approach would be to begin with a two-dimensional distribution, which can be obtained through the conditional multivariate normal from an arbitrarily large distribution.) Beyond just correlation, I want the measure to take into account differences in the means and variances as well when judging how similar or different the components are.

Some approaches I'm familiar with don't seem to be what I'm looking for: the Mahalanobis distance gives a distance for a random sample drawn from the distribution $X$, and the Kullback–Leibler divergence or the Bhattacharyya distance would tell you how different $X$ is from some other distribution $Y$.
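(For reference, the Kullback–Leibler divergence between two univariate normals has a simple closed form, so one reading of the question is to apply it to the marginal distributions of $X_i$ and $X_j$, which at least folds in the mean and variance differences, though not the correlation. A minimal sketch; the function name `kl_normal` and the example values are mine:)

```python
import numpy as np

def kl_normal(mu1, sigma1, mu2, sigma2):
    """Closed-form KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) )."""
    return (np.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)

# e.g. marginals X_i ~ N(0, 1) and X_j ~ N(1, 2^2): differences in both
# mean and variance contribute to the divergence.
print(kl_normal(0.0, 1.0, 1.0, 2.0))
```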

  • What's wrong with the statistical distance between $X_i$ and $X_j$: $(X_i-X_j)'\Sigma^{-1}(X_i-X_j)$? (See the first sketch after these comments.) – user603 May 20 '13 at 22:00
  • Your question seems a little conflated. In the sample space, you can measure the distance between two data points using any of a family of norms, e.g. the squared Euclidean norm: $d(X_1, X_2) = \sum( X_1 - X_2 )^2$. In the function space, you can measure the distance between their probability functions using the KL or Bhattacharyya distances: $d(F_{X_1}, F_{X_2}) = \int_{\Omega_X} \|\hat{\mathbb{F}}_{X_1} - \hat{\mathbb{F}}_{X_2}\|$. The functional metrics in statistics use the sample to estimate the CDFs and then calculate the distance between them. Can you re-explain exactly what you're trying to do? – AdamO May 20 '13 at 22:10
  • I was looking at k-means clustering and I wanted to classify some data while incorporating correlations and variances, so I was looking for a way to measure distance in this fashion. – John May 20 '13 at 22:18
  • Then you absolutely want to use the Cholesky decomposition of the covariance matrix and the mean vector to transform your data into independent standard normal data. That way the Euclidean distance for NN prediction makes more sense. (See the second sketch after these comments.) – AdamO May 20 '13 at 22:50
  • @AdamO I had considered something similar, but I was worried that if I do cluster analysis on the transformed data, then the original data might be like x% in cluster 1, y% in cluster 2, etc., rather than in cluster z. – John May 21 '13 at 02:06
  • As with multidimensional scaling, you don't trust the untransformed distribution of the data, so you shrink the observation space according to the covariance of the variables. WLOG, say $X$ and $Y$ are jointly observed independent variables; you believe certain observations on the $X$ side are closer, despite being farther in the Euclidean sense from observations on the $Y$ side, because $X$ has smaller variance than $Y$. If you're confident in this method, then you won't be swayed when untransformed variables give slightly unintuitive results. – AdamO May 21 '13 at 17:32
  • Would I be correct that the Cholesky transformation would only be a solution for two variables? That's the only way I get variances of 1 and zero correlation. If that's the case, I think my issue in the previous comment wouldn't apply, since I'm only ever calculating the distance between two series. Thanks for the comments. – John May 21 '13 at 18:19
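
(A minimal sketch of the statistical distance user603 suggests, assuming two realizations $x_i$, $x_j$ and a known covariance matrix $\Sigma$; the example values are hypothetical:)

```python
import numpy as np

def statistical_distance(x_i, x_j, cov):
    """Squared statistical distance (x_i - x_j)' Sigma^{-1} (x_i - x_j)."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    # Solve Sigma u = diff rather than forming the explicit inverse.
    return diff @ np.linalg.solve(cov, diff)

# Hypothetical 2-d example with correlated components.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
print(statistical_distance([0.0, 0.0], [1.0, 1.0], Sigma))
```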

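(And a minimal sketch of the whitening transform AdamO describes, assuming $\mu$ and $\Sigma$ are known: if $\Sigma = LL'$, then $z = L^{-1}(x - \mu)$ has zero mean and identity covariance, and the Euclidean distance between whitened points equals the Mahalanobis distance between the originals. The sample data here are simulated purely for illustration, and this works in any dimension, not just two:)

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])

# Simulate correlated normal data, then whiten it:
# with Sigma = L L', each z = L^{-1} (x - mu) is ~ N(0, I).
X = rng.multivariate_normal(mu, Sigma, size=500)
L = np.linalg.cholesky(Sigma)
Z = np.linalg.solve(L, (X - mu).T).T  # whitened data

# Euclidean distance between whitened points equals the Mahalanobis
# distance between the original points.
print(np.linalg.norm(Z[0] - Z[1]))
```
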
0 Answers