4

I am applying the Bhattacharyya distance to multivariate normal distributions

$$D_{B}={1 \over 8}({\boldsymbol \mu }_{1}-{\boldsymbol \mu }_{2})^{T}{\boldsymbol \Sigma }^{{-1}}({\boldsymbol \mu }_{1}-{\boldsymbol \mu }_{2})+{1 \over 2}\ln \,\left({\det {\boldsymbol \Sigma } \over {\sqrt {\det {\boldsymbol \Sigma }_{1}\,\det {\boldsymbol \Sigma }_{2}}}}\right),$$

where ${\boldsymbol \mu }_{i}$ and ${\boldsymbol \Sigma }_{i}$ are the means and covariances of the distributions, and

$${\boldsymbol \Sigma }={{\boldsymbol \Sigma }_{1}+{\boldsymbol \Sigma }_{2} \over 2}.$$

The determinant of the covariance matrix turns out to be approximately $0.$
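To make the problem concrete, here is a small illustrative R sketch (the data here are made up): a covariance matrix estimated from collinear data is rank-deficient, so its determinant is numerically zero and the log-determinant term in $D_B$ is undefined.

```r
# Hypothetical illustration: collinear data produce a singular covariance,
# so det(Sigma) is (numerically) zero and log(det(Sigma)) blows up.
set.seed(17)
x <- rnorm(100)
X <- cbind(x, 2 * x)     # second coordinate is an exact multiple of the first
Sigma <- cov(X)          # rank-1 covariance matrix
det(Sigma)               # zero up to floating-point error
```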

What is the right way to handle this situation?

Faiz Kidwai
  • 143
  • 5

2 Answers

3

This distance will be infinite whenever either of the distributions is singular with respect to the other. This means the only cases where it will not be infinite are where the distributions have a common support, which will necessarily be an affine subspace of $\mathbb{R}^d$ on which both $\mu_1$ and $\mu_2$ lie. When that is a proper subspace, all three determinants $|\Sigma|,$ $|\Sigma_1|,$ and $|\Sigma_2|$ will be zero.

The key is to find this common subspace. It will lie within the span of $\Sigma.$ Writing its Singular Value Decomposition

$$\Sigma = U D V^\prime$$

for orthogonal matrices $U$ and $V$ and diagonal matrix $D$ with positive elements (the singular values),

  1. Identify the singular values that are essentially zero (compared to the others up to the precision of your arithmetic).

  2. Set $U^{*}$ to be the matrix formed by columns of $U$ corresponding to the nonzero singular values, $D^{*}$ to be the diagonal matrix of nonzero singular values, and $V^{*}$ to be the matrix formed by the corresponding columns of $V$.

  3. Change the basis in which all three matrices are represented by forming $$\Sigma_i^{*} = {U^{*}}^\prime \Sigma_i V^{*}.$$

  4. In this basis, $\Sigma = D^{*}$ and so its determinant is the product of the nonzero singular values. Use the determinants of the $\Sigma_i^{*}$ in the Bhattacharyya distance formula.

Of course you need to check that $\mu_1-\mu_2$ lies in the span of $\Sigma.$ That's now easy: all the components of $U^\prime (\mu_1-\mu_2)$ corresponding to the zero singular values must themselves be zero.


To illustrate, here is R code implementing this strategy. First, though, let me show an example of its use where both distributions are singular but, because they have a common support on the line $y=x+1,$ their distance is not infinite:

distance.Battacharya(c(1,0), c(2,1), matrix(1,2,2), matrix(1,2,2))

[1] 0.125

Here, $\Sigma_1 = \Sigma_2 = \pmatrix{1&1\\1&1}.$

distance.Battacharya <- function(mu1, mu2, Sigma1, Sigma2) {
  d <- length(mu1)
  if (d != length(mu2)) return(Inf)
  
  Sigma <- (Sigma1 + Sigma2)/2
  fit <- svd(Sigma)
  i <- zapsmall(fit$d) != 0
  #
  # Compute the log determinant terms.
  #
  logdet <- function(X) determinant(t(fit$u[, i]) %*% X %*% fit$v[,i], log=TRUE)$modulus
  q <- c(sum(log(fit$d[i])), -logdet(Sigma1)/2, -logdet(Sigma2)/2) 
  #
  # Compute the Mahalanobis distance term.
  #
  dmu <- mu1 - mu2
  if(any(zapsmall(t(fit$u[, !i]) %*% dmu) != 0)) return(Inf) # compare up to numerical precision
  y <- fit$v[,i] %*% diag(1/fit$d[i], sum(i), sum(i)) %*% t(fit$u[, i]) %*% dmu
  
  sum(dmu * y)/8 + sum(q)/2
}
whuber
  • 281,159
  • 54
  • 637
  • 1,101
0

I have not studied mathematics formally, but I have worked with data sets as an engineer.

When the determinant of your covariance matrix is zero, either:

1) the data lie on two parallel lines (or parallel planes, etc.), or 2) they lie on a single point, line, plane, etc.

Generally, if the determinant of a matrix (not necessarily a covariance matrix) is zero and you multiply it by a data matrix, that singular matrix will collapse all of your data onto a line, a plane, or the origin.
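A minimal R sketch of that collapsing effect (the matrix here is just an illustrative rank-1 example):

```r
# A rank-1 (singular) matrix maps every point in the plane onto a single line.
A <- matrix(c(1, 2, 1, 2), 2, 2)   # columns are proportional, so det(A) = 0
X <- matrix(rnorm(20), 2, 10)      # ten random 2-D points, one per column
Y <- A %*% X                       # images of those points
# Every image satisfies y2 = 2 * y1: the data now lie on the line y = 2x.
all(abs(Y[2, ] - 2 * Y[1, ]) < 1e-12)
```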

I'm not sure whether the following approaches are valid:

1) Pre-processing step: sometimes a small amount of random noise is added to the data before any further analysis; then the covariance matrix cannot be singular.

2) Or you can add very small values to the elements of your covariance matrix (especially the diagonal elements); if the values are small enough, nothing should change materially. But because the determinant is almost zero, when you multiply the matrix by another matrix you have to make sure the result is not dominated by rounding error. That depends on how large or small your data values are: multiplying a very small number by a very large one is typically accompanied by some error, and there are ways to handle that.
2) or you can add very small values to your co-variance matrix elements (especially their diagonal elements). you can consider a very small value for your co-variance matrix elements and nothing should be changed. But as your matrix determinant is almost zero, so when you are multiplying that to another matrix, you have to make sure it is not accompanied with some errors. It depends on how large or small your data values are. Sometimes multiplication of two numbers (one very small, and the other a large number) is typically accompanied with some errors. There are also some ways how to handle that.