
Given: Two continuous multivariate probability distributions, expressed as mixture models (possibly, but not necessarily, Gaussian Mixture Models)

Desired Output: The Hellinger distance between the two probability distributions. The distributions will then be used to generate data streams.

Constraints: I would like to code this in Java, for implementation with the MOA framework. With Java it is possible to import the Apache Commons Math classes, but if need be, I can implement this in another language and save the output as ARFF files.

What I know so far: The Hellinger distance expresses the similarity between two probability distributions. From Wikipedia, I know that the Hellinger distance can be defined in terms of elementary probability theory, using only the two probability density functions:

If we denote the densities as $f$ and $g$, respectively, the squared Hellinger distance can be expressed as a standard calculus integral: $H^2(f,g)=\frac{1}{2}\int(\sqrt{f(x)}-\sqrt{g(x)})^2dx$.

Of the definitions I have found, this one is the most interpretable to me, and it seems to lend itself both to constructing the mixture models in code and to computing the Hellinger distance numerically.
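
One observation that may help with computation, offered as a sketch rather than a definitive answer: expanding the square and using $\int f(x)\,dx = \int g(x)\,dx = 1$ rewrites the integral as an expectation under $f$. The sample size $n$ and the draws $x_i$ are notation introduced here for illustration.

$$H^2(f,g) = 1 - \int \sqrt{f(x)\,g(x)}\,dx = 1 - \mathbb{E}_{X \sim f}\left[\sqrt{\frac{g(X)}{f(X)}}\right] \approx 1 - \frac{1}{n}\sum_{i=1}^{n}\sqrt{\frac{g(x_i)}{f(x_i)}}, \qquad x_i \sim f.$$

Since sampling from a mixture and evaluating its density are both straightforward, this form needs no grid over the $d$-dimensional space.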

Questions:

  1. Is there a way to express this form of the Hellinger Distance that lends itself to easy (or at least natural) computation by a computer?

  2. Is there another way of framing the calculation that would be more natural?

  3. Can all of this be sidestepped and the Hellinger Distance approximated using sampling?
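
A minimal sketch of the sampling idea in question 3, assuming both densities are Gaussian mixtures with diagonal covariance. It uses the identity $H^2(f,g) = 1 - \mathbb{E}_{X \sim f}[\sqrt{g(X)/f(X)}]$, which follows from expanding the square in the definition. The class `HellingerMonteCarlo`, the helper `DiagonalGaussianMixture`, and the method `squaredHellinger` are hypothetical names made up for this illustration; they are not part of MOA or Apache Commons Math, and only `java.util.Random` is used.

```java
import java.util.Random;

public class HellingerMonteCarlo {

    /** Hypothetical helper: a Gaussian mixture with diagonal covariance. */
    static final class DiagonalGaussianMixture {
        final double[] weights; // mixing weights, assumed to sum to 1
        final double[][] means; // means[k][j]: mean of component k, dimension j
        final double[][] sds;   // sds[k][j]: std. deviation of component k, dimension j

        DiagonalGaussianMixture(double[] weights, double[][] means, double[][] sds) {
            this.weights = weights;
            this.means = means;
            this.sds = sds;
        }

        /** Mixture density evaluated at the point x. */
        double density(double[] x) {
            double total = 0.0;
            for (int k = 0; k < weights.length; k++) {
                double logp = 0.0;
                for (int j = 0; j < x.length; j++) {
                    double z = (x[j] - means[k][j]) / sds[k][j];
                    logp += -0.5 * z * z - Math.log(sds[k][j])
                            - 0.5 * Math.log(2.0 * Math.PI);
                }
                total += weights[k] * Math.exp(logp);
            }
            return total;
        }

        /** Draws one point: pick a component by weight, then sample from it. */
        double[] sample(Random rng) {
            double u = rng.nextDouble();
            int k = 0;
            double cumulative = weights[0];
            while (u > cumulative && k < weights.length - 1) {
                k++;
                cumulative += weights[k];
            }
            double[] x = new double[means[k].length];
            for (int j = 0; j < x.length; j++) {
                x[j] = means[k][j] + sds[k][j] * rng.nextGaussian();
            }
            return x;
        }
    }

    /** Monte Carlo estimate of H^2(f, g) using n draws from f. */
    static double squaredHellinger(DiagonalGaussianMixture f,
                                   DiagonalGaussianMixture g,
                                   int n, Random rng) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double[] x = f.sample(rng);
            sum += Math.sqrt(g.density(x) / f.density(x));
        }
        return 1.0 - sum / n;
    }

    public static void main(String[] args) {
        // Sanity check with single-component "mixtures" in d = 2:
        // for N((0,0), I) versus N((2,0), I) the exact value is 1 - exp(-0.5).
        DiagonalGaussianMixture f = new DiagonalGaussianMixture(
                new double[] {1.0}, new double[][] {{0.0, 0.0}}, new double[][] {{1.0, 1.0}});
        DiagonalGaussianMixture g = new DiagonalGaussianMixture(
                new double[] {1.0}, new double[][] {{2.0, 0.0}}, new double[][] {{1.0, 1.0}});
        double h2 = squaredHellinger(f, g, 200_000, new Random(42));
        System.out.println("Estimated H^2: " + h2
                + " (exact: " + (1.0 - Math.exp(-0.5)) + ")");
    }
}
```

The estimate is noisy, with error shrinking roughly like $1/\sqrt{n}$, and it can fall slightly outside $[0, 1]$ for small $n$. For full covariance matrices, the `density` and `sample` methods would need to be replaced, for example with a multivariate normal class from a library.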

user77876
  • What is the dimensionality of your densities? Any reason you cannot just use numerical integration? – kjetil b halvorsen Aug 01 '17 at 15:19
  • @kjetilbhalvorsen Ideally my solution will work for data of arbitrary dimensionality ($d \geq 2$). That being said, it will be sufficient for my purposes to have a way forward addressing the case where $d=2$. – user77876 Aug 02 '17 at 13:45
  • @kjetilbhalvorsen Part II. It has been a while since I took my Scientific Computing course, but numerical integration is what I am looking for. I am now thinking, for example, that I could evaluate the integral using something like [Monte Carlo integration](http://www.cafemath.fr/mathblog/article.php?page=MonteCarlo.php). – user77876 Aug 02 '17 at 13:51
  • I don't think there's a general closed form for Hellinger between mixtures. If you want to estimate it based on samples, there are [several approaches](https://stats.stackexchange.com/questions/29616/is-there-an-unbiased-estimator-of-the-hellinger-distance-between-two-distributio/332029#comment628683_40693). – Danica Mar 06 '18 at 17:30

0 Answers