What is the correct expression of the Hellinger Distance equation?

Question

I am aware there are various ways to calculate the Hellinger Distance (H) depending on the context and data. One of these ways, as I understand, is via the Bhattacharyya coefficient (BC). For discrete distributions, $H=\sqrt{1-BC} $ where $BC=\sum_{i=1}^n \sqrt{p_i q_i} $. Hence we have:

$$H=\sqrt{1-\sum_{i=1}^n \sqrt{p_i q_i}} $$

However, I have found some expressions of the Hellinger Distance equation that include a factor of 2 (see page 302, here) in the form of:

$$H=2\sqrt{1-\sum_{i=1}^n \sqrt{p_i q_i}} $$

This is equivalent to $H=2\sqrt{1-BC} $ found in a Cross Validated question here.
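For concreteness, here is a minimal Python sketch of the two expressions for discrete distributions. The function names and the `scale` parameter are illustrative assumptions, not taken from any particular library; `scale=1.0` gives $H=\sqrt{1-BC}$ and `scale=2.0` gives $H=2\sqrt{1-BC}$.

```python
import math


def bhattacharyya_coefficient(p, q):
    """BC = sum_i sqrt(p_i * q_i) for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def hellinger(p, q, scale=1.0):
    """scale * sqrt(1 - BC); `scale` is a hypothetical knob for the two conventions."""
    bc = bhattacharyya_coefficient(p, q)
    # Clamp at zero so rounding error cannot produce a tiny negative radicand.
    return scale * math.sqrt(max(0.0, 1.0 - bc))


p = [0.5, 0.5]
q = [0.9, 0.1]

h_plain = hellinger(p, q)               # version without the factor
h_doubled = hellinger(p, q, scale=2.0)  # version with the factor of 2
```

Under these definitions the two values differ only by the constant factor, so `h_doubled` is twice `h_plain` for any pair of distributions.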

So which version of the Hellinger Equation is correct? Or am I missing something? A factor of two in a distance measure is hardly a trivial difference.

In fact for most things that are done with distances it is a trivial difference, as distances are usually compared with each other (i.e., Hellinger distances between different pairs of objects) or used in techniques such as cluster analysis or multidimensional scaling which are invariant to multiplying the distance by a constant. In fact I'm struggling to come up with any application of Hellinger distances where it would make a difference whether there's a factor 2 or not. So these different formulae may just be around with the implicit understanding that they're equivalent. — Christian Hennig, Jan 28 '22 at 10:33
@Christian Hennig. Yes, I suspected this was likely to be the case. However, it's all nice and good to treat this difference as some arbitrary scale factor. Yet, surely an equation is an equation, invariant or not. We know the equations for, let's say, entropy, the normal distribution and that 1+1=2. The Hellinger equation shouldn't get such a 'free pass' to have an implicit rather than objective understanding, comparative or not. In fact, the comparative nature suggests standardization would be a desirable property. Surely, Stats/Math literature could be more, well, definitive here... — Mari153, Jan 28 '22 at 11:35
That's fair enough, however one can also see it like this: Generally, mathematical notation should not be taken as universally defined, but rather be explicitly defined when used. Mathematics is not about fixed meanings of descriptors, but about what is implied by the definitions, whatever they are. I'm not sure what the story is in case of Hellinger, but it may for example be that one person came up with one definition, and someone else discovered that there is a nice theoretical motivation that adds in a certain scale factor and is therefore in some sense "better". (To be continued) — Christian Hennig, Jan 28 '22 at 11:59
Also, this other person may have realised that for all conceivable applications it may not matter whether the scale factor is there or not. So now, which definition is "correct"? I'd say there is no answer (and whose job would it be to decide?). People who use it better write down explicitly which one they mean, which can be one or the other, then the reader knows and all is fine. (Of course one can argue that *some* things should be defined everywhere in the same way as one can hardly explain everything from scratch, so it's hard to draw the line, but still...) — Christian Hennig, Jan 28 '22 at 12:01
One other example is that of the information criteria AIC and BIC: there are "larger is better" and "smaller is better" versions (use of factor -1 or not) around. In some literature they won't even tell you explicitly which one they use, and I do agree that this is annoying, and it'd be nicer to have a unique one used by everyone, but then there's no authority to decide this, so we have to live with what's around (I teach my students to explicitly define all notation whenever possible). — Christian Hennig, Jan 28 '22 at 12:06
Thanks for the replies - very thoughtful. As for which definition is correct? Well, that's the place for accepted convention. For example the standard deviation is preferred to the [mean absolute deviation](https://stats.stackexchange.com/questions/81986/mean-absolute-deviation-vs-standard-deviation). While both are mathematically correct, the former is overwhelmingly preferred by convention. It also helps that they have different names - unlike the different Hellinger Distance expressions! In terms of the Hellinger Distance, the lack of 'convention' may just offer the space I am seeking for a PhD... — Mari153, Jan 29 '22 at 00:57
Standard deviation vs. mean absolute deviation is essentially different, as these are not in any relevant circumstance equivalent. In fact there's more than convention to it, both have advantages and disadvantages. This is a methodological issue, whereas the precise definition of Hellinger distance is just notational. — Christian Hennig, Jan 29 '22 at 10:59
I can measure length in kilometers or meters. Which one is correct? — Sycorax, Jan 30 '22 at 14:23
@ChristianHennig Perhaps you could turn your comments into an answer? — Sycorax, Jan 30 '22 at 14:24

score 1 · Accepted Answer · answered Feb 01 '22 at 11:19

This doesn't really answer the question but may be helpful anyway.

All applications of the Hellinger distance I can think of are invariant to whether there's a factor 2 in the definition or not, potentially after adjusting, e.g., threshold values by the same factor. Obviously whichever version is chosen needs to be used consistently, so it is advisable to state the formula explicitly whenever the Hellinger distance is used.
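The invariance can be checked with a small Python sketch. The distributions, names, and threshold below are made up purely for illustration. The point is that ranking candidates by distance gives the same order under either convention, and a threshold rule selects the same candidates once the threshold is multiplied by the same factor.

```python
import math


def hellinger(p, q, scale=1.0):
    """scale * sqrt(1 - BC); scale=1.0 or 2.0 selects the convention (illustrative)."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return scale * math.sqrt(max(0.0, 1.0 - bc))


# Hypothetical example data: one reference distribution and three candidates.
reference = [0.2, 0.3, 0.5]
candidates = {
    "a": [0.25, 0.25, 0.5],
    "b": [0.6, 0.3, 0.1],
    "c": [0.1, 0.1, 0.8],
}


def ranking(scale):
    """Candidate names ordered from nearest to farthest from the reference."""
    return sorted(
        candidates,
        key=lambda name: hellinger(reference, candidates[name], scale),
    )


def within(scale, threshold):
    """Names of the candidates whose distance does not exceed the threshold."""
    return {
        name
        for name, dist in candidates.items()
        if hellinger(reference, dist, scale) <= threshold
    }


threshold = 0.2  # arbitrary cut-off chosen for the example

same_ranking = ranking(1.0) == ranking(2.0)
same_selection = within(1.0, threshold) == within(2.0, 2.0 * threshold)
```

Both `same_ranking` and `same_selection` come out true here, because multiplying every distance by a positive constant preserves all comparisons between distances.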

For this reason, most mathematicians would consider the two versions equivalent. There is no consensus about which one is right, and there is no authority that would enforce such a consensus. Most mathematicians would think that no such consensus is needed, as the two are "the same" in all relevant aspects anyway.

Historically, one way such a situation can emerge (I don't know about the Hellinger distance in particular) is that somebody defines a concept originally, and somebody else discovers that the same concept (but multiplied by a constant factor not present in the original definition) emerges nicely out of some theoretical considerations that help a lot in motivating the concept; after that, both versions have a claim to be seen as legitimate.
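
As one concrete illustration of how a constant can enter (a standard identity, offered as an example and not as a claim about the history of the Hellinger distance): for discrete distributions with $\sum_{i=1}^n p_i = \sum_{i=1}^n q_i = 1$ and $BC=\sum_{i=1}^n \sqrt{p_i q_i}$,

$$\sum_{i=1}^n \left(\sqrt{p_i}-\sqrt{q_i}\right)^2 = \sum_{i=1}^n p_i + \sum_{i=1}^n q_i - 2\sum_{i=1}^n \sqrt{p_i q_i} = 2\,(1-BC) $$

So the plain Euclidean distance between the vectors of square roots equals $\sqrt{2}\,\sqrt{1-BC}$, and dividing it by $\sqrt{2}$ gives $\sqrt{1-BC}$. The constant in this particular identity is $\sqrt{2}$, not the factor of 2 asked about in the question, but it shows how a natural construction can differ from another definition by a fixed multiple.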

Generally, a mathematical way of looking at such things is that names and notation should not be taken as having a generally agreed meaning but rather they should be explicitly defined when used and then they are what they are defined to be, in the specific place.

It has to be admitted though that there are limitations to this attitude. Work of a certain complexity cannot define everything from scratch for pragmatic reasons, and non-mathematicians are understandably often baffled by the same name apparently not referring to the (exactly) same thing. So certain conventions are required and some exist (too many from the point of view of some pure mathematicians; not enough from the point of view of many other people).

As another example, personally I am annoyed to see that the BIC and AIC as used for model selection are used in some literature in the positive and in other literature in the negative form, so in one case "larger is better", in another one "smaller is better" - for sure the authors need to tell the readers explicitly which version is used, but in many places this is not done, and the reader has to guess from looking at reported results which one it is.

Thanks Christian. I'll give you the bounty. Your answer is close enough for me and has pointed me in a good direction. I have uncovered some research that makes it clear that for the Hellinger Distance "different definitions are used frequently in the literature". For my two cents worth, based on what I have read and the associated proofs, I lean towards the equation without the 2-factor. — Mari153, Feb 02 '22 at 11:35
