
In a setting where one observes $X_1,\ldots,X_n$ drawn from a distribution with density $f$, I wonder if there is an unbiased estimator (based on the $X_i$'s) of the Hellinger distance to another distribution with density $f_0$, namely $$ \mathfrak{H}(f,f_0) = \left\{ 1 - \int_\mathcal{X} \sqrt{f(x)f_0(x)} \,\text{d}x \right\}^{1/2}\,. $$
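
For concreteness, when both densities are known the distance can be computed by numerical integration. Below is a minimal sketch for two Gaussian densities, assuming scipy is available; the Gaussian choices are purely illustrative, and for equal-variance Gaussians the closed form $\mathfrak{H}^2 = 1 - e^{-(\mu_1-\mu_0)^2/(8\sigma^2)}$ gives a check.

```python
# Minimal sketch: Hellinger distance between two known densities by numerical
# integration. The Gaussian choices are illustrative, not part of the question.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = norm(loc=0.0, scale=1.0).pdf   # plays the role of f
f0 = norm(loc=1.0, scale=1.0).pdf  # plays the role of f_0

# Bhattacharyya coefficient: integral of sqrt(f * f0) over the support.
bc, _ = quad(lambda x: np.sqrt(f(x) * f0(x)), -np.inf, np.inf)
H = np.sqrt(1.0 - bc)

# Closed form for equal-variance Gaussians: H^2 = 1 - exp(-(mu1-mu0)^2 / (8 sigma^2))
print(H, np.sqrt(1.0 - np.exp(-1.0 / 8.0)))  # both approx 0.343
```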

Xi'an
  • So $f_0$ is known and fixed. But is $f$ known, or from a parametric family, or are you doing this in a nonparametric framework with all you know about $f$ coming from your sample? I think it makes a difference when attempting an answer. – Michael R. Chernick Jun 01 '12 at 10:16
  • @MichaelChernick: assume all you know about $f$ is the sample $X_1,\ldots,X_n$. – Xi'an Jun 01 '12 at 10:57
  • I do not think it has been calculated (if it exists). If it exists, then AIC has a lost brother. –  Jun 01 '12 at 12:49
  • I think I am in agreement with Procrastinator. I haven't got any idea about how this could be done nonparametrically. You could use a kernel density estimate of $f$, but the estimate surely could not be unbiased for all $x$, so how could you possibly choose a kernel that would make that function an unbiased estimator of the distance? The problem is even worse if $f$ and $f_0$ have unbounded range, because your data give you no information on the very extreme tails, which could still play a role in the calculation of the integral. I can't prove that it is impossible, but I think it is! – Michael R. Chernick Jun 01 '12 at 14:48
  • An attack on this problem looks feasible if you assume $f$ and $f_0$ are discrete. This leads to an obvious estimator (compute the Hellinger distance between the EDF and $f_0$; see the small numeric sketch after these comments). Bootstrapping (theoretically, not via simulation!) will give us a handle on the possible bias as well as a way to reduce (or even eliminate) the bias. I hold out some hope to succeed with the *squared* distance rather than the distance itself, because it is mathematically more tractable. The assumption of a discrete $f$ is no problem in applications; the space of discrete $f$ is a dense subset anyway. – whuber Jun 01 '12 at 16:00
  • Thanks for the suggestions. I was rather thinking that, since the integral is an expectation under $f$, the sample could be used as such... – Xi'an Jun 01 '12 at 16:34
  • Rosenblatt's proof that there is no "bona fide" unbiased estimator of $f$ comes to mind. Can we overcome that and get an unbiased estimator of $H(f,f_0)$? I don't know. – Zen Sep 02 '12 at 19:58
  • In line with the first comment made by Michael, if $\varphi$ is the characteristic function of $f$, we may introduce the usual estimate for $f$ using Fourier's inversion formula: $\hat{f}(x)=\int_{-\infty}^\infty e^{-itx}\,r_w(t)\, \hat{\varphi}_n(t)\,dt$, where $r_w$ is a regularizer (necessary to make the integral finite) with "window" size $w$, and $\hat{\varphi}_n$ is the empirical characteristic function (of course, after integration we will arrive at a traditional kernel estimate). – Zen Sep 02 '12 at 20:12
  • Now, if we use this $\hat{f}$ to evaluate the Hellinger distance, we will have a result that does depend on $w$, say $H_w(\hat{f},f_0)$. So my question (sorry) is whether there are cases where we have $H_w(\hat{f},f_0) < H_w(\hat{f},f_1)$ *uniformly* in $w$ (that is, for every $w>0$), where $f_1$ is another known candidate density. – Zen Sep 02 '12 at 20:13
  • Another possibility is trying to prove that there is no unbiased estimator of $H(f,f_0)$ with a Rosenblatt-style argument. – Zen Sep 02 '12 at 20:15
  • @Zen: Interesting link; however, $H(f,f_0)$ is a number while $f$ is a function. I am therefore unconvinced the connection is strong enough.... – Xi'an Sep 04 '12 at 19:22
  • Yeah, I was just thinking about that... P.S. I've seen your ABC talk on the ISBA site. Very good. – Zen Sep 04 '12 at 23:47

2 Answers


I don't know how to construct (if it exists) an unbiased estimator of the Hellinger distance. It seems possible to construct a consistent estimator, though. We have some fixed known density $f_0$, and a random sample $X_1,\dots,X_n$ from a density $f>0$. We want to estimate $$ H(f,f_0) = \sqrt{1 - \int_\mathscr{X} \sqrt{f(x)f_0(x)}\,dx} = \sqrt{1 - \int_\mathscr{X} \sqrt{\frac{f_0(x)}{f(x)}}\;\;f(x)\,dx} $$ $$ = \sqrt{1 - \mathbb{E}\left[\sqrt{\frac{f_0(X)}{f(X)}}\;\;\right] }\, , $$ where $X\sim f$. By the SLLN, we know that $$ \sqrt{1 - \frac{1}{n} \sum_{i=1}^n \sqrt{\frac{f_0(X_i)}{f(X_i)}}} \quad \rightarrow H(f,f_0) \, , $$ almost surely, as $n\to\infty$. Hence, a reasonable way to estimate $H(f,f_0)$ would be to take some density estimator $\hat{f}_n$ (such as a traditional kernel density estimator) of $f$, and compute $$ \hat{H}=\sqrt{1 - \frac{1}{n} \sum_{i=1}^n \sqrt{\frac{f_0(X_i)}{\hat{f}_n(X_i)}}} \, . $$
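
For illustration, a minimal sketch of this plug-in estimator using scipy's gaussian_kde for $\hat{f}_n$; the sampling density and $f_0$ below are hypothetical choices:

```python
# Sketch of the plug-in estimator H-hat above, with a Gaussian KDE for f-hat.
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(42)
x = rng.normal(loc=0.5, scale=1.2, size=2000)  # sample from the "unknown" f

f0 = norm(loc=0.0, scale=1.0).pdf  # the fixed, known reference density
f_hat = gaussian_kde(x)            # traditional kernel density estimator

# (1/n) * sum_i sqrt(f0(X_i) / f_hat(X_i)); clip at 1 because the sample
# average can exceed 1 in finite samples, which would make the root negative.
avg = np.mean(np.sqrt(f0(x) / f_hat(x)))
H_hat = np.sqrt(max(0.0, 1.0 - avg))
print(H_hat)
```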

Zen
  • @Zen: Good point! I consider this answer as **the** answer because it made me realise $H$ sounds very much like a standard deviation, for which there exists no unbiased estimator. As for the variance of $\hat H^2_n$, no worries: $\mathbb{E}[(\sqrt{f_0(X)/f(X)})^2]=1$ implies that this estimator has a finite variance. – Xi'an Oct 19 '12 at 05:42
  • Thanks for the clarification about the variance of the estimator, Xi'an! – Zen Oct 19 '12 at 13:22
  • Some work on other consistent estimators: (a) https://arxiv.org/abs/1707.03083 and related work based on $k$-NN density estimators; (b) https://arxiv.org/abs/1402.2966, based on correcting kernel density estimates; (c) http://ieeexplore.ieee.org/document/5605355/, based on a connection to classification. (Many of these are based on samples from both $f$ and $f_0$, because that's the work I knew about offhand, but I think there are variants for known $f_0$.) – Danica Mar 06 '18 at 17:09

No unbiased estimator of either $\mathfrak{H}$ or $\mathfrak{H}^2$ exists when $f$ is allowed to range over any reasonably broad nonparametric class of distributions.

We can show this with the beautifully simple argument of

Bickel and Lehmann (1969). Unbiased estimation in convex families. *The Annals of Mathematical Statistics*, 40(5), 1523–1535. (Project Euclid)

Fix some distributions $F_0$, $F$, and $G$, with corresponding densities $f_0$, $f$, and $g$. Let $H(F)$ denote $\mathfrak{H}(f, f_0)$, and let $\hat H(\mathbf X)$ be some estimator of $H(F)$ based on $n$ iid samples $X_i \sim F$.

Suppose that $\hat H$ is unbiased for samples from any distribution of the form $$M_\alpha := \alpha F + (1 - \alpha) G .$$ But then \begin{align} Q(\alpha) &= H(M_\alpha) \\&= \int_{x_1} \cdots \int_{x_n} \hat H(\mathbf X) \,\mathrm{d}M_\alpha(x_1) \cdots\mathrm{d}M_\alpha(x_n) \\&= \int_{x_1} \cdots \int_{x_n} \hat H(\mathbf X) \left[ \alpha \mathrm{d}F(x_1) + (1-\alpha) \mathrm{d}G(x_1) \right] \cdots \left[ \alpha \mathrm{d}F(x_n) + (1-\alpha) \mathrm{d}G(x_n) \right] \\&= \alpha^n \operatorname{\mathbb{E}}_{\mathbf X \sim F^n}[ \hat H(\mathbf X)] + \dots + (1 - \alpha)^n \operatorname{\mathbb{E}}_{\mathbf X \sim G^n}[ \hat H(\mathbf X)] ,\end{align} so that $Q(\alpha)$ must be a polynomial in $\alpha$ of degree at most $n$.

Now, let's specialize to a reasonable case and show that the corresponding $Q$ is not polynomial.

Let $F_0$ be some distribution which has constant density on $[-1, 1]$: $f_0(x) = c$ for all $\lvert x \rvert \le 1$. (Its behavior outside that range doesn't matter.) Let $F$ be some distribution supported only on $[-1, 0]$, and $G$ some distribution supported only on $[0, 1]$.

Now \begin{align} Q(\alpha) &= \mathfrak{H}(m_\alpha, f_0) \\&= \sqrt{1 - \int_{\mathbb R} \sqrt{m_\alpha(x) f_0(x)} \mathrm{d}x} \\&= \sqrt{1 - \int_{-1}^0 \sqrt{c \, \alpha f(x)} \mathrm{d}x - \int_{0}^1 \sqrt{c \, (1 - \alpha) g(x)} \mathrm{d}x} \\&= \sqrt{1 - \sqrt{\alpha} B_F - \sqrt{1 - \alpha} B_G} ,\end{align} where $B_F := \int_{\mathbb R} \sqrt{f(x) f_0(x)} \mathrm{d}x$ and likewise for $B_G$. Note that $B_F > 0$, $B_G > 0$ for any distributions $F$, $G$ which have a density.
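
As a quick numeric sanity check on this closed form, take the (hypothetical) concrete choices $f$ uniform on $[-1,0]$, $g$ uniform on $[0,1]$, and $f_0 \equiv 1/2$ on $[-1,1]$, so that $B_F = B_G = 1/\sqrt{2}$; direct integration of $\mathfrak{H}(m_\alpha, f_0)$ then matches $Q(\alpha)$, assuming scipy is available:

```python
# Numeric check of Q(alpha) = sqrt(1 - sqrt(alpha) B_F - sqrt(1-alpha) B_G)
# for f = U[-1,0], g = U[0,1], f0 = 1/2 on [-1,1] (so B_F = B_G = 1/sqrt(2)).
import numpy as np
from scipy.integrate import quad

c = 0.5  # value of f0 on [-1, 1]

def m_alpha(x, alpha):
    # mixture density: alpha * f(x) + (1 - alpha) * g(x)
    return alpha * (x < 0) + (1.0 - alpha) * (x >= 0)

for alpha in (0.1, 0.5, 0.9):
    bc, _ = quad(lambda x: np.sqrt(m_alpha(x, alpha) * c), -1.0, 1.0, points=[0.0])
    direct = np.sqrt(max(0.0, 1.0 - bc))
    closed = np.sqrt(max(0.0, 1.0 - (np.sqrt(alpha) + np.sqrt(1.0 - alpha)) / np.sqrt(2.0)))
    print(alpha, direct, closed)  # the two columns agree; note Q(1/2) = 0 here
```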

$\sqrt{1 - \sqrt{\alpha} B_F - \sqrt{1 - \alpha} B_G}$ is not a polynomial of any finite degree: since $B_F > 0$, its derivative blows up like $1/\sqrt{\alpha}$ as $\alpha \to 0^+$, whereas any polynomial has a bounded derivative near $0$. Thus, no estimator $\hat H$ can be unbiased for $\mathfrak{H}$ on all of the distributions $M_\alpha$ with finitely many samples.

Likewise, because $1 - \sqrt{\alpha} B_F - \sqrt{1 - \alpha} B_G$ is also not a polynomial, there is no estimator for $\mathfrak{H}^2$ which is unbiased on all of the distributions $M_\alpha$ with finitely many samples.
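
For a numeric companion to these claims (with hypothetical values of $B_F$ and $B_G$), least-squares polynomial fits to $Q(\alpha)$ of increasing degree never become exact; the error shrinks only slowly, reflecting the $\sqrt{\alpha}$ cusp at $0$ that no finite-degree polynomial can reproduce:

```python
# Polynomial fits to Q(alpha) = sqrt(1 - sqrt(alpha) B_F - sqrt(1-alpha) B_G)
# never become exact: the sqrt(alpha) cusp at 0 defeats any finite degree.
import numpy as np
from numpy.polynomial import Polynomial

B_F, B_G = 0.6, 0.7  # hypothetical positive Bhattacharyya coefficients
alpha = np.linspace(0.0, 1.0, 2001)
Q = np.sqrt(1.0 - np.sqrt(alpha) * B_F - np.sqrt(1.0 - alpha) * B_G)

for deg in (2, 5, 10, 20):
    fit = Polynomial.fit(alpha, Q, deg)   # least-squares fit on [0, 1]
    err = np.max(np.abs(fit(alpha) - Q))
    print(deg, err)  # the max error decays slowly and never reaches zero
```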

This excludes pretty much all reasonable nonparametric classes of distributions, except for those with densities bounded below (an assumption nonparametric analyses sometimes make). You could probably kill those classes too with a similar argument by just making the densities constant or something.

Danica