I am not a mathematician. I have read about the KL divergence online. What I learned is that the KL divergence measures the information lost when we approximate the input distribution with a model distribution. However, I have only seen it defined between two continuous or two discrete distributions. Can we compute it between a continuous and a discrete distribution, or vice versa?

Related: http://stats.stackexchange.com/q/6907/2970 – cardinal Sep 04 '13 at 01:28
3 Answers
Yes, the KL divergence between continuous and discrete random variables is well defined. If $P$ and $Q$ are distributions on some space $\mathbb{X}$, then both $P$ and $Q$ have densities $f$, $g$ with respect to $\mu = P+Q$ and $$ D_{KL}(P,Q) = \int_{\mathbb{X}} f \log\frac{f}{g}d\mu. $$
For example, if $\mathbb{X} = [0,1]$, $P$ is Lebesgue measure and $Q = \delta_0$ is a point mass at $0$, then $f(x) = 1-\mathbb{1}_{x=0}$, $g(x) = \mathbb{1}_{x=0}$ and $$D_{KL}(P, Q) = \infty.$$
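To see how the formula plays out numerically, here is a minimal Python sketch (the specific distributions are a made-up example, not something from the question): take $P$ purely atomic on $\{0,1\}$ and $Q$ a mixture of the same two atoms with a uniform component on $[0,1]$. Then $P$ is absolutely continuous with respect to $Q$, the density of $P$ with respect to $\mu = P+Q$ vanishes off the atoms, and the integral reduces to a sum over the atoms, giving a finite value.

```python
import math

# Hypothetical example (my own numbers, not from the answer):
#   P = 0.7*delta_0 + 0.3*delta_1                    (purely discrete)
#   Q = 0.4*delta_0 + 0.3*delta_1 + 0.3*Uniform(0,1) (mixed)
# Both have densities w.r.t. mu = P + Q. Since P puts no mass on the
# continuous part, f = 0 there and the integral reduces to a sum over
# the atoms {0, 1}, where the ratio f/g equals P({x}) / Q({x}).

P_atoms = {0.0: 0.7, 1.0: 0.3}
Q_atoms = {0.0: 0.4, 1.0: 0.3}

def kl_discrete_vs_mixed(p_atoms, q_atoms):
    """D_KL(P, Q) when P is purely atomic and every atom of P is an atom of Q."""
    total = 0.0
    for x, p in p_atoms.items():
        q = q_atoms.get(x, 0.0)
        if q == 0.0:
            return math.inf  # absolute continuity fails: P is not << Q
        total += p * math.log(p / q)
    return total

print(kl_discrete_vs_mixed(P_atoms, Q_atoms))  # ~0.3917 nats
```

If $Q$ lacked one of the atoms of $P$, the absolute-continuity check would fail and the divergence would be infinite, just as in the Lebesgue-versus-$\delta_0$ example above.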

How do you prove that $\int_{\mathbb{X}} f \log\frac{f}{g}d\mu$ is independent of the dominating measure? – Gabriel Romon Feb 01 '19 at 11:36
@Olivier with that $f$ you get $P([0,0.5]) = -0.5$ but probability measures must be non-negative. Try maybe a convex sum between the two. – Jorge E. Cardona Jun 30 '20 at 20:12
Ok, I get it now. Is there a case where $D_{KL}(P,Q)$ is finite? @Olivier – Jorge E. Cardona Jul 01 '20 at 16:09
No: KL divergence is only defined on distributions over a common space. It asks about the probability density of a point $x$ under two different distributions, $p(x)$ and $q(x)$. If $p$ is a distribution on $\mathbb{R}^3$ and $q$ a distribution on $\mathbb{Z}$, then $q(x)$ doesn't make sense for points $x \in \mathbb{R}^3$ and $p(z)$ doesn't make sense for points $z \in \mathbb{Z}$. In fact, we can't even do it for two continuous distributions over spaces of different dimension (or two discrete ones, or any case where the underlying probability spaces don't match).
If you have a particular case in mind, it may be possible to come up with some similar-spirited measure of dissimilarity between the distributions. For example, it might make sense to encode a continuous distribution under a code for a discrete one (obviously losing information), e.g. by rounding to the nearest point of the discrete support, as sketched below.
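As a rough illustration of the rounding idea (a sketch of my own, not a standard construction): discretize a continuous $P$, say $N(0,1)$, onto the integers by rounding, then compute an ordinary discrete KL divergence against a discrete $Q$ on the integers. The choice of $Q$ below (a geometric-like pmf) and the truncation at $|k| \le 30$ are arbitrary.

```python
import math

# Sketch: compare a continuous distribution with a discrete one by first
# rounding the continuous one to the nearest integer.
#   P: N(0, 1) rounded to the nearest integer k, so
#      P(k) = Phi(k + 0.5) - Phi(k - 0.5)
#   Q: a discrete distribution on the integers, here q(k) = (1/3) * 0.5**|k|
# The result is an ordinary discrete KL divergence, but it measures the
# rounded distribution, not the original continuous one.

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def rounded_normal_pmf(k, mu=0.0, sigma=1.0):
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)

def q_pmf(k):
    return (0.5 ** abs(k)) / 3.0   # sums to 1 over all integers

kl = 0.0
for k in range(-30, 31):           # tails beyond |k| = 30 are negligible
    p = rounded_normal_pmf(k)
    if p > 0.0:
        kl += p * math.log(p / q_pmf(k))

print(kl)  # KL(round(N(0,1)) || Q) in nats
```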

Note that the KL divergence between discrete and absolutely continuous distributions is well defined. – Olivier Jun 06 '17 at 11:37
@Olivier The usual definition requires a common dominating measure, no? – Danica Jun 06 '17 at 12:01
You are right when P and Q are defined on different spaces. But on a common measurable space, such a measure always exists (take P+Q for instance), and the KL divergence does not depend on the particular choice of dominating measure. – Olivier Jun 06 '17 at 12:39
Not in general. The KL divergence is
$$ D_{KL}(P \ || \ Q) = \int_{\mathcal{X}} \log \left(\frac{dP}{dQ}\right)dP $$
provided that $P$ is absolutely continuous with respect to $Q$ and both $P$ and $Q$ are $\sigma$-finite (i.e. under conditions where $\frac{dP}{dQ}$ is well-defined).
For a 'continuous-to-discrete' KL divergence on, say, the real line, you have the case where Lebesgue measure is absolutely continuous with respect to counting measure, but counting measure is not $\sigma$-finite, so the Radon–Nikodym derivative $\frac{dP}{dQ}$ is not guaranteed to exist.
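As a sanity check on the formula (a small sketch of my own, with arbitrary numbers): when $P$ and $Q$ are both discrete on the same finite set, $\frac{dP}{dQ}(x) = p(x)/q(x)$ and the integral becomes a plain sum. Computing it once with counting measure and once with $\mu = P + Q$ as the dominating measure gives the same number, which also illustrates the dominating-measure independence mentioned in the comments above.

```python
import math

# Minimal check (my own example): for two discrete distributions P and Q
# on {a, b, c}, compute D_KL(P || Q) two ways --
# (1) with counting measure as the dominating measure, where dP/dQ = p/q,
# (2) with mu = P + Q as the dominating measure, as in the first answer.
# The two computations agree.

P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 0.4, "b": 0.4, "c": 0.2}

# (1) counting measure: D_KL = sum_x p(x) * log(p(x) / q(x))
kl_counting = sum(p * math.log(p / Q[x]) for x, p in P.items())

# (2) mu = P + Q: densities f = p / (p + q), g = q / (p + q),
#     D_KL = sum_x f(x) * log(f(x) / g(x)) * mu({x})
kl_mixture = 0.0
for x, p in P.items():
    q = Q[x]
    mu = p + q
    f, g = p / mu, q / mu
    kl_mixture += f * math.log(f / g) * mu

print(kl_counting, kl_mixture)  # both ~0.0253 nats
```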
