I am not a mathematician. I have read about the KL divergence online. What I learned is that the KL divergence measures the information lost when we approximate the input distribution with a model distribution. However, I have only seen it defined between two continuous or two discrete distributions. Can we compute it between a continuous and a discrete distribution, or vice versa?

Related: http://stats.stackexchange.com/q/6907/2970 – cardinal Sep 04 '13 at 01:28
3 Answers
Yes, the KL divergence between continuous and discrete random variables is well defined. If $P$ and $Q$ are distributions on some space $\mathbb{X}$, then both $P$ and $Q$ have densities $f$, $g$ with respect to $\mu = P+Q$ and $$ D_{KL}(P,Q) = \int_{\mathbb{X}} f \log\frac{f}{g}d\mu. $$
For example, if $\mathbb{X} = [0,1]$, $P$ is Lebesgue measure and $Q = \delta_0$ is a point mass at $0$, then $f(x) = 1-\mathbb{1}_{x=0}$, $g(x) = \mathbb{1}_{x=0}$ and $$D_{KL}(P, Q) = \infty.$$
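To see how the formula plays out numerically, here is a minimal Python sketch (the specific distributions are a made-up example, not something from the question): take $P$ purely atomic on $\{0,1\}$ and $Q$ a mixture of the same two atoms with a uniform component on $[0,1]$. Then $P$ is absolutely continuous with respect to $Q$, the density of $P$ with respect to $\mu = P+Q$ vanishes off the atoms, and the integral reduces to a sum over the atoms, giving a finite value.

```python
import math

# Hypothetical example (my own numbers, not from the answer):
#   P = 0.7*delta_0 + 0.3*delta_1                    (purely discrete)
#   Q = 0.4*delta_0 + 0.3*delta_1 + 0.3*Uniform(0,1) (mixed)
# Both have densities w.r.t. mu = P + Q. Since P puts no mass on the
# continuous part, f = 0 there and the integral reduces to a sum over
# the atoms {0, 1}, where the ratio f/g equals P({x}) / Q({x}).

P_atoms = {0.0: 0.7, 1.0: 0.3}
Q_atoms = {0.0: 0.4, 1.0: 0.3}

def kl_discrete_vs_mixed(p_atoms, q_atoms):
    """D_KL(P, Q) when P is purely atomic and every atom of P is an atom of Q."""
    total = 0.0
    for x, p in p_atoms.items():
        q = q_atoms.get(x, 0.0)
        if q == 0.0:
            return math.inf  # absolute continuity fails: P is not << Q
        total += p * math.log(p / q)
    return total

print(kl_discrete_vs_mixed(P_atoms, Q_atoms))  # ~0.3917 nats
```

If $Q$ lacked one of the atoms of $P$, the absolute-continuity check would fail and the divergence would be infinite, just as in the Lebesgue-versus-$\delta_0$ example above.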

How do you prove that $\int_{\mathbb{X}} f \log\frac{f}{g}d\mu$ is independent of the dominating measure? – Gabriel Romon Feb 01 '19 at 11:36
@Olivier with that $f$ you get $P([0,0.5]) = -0.5$ but probability measures must be non-negative. Try maybe a convex sum between the two. – Jorge E. Cardona Jun 30 '20 at 20:12
Ok, I get it now. Is there a case where $D_{KL}(P,Q)$ is finite? @Olivier – Jorge E. Cardona Jul 01 '20 at 16:09
No: KL divergence is only defined on distributions over a common space. It asks about the probability density of a point $x$ under two different distributions, $p(x)$ and $q(x)$. If $p$ is a distribution on $\mathbb{R}^3$ and $q$ a distribution on $\mathbb{Z}$, then $q(x)$ doesn't make sense for points $x \in \mathbb{R}^3$ and $p(z)$ doesn't make sense for points $z \in \mathbb{Z}$. In fact, we can't even do it for two continuous distributions over spaces of different dimension (or two discrete ones, or any case where the underlying probability spaces don't match).
If you have a particular case in mind, it may be possible to come up with some similar-spirited measure of dissimilarity between the distributions. For example, it might make sense to encode a continuous distribution under a code for a discrete one (obviously losing information), e.g. by rounding to the nearest point of the discrete support, as sketched below.
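As a rough illustration of the rounding idea (a sketch of my own, not a standard construction): discretize a continuous $P$, say $N(0,1)$, onto the integers by rounding, then compute an ordinary discrete KL divergence against a discrete $Q$ on the integers. The choice of $Q$ below (a geometric-like pmf) and the truncation at $|k| \le 30$ are arbitrary.

```python
import math

# Sketch: compare a continuous distribution with a discrete one by first
# rounding the continuous one to the nearest integer.
#   P: N(0, 1) rounded to the nearest integer k, so
#      P(k) = Phi(k + 0.5) - Phi(k - 0.5)
#   Q: a discrete distribution on the integers, here q(k) = (1/3) * 0.5**|k|
# The result is an ordinary discrete KL divergence, but it measures the
# rounded distribution, not the original continuous one.

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def rounded_normal_pmf(k, mu=0.0, sigma=1.0):
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)

def q_pmf(k):
    return (0.5 ** abs(k)) / 3.0   # sums to 1 over all integers

kl = 0.0
for k in range(-30, 31):           # tails beyond |k| = 30 are negligible
    p = rounded_normal_pmf(k)
    if p > 0.0:
        kl += p * math.log(p / q_pmf(k))

print(kl)  # KL(round(N(0,1)) || Q) in nats
```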

Note that the KL divergence between discrete and absolutely continuous distributions is well defined. – Olivier Jun 06 '17 at 11:37
@Olivier The usual definition requires a common dominating measure, no? – Danica Jun 06 '17 at 12:01
You are right when P and Q are defined on different spaces. But on a common measurable space, such a measure always exists (take P+Q for instance), and the KL divergence does not depend on the particular choice of dominating measure. – Olivier Jun 06 '17 at 12:39
Not in general. The KL divergence is
$$ D_{KL}(P \ || \ Q) = \int_{\mathcal{X}} \log \left(\frac{dP}{dQ}\right)dP $$
provided that $P$ is absolutely continuous with respect to $Q$ and both $P$ and $Q$ are $\sigma$-finite (i.e. under conditions where $\frac{dP}{dQ}$ is well-defined).
For a 'continuous-to-discrete' KL divergence on, say, the real line, you have the case where Lebesgue measure is absolutely continuous with respect to counting measure, but counting measure is not $\sigma$-finite, so the Radon–Nikodym derivative $\frac{dP}{dQ}$ is not guaranteed to exist.
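As a sanity check on the formula (a small sketch of my own, with arbitrary numbers): when $P$ and $Q$ are both discrete on the same finite set, $\frac{dP}{dQ}(x) = p(x)/q(x)$ and the integral becomes a plain sum. Computing it once with counting measure and once with $\mu = P + Q$ as the dominating measure gives the same number, which also illustrates the dominating-measure independence mentioned in the comments above.

```python
import math

# Minimal check (my own example): for two discrete distributions P and Q
# on {a, b, c}, compute D_KL(P || Q) two ways --
# (1) with counting measure as the dominating measure, where dP/dQ = p/q,
# (2) with mu = P + Q as the dominating measure, as in the first answer.
# The two computations agree.

P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 0.4, "b": 0.4, "c": 0.2}

# (1) counting measure: D_KL = sum_x p(x) * log(p(x) / q(x))
kl_counting = sum(p * math.log(p / Q[x]) for x, p in P.items())

# (2) mu = P + Q: densities f = p / (p + q), g = q / (p + q),
#     D_KL = sum_x f(x) * log(f(x) / g(x)) * mu({x})
kl_mixture = 0.0
for x, p in P.items():
    q = Q[x]
    mu = p + q
    f, g = p / mu, q / mu
    kl_mixture += f * math.log(f / g) * mu

print(kl_counting, kl_mixture)  # both ~0.0253 nats
```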
