18

I am not a mathematician. I have searched the internet about the KL divergence. What I learned is that the KL divergence measures the information lost when we approximate the distribution of a model with respect to the input distribution. I have seen it computed between two continuous distributions or between two discrete distributions. Can we compute it between a continuous and a discrete distribution, or vice versa?

gung - Reinstate Monica
prakash

3 Answers

11

Yes, the KL divergence between continuous and discrete random variables is well defined. If $P$ and $Q$ are distributions on some space $\mathbb{X}$, then both $P$ and $Q$ have densities $f$, $g$ with respect to $\mu = P+Q$ and $$ D_{KL}(P,Q) = \int_{\mathbb{X}} f \log\frac{f}{g}d\mu. $$

For example, if $\mathbb{X} = [0,1]$, $P$ is Lebesgue measure on $[0,1]$, and $Q = \delta_0$ is a point mass at $0$, then $f(x) = 1-\mathbb{1}_{x=0}$, $g(x) = \mathbb{1}_{x=0}$, and $$D_{KL}(P, Q) = \infty.$$
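To make the common-dominating-measure recipe concrete, here is a minimal numerical sketch (not part of the original answer; the mixture weights $a$ and $b$ are made up for illustration). It takes $P = a\,\delta_0 + (1-a)\,\mathrm{Unif}(0,1]$ and $Q = b\,\delta_0 + (1-b)\,\mathrm{Unif}(0,1]$, writes their densities with respect to the dominating measure $\mu = \delta_0 + \mathrm{Lebesgue}$ (any common dominating measure gives the same value), and splits the integral into an atom term and a continuous term:

```python
import numpy as np
from scipy.integrate import quad

def kl_mixed(a, b):
    """KL(P || Q) for P = a*delta_0 + (1-a)*Unif(0,1] and
    Q = b*delta_0 + (1-b)*Unif(0,1], using densities with respect to
    mu = delta_0 + Lebesgue:
        f(0) = a,  f(x) = 1 - a on (0, 1]
        g(0) = b,  g(x) = 1 - b on (0, 1]
    """
    # atom at 0: f(0) * log(f(0) / g(0)), with the convention 0 * log(0/.) = 0
    atom = 0.0 if a == 0 else a * np.log(a / b)
    # continuous part: integral over (0, 1] of f(x) * log(f(x) / g(x)) dx
    cont, _ = quad(lambda x: (1 - a) * np.log((1 - a) / (1 - b)), 0.0, 1.0)
    return atom + cont

print(kl_mixed(0.3, 0.5))  # finite: both P and Q mix an atom with a density
print(kl_mixed(0.0, 0.5))  # P purely continuous, Q partly discrete -> still finite
```

When $P$ puts mass where $Q$ puts none (an atom of $P$ that $Q$ lacks, or vice versa on the continuous part), the corresponding term blows up and the divergence is $\infty$, exactly as in the point-mass example above.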

Olivier
5

No: the KL divergence is only defined for distributions over a common space. It asks about the probability density of a point $x$ under two different distributions, $p(x)$ and $q(x)$. If $p$ is a distribution on $\mathbb{R}^3$ and $q$ a distribution on $\mathbb{Z}$, then $q(x)$ doesn't make sense for points $x \in \mathbb{R}^3$ and $p(z)$ doesn't make sense for points $z \in \mathbb{Z}$. In fact, we can't even do it for two continuous distributions over spaces of different dimension (or two discrete ones, or any case where the underlying probability spaces don't match).

If you have a particular case in mind, it may be possible to come up with a similar-spirited measure of dissimilarity between the distributions. For example, it might make sense to encode a continuous distribution under a code for a discrete one (obviously with some loss of information), e.g. by rounding to the nearest point of the discrete support.
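As a rough sketch of that rounding idea (not from the answer; the Normal/Poisson pairing, the unit cell width, and the finite support are arbitrary choices), one could bin a continuous distribution onto the integers and then apply the ordinary discrete KL divergence:

```python
import numpy as np
from scipy import stats

def kl_rounded_vs_discrete(mu=3.0, sigma=1.0, lam=3.0, support=range(0, 40)):
    """Discretize a Normal(mu, sigma) by rounding to the nearest integer,
    then compute the ordinary discrete KL divergence against a Poisson(lam)
    over a finite support (mass outside the support is ignored)."""
    kl = 0.0
    for k in support:
        # probability the Normal rounds to k, i.e. falls in [k - 0.5, k + 0.5)
        p_k = stats.norm.cdf(k + 0.5, mu, sigma) - stats.norm.cdf(k - 0.5, mu, sigma)
        q_k = stats.poisson.pmf(k, lam)
        if p_k > 0 and q_k > 0:
            kl += p_k * np.log(p_k / q_k)
    return kl

print(kl_rounded_vs_discrete())
```

The result obviously depends on how you discretize, which is exactly the sense in which this is a similar-spirited measure rather than the KL divergence between the original pair.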

Danica
  • Note that the KL divergence between discrete and absolutely continuous distributions is well defined. – Olivier Jun 06 '17 at 11:37
  • @Olivier The usual definition requires a common dominating measure, no? – Danica Jun 06 '17 at 12:01
  • You are right when P and Q are defined on different spaces. But on a common measurable space, such a measure always exists (take P+Q, for instance), and the KL divergence does not depend on the particular choice of dominating measure. – Olivier Jun 06 '17 at 12:39
2

Not in general. The KL divergence is

$$ D_{KL}(P \ || \ Q) = \int_{\mathcal{X}} \log \left(\frac{dP}{dQ}\right)dP $$

provided that $P$ is absolutely continuous with respect to $Q$ and both $P$ and $Q$ are $\sigma$-finite (i.e. under conditions where $\frac{dP}{dQ}$ is well-defined).

For a 'continuous-to-discrete' KL divergence on a standard space such as $\mathbb{R}$, you run into the case where Lebesgue measure is absolutely continuous with respect to counting measure, but counting measure is not $\sigma$-finite.
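As a concrete check of the absolute-continuity condition (an illustration tied to Olivier's example above, not part of the original answer): take $P$ uniform on $[0,1]$ and $Q = \delta_0$. The set $A = (0,1]$ satisfies $$ Q(A) = 0 \quad \text{but} \quad P(A) = 1, $$ so $P$ is not absolutely continuous with respect to $Q$, the density $\frac{dP}{dQ}$ does not exist, and one assigns $D_{KL}(P \ || \ Q) = \infty$.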

jtobin