What is the KL divergence of a distribution from a Dirac delta?

Question

The Kullback–Leibler (KL) divergence of two continuous distributions $P(x)$ and $Q(x)$ is defined as

$$D_{KL}(P \mid\mid Q) = \int_{X} P(x) \log{\left[\frac{P(x)}{Q(x)}\right]} dx$$

How can one compute the KL divergence when $Q$ is the Dirac delta function $Q(x)=\delta(x-x_0)$?

Of course, we can expand the logarithm above to

$$D_{KL}(P \mid\mid Q) = \int_{X} P(x) \log{\left[P(x)\right]} dx - \int_{X} P(x) \log{\left[\delta(x-x_0)\right]} dx$$

How is the integral on the right evaluated, given that the logarithm approaches $-\infty$ for all $x \ne x_0$ and $\infty$ for $x=x_0$?
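One way to see how the right-hand integral behaves is to read $\delta(x-x_0)$ as the limit of Gaussians $Q_\sigma(x) = \mathcal{N}(x; x_0, \sigma^2)$ as $\sigma \to 0$. This is a sketch under assumptions not stated in the question: $P$ has finite mean $\mu$ and variance $s^2$ (symbols introduced here for illustration), and $\int_X P(x)\log P(x)\,dx$ is finite. Since $\log Q_\sigma(x) = -\tfrac{1}{2}\log(2\pi\sigma^2) - (x-x_0)^2/(2\sigma^2)$,

$$-\int_X P(x)\log Q_\sigma(x)\,dx = \tfrac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\int_X P(x)(x-x_0)^2\,dx = \tfrac{1}{2}\log(2\pi\sigma^2) + \frac{s^2 + (\mu - x_0)^2}{2\sigma^2}.$$

Unless $s^2 + (\mu - x_0)^2 = 0$ (that is, unless $P$ is itself the point mass at $x_0$), the $1/\sigma^2$ term dominates the $\log\sigma$ term, so this integral, and with it $D_{KL}(P \mid\mid Q_\sigma)$, tends to $+\infty$.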

KL divergence requires that $P\ll Q$, which translates to $P$ assigning 0 measure to sets $A$ whenever $Q(A)=0$. In this case $Q(A)=0$ whenever $x_0\notin A$. So the only sensible $P$ here would be $P=Q$, which is silly. — Alex R., Jul 17 '17 at 23:00
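The absolute-continuity requirement is easiest to see in a discrete analogue. The sketch below is an illustration, not anything from the thread: `kl_divergence` is a hypothetical helper taking distributions as dicts of outcome probabilities, and a "discrete Dirac delta" is modelled as a point mass at $x_0 = 0$.

```python
import math

def kl_divergence(p, q):
    """Hypothetical helper: D_KL(P || Q) in nats for discrete distributions
    given as dicts mapping outcomes to probabilities.

    Returns math.inf when P is not absolutely continuous w.r.t. Q, i.e. when
    P puts positive mass on an outcome to which Q assigns zero mass."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0 by convention
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # P(x) > 0 but Q(x) = 0: P << Q fails
        total += px * math.log(px / qx)
    return total

# Q is a point mass at x0 = 0; P spreads its mass over {-1, 0, 1}.
delta_at_0 = {0: 1.0}
p_spread = {-1: 0.25, 0: 0.5, 1: 0.25}

print(kl_divergence(p_spread, delta_at_0))    # inf: P is not << Q
print(kl_divergence(delta_at_0, delta_at_0))  # 0.0: the only admissible P is Q itself
print(kl_divergence(delta_at_0, p_spread))    # log 2 ≈ 0.693
```

The last line is finite only because the discrete `p_spread` gives positive mass to $x_0$; a continuous density assigns zero mass to $\{x_0\}$, so that direction breaks down as well.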
@AlexR., Thanks for your reply. Could you explain what is meant by $P \ll Q$? Does this mean that $P(x) \ll Q(x)$ for all $x$? That doesn't seem possible for proper distributions. Or does it mean that the maximum value of $P(x)$ is much smaller than the maximum value of $Q(x)$, i.e., that $Q(x)$ is more peaked? Because that would seem to follow in this case. — saxen, Jul 17 '17 at 23:27
@AlexR., additionally, my understanding is that $D_{KL}(P \mid\mid Q)$ and $D_{KL}(Q \mid\mid P)$ can both exist for the same $P$ and $Q$. How can it be that $P \ll Q$ and $Q \ll P$ can both be true? — saxen, Jul 17 '17 at 23:30
$P\ll Q$ means $P$ is absolutely continuous with respect to $Q$: http://mathworld.wolfram.com/AbsolutelyContinuous.html As an example, if you have two distributions with everywhere-positive density functions, then they are absolutely continuous with respect to each other (because both assign zero mass to exactly the same sets, the Lebesgue-null sets, such as singletons). — Alex R., Jul 17 '17 at 23:49
@AlexR., Ah, that makes more sense. So that's why, for the example above, if $Q$ is the Dirac delta, we could compute $D_{KL}(Q \mid\mid P)$, but not $D_{KL}(P \mid\mid Q)$. — saxen, Jul 18 '17 at 01:02
The *definition* of the delta is as a limit (in a suitable topology) of distributions whose spread shrinks to zero. When $P(x_0)\gt 0$, what happens to the divergence? There's your answer. — whuber, Jul 18 '17 at 13:09
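The limiting argument can be checked numerically. The sketch below assumes, purely for illustration, that $P = N(0, 1)$ and that the delta at $x_0$ is approximated by $Q_\sigma = N(x_0, \sigma^2)$; `kl_gauss` is a hypothetical helper implementing the standard closed-form KL divergence between two univariate Gaussians.

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """Closed-form D_KL(N(mu1, s1^2) || N(mu2, s2^2)) in nats."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

# P = N(0, 1); Q_sigma = N(x0, sigma^2) approximates delta(x - x0) as sigma -> 0.
x0 = 0.0
for sigma in [1.0, 0.1, 0.01, 0.001]:
    d = kl_gauss(0.0, 1.0, x0, sigma)
    print(f"sigma={sigma:g}  D_KL(P || Q_sigma) = {d:.4g}")
```

With $x_0 = 0$ the divergence is exactly $\log\sigma + 1/(2\sigma^2) - 1/2$, which grows like $1/(2\sigma^2)$ and has no finite limit as $\sigma \to 0$.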
So if $P = \delta(x_0)$ and $Q = N(0, 1)$, then $Q(\{x_0\}) = 0$ but $P(\{x_0\}) = 1$. Hence $P$ is not absolutely continuous with respect to $Q$, and the KL divergence is infinite. Am I correct? — quant_dev, Jul 31 '19 at 09:32
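This direction can be probed the same way. Assuming, for illustration, that $P = \delta(x_0)$ is approximated by $P_\varepsilon = N(x_0, \varepsilon^2)$ with $Q = N(0, 1)$, and using a hypothetical closed-form helper `kl_gauss`, the divergence is $\log(1/\varepsilon) + (\varepsilon^2 + x_0^2)/2 - 1/2$, which diverges as $\varepsilon \to 0$. The value $x_0 = 0.5$ is arbitrary.

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """Closed-form D_KL(N(mu1, s1^2) || N(mu2, s2^2)) in nats."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

# P_eps = N(x0, eps^2) approximates delta(x - x0); Q = N(0, 1).
x0 = 0.5
for eps in [1e-1, 1e-3, 1e-6, 1e-12]:
    d = kl_gauss(x0, eps, 0.0, 1.0)
    # The divergence tracks log(1/eps): slow growth, but without bound.
    print(f"eps={eps:g}  D_KL(P_eps || Q) = {d:.4f}  log(1/eps) = {math.log(1 / eps):.4f}")
```

The growth is only logarithmic, which makes it easy to miss in a numerical experiment, but it is unbounded, consistent with the conclusion that the divergence is infinite. The same reasoning suggests that, with $Q$ the Dirac delta and $P$ a density, $D_{KL}(Q \mid\mid P)$ is infinite as well.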

