I was trying to understand cross-entropy and ended up studying KL divergence. I learned that cross-entropy is entropy plus KL divergence:
H(P, Q) = H(P) + D_KL(P||Q)
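For concreteness, here is a quick numerical check of this identity (just a sketch in Python; the distributions p and q are arbitrary made-up examples):

```python
import numpy as np

# Two arbitrary discrete distributions over 3 outcomes (made-up example values)
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

entropy_p = -np.sum(p * np.log(p))       # H(P)
cross_entropy = -np.sum(p * np.log(q))   # H(P, Q)
kl_pq = np.sum(p * np.log(p / q))        # D_KL(P||Q)

print(cross_entropy, entropy_p + kl_pq)  # the two printed numbers agree
```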
Minimizing cross-entropy with respect to Q therefore means minimizing KL divergence, since H(P) does not depend on Q. I further read that minimizing KL divergence means we are trying to make Q close to P. But why does this happen? Many sources say that when Q is close to P, D_KL(P||Q) is close to zero, but I didn't find any proper justification for this. I wonder if somebody has better insights on this.
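To illustrate what I mean by "Q close to P makes D_KL close to zero", here is a small sketch (again with made-up distributions) where Q is moved toward P along a straight line and D_KL(P||Q) shrinks to zero:

```python
import numpy as np

def kl(p, q):
    """D_KL(P||Q) for discrete distributions (assumes q > 0 wherever p > 0)."""
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])    # target distribution (made-up example)
q0 = np.array([1/3, 1/3, 1/3])   # starting guess: uniform

# Mix Q toward P and watch D_KL(P||Q) shrink
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    q = (1 - t) * q0 + t * p     # convex mix, still a valid distribution
    print(f"t={t:.2f}  D_KL={kl(p, q):.4f}")
# D_KL(P||Q) shrinks as Q moves toward P and is exactly 0 when Q == P
```

Running this shows the divergence dropping to 0 at t = 1, but I am looking for the underlying justification rather than just a numerical observation.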