117

I need to determine the KL divergence between two Gaussians. I am comparing my results to published ones, but I can't reproduce their result. My result is obviously wrong, because the KL is not 0 for KL(p, p).

I wonder where I am making a mistake, and I would be grateful if anyone can spot it.

Let $p(x) = N(\mu_1, \sigma_1)$ and $q(x) = N(\mu_2, \sigma_2)$. From Bishop's PRML I know that

$$KL(p, q) = - \int p(x) \log q(x) dx + \int p(x) \log p(x) dx$$

where the integration is over the whole real line, and that

$$\int p(x) \log p(x) dx = -\frac{1}{2} (1 + \log 2 \pi \sigma_1^2),$$

so I restrict myself to $\int p(x) \log q(x) dx$, which I can write out as

$$-\int p(x) \log \frac{1}{(2 \pi \sigma_2^2)^{1/2}} e^{-\frac{(x-\mu_2)^2}{2 \sigma_2^2}} dx,$$

which can be separated into

$$\frac{1}{2} \log (2 \pi \sigma_2^2) - \int p(x) \log e^{-\frac{(x-\mu_2)^2}{2 \sigma_2^2}} dx.$$

Taking the log of the exponential, I get

$$\frac{1}{2} \log (2 \pi \sigma_2^2) - \int p(x) \bigg(-\frac{(x-\mu_2)^2}{2 \sigma_2^2} \bigg) dx,$$

where I expand the square, split the integral into a sum, and pull $\frac{1}{2 \sigma_2^2}$ out of the integral.

$$\frac{1}{2} \log (2 \pi \sigma^2_2) + \frac{\int p(x) x^2 dx - \int p(x) 2x\mu_2 dx + \int p(x) \mu_2^2 dx}{2 \sigma_2^2}$$

Letting $\langle \cdot \rangle$ denote the expectation operator under $p$, I can rewrite this as

$$\frac{1}{2} \log (2 \pi \sigma_2^2) + \frac{\langle x^2 \rangle - 2 \langle x \rangle \mu_2 + \mu_2^2}{2 \sigma_2^2}.$$

We know that $\operatorname{var}(x) = \langle x^2 \rangle - \langle x \rangle ^2$. Thus

$$\langle x^2 \rangle = \sigma_1^2 + \mu_1^2$$

and therefore

$$\frac{1}{2} \log (2 \pi \sigma_2^2) + \frac{\sigma_1^2 + \mu_1^2 - 2 \mu_1 \mu_2 + \mu_2^2}{2 \sigma_2^2},$$

which I can rewrite as

$$\frac{1}{2} \log (2 \pi \sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2}.$$
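This cross-entropy term can be sanity-checked numerically. A minimal sketch using scipy's quadrature (the parameter values are arbitrary, chosen just for the check):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# arbitrary test values for mu_1, sigma_1, mu_2, sigma_2
mu1, s1, mu2, s2 = 0.3, 1.2, -0.5, 0.8

# -\int p(x) log q(x) dx, computed by numerical quadrature
cross, _ = quad(lambda x: -norm.pdf(x, mu1, s1) * norm.logpdf(x, mu2, s2),
                -np.inf, np.inf)

# the closed form derived above
closed = 0.5 * np.log(2 * np.pi * s2**2) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2)

print(cross, closed)  # the two values agree (about 2.32 for these parameters)
```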

Putting everything together, I get to

\begin{align*} KL(p, q) &= - \int p(x) \log q(x) dx + \int p(x) \log p(x) dx\\\\ &= \frac{1}{2} \log (2 \pi \sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2} (1 + \log 2 \pi \sigma_1^2)\\\\ &= \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2}. \end{align*}

This is wrong, since it equals $\frac{1}{2}$ rather than $0$ for two identical Gaussians.
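A minimal sketch that evaluates my expression (the helper name `kl_wrong` is mine) makes the problem concrete:

```python
import numpy as np

def kl_wrong(mu1, s1, mu2, s2):
    # my closed form from the derivation above -- supposedly KL(p, q)
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2)

# for identical Gaussians the KL must be 0, yet this prints 0.5
print(kl_wrong(0.0, 1.0, 0.0, 1.0))
```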

Can anyone spot my error?

Update

Thanks to mpiktas for clearing things up. The correct answer is:

$KL(p, q) = \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}$
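To be safe, the corrected formula can be checked against direct numerical integration. A minimal sketch with scipy (`kl_closed_form` and `kl_numeric` are my own helper names):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_closed_form(mu1, s1, mu2, s2):
    # corrected formula: log(s2/s1) + (s1^2 + (mu1-mu2)^2) / (2 s2^2) - 1/2
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def kl_numeric(mu1, s1, mu2, s2):
    # KL(p, q) = \int p(x) [log p(x) - log q(x)] dx, by quadrature
    integrand = lambda x: norm.pdf(x, mu1, s1) * (
        norm.logpdf(x, mu1, s1) - norm.logpdf(x, mu2, s2))
    val, _ = quad(integrand, -np.inf, np.inf)
    return val

print(kl_closed_form(0.3, 1.2, -0.5, 0.8), kl_numeric(0.3, 1.2, -0.5, 0.8))
print(kl_closed_form(1.0, 2.0, 1.0, 2.0))  # 0.0 for identical Gaussians
```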

adn
bayerj

  • sorry for posting the incorrect answer in the first place. I just looked at $x-\mu_1$ and immediately thought that the integral is zero. The point that it was squared completely missed my mind :) – mpiktas Feb 21 '11 at 12:02
  • What about the multivariate case? –  Oct 23 '11 at 00:49
  • I have just seen in a research paper that the KLD should be $KL(p, q) = \frac{1}{2} \left( (\mu_1-\mu_2)^2 + \sigma_1^2 + \sigma_2^2 \right) \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \right) - 2$ – skyde Aug 01 '13 at 14:26
  • I think there is a typo in your question, since I cannot validate it and it also seems that you used the correct version later in your question: $$\int p(x) \log p(x) dx = \frac{1}{2} (1 + \log 2 \pi \sigma_1^2)$$ I think it should be (note the minus): $$\int p(x) \log p(x) dx = -\frac{1}{2} (1 + \log 2 \pi \sigma_1^2)$$ I tried to edit your question and got banned for it, so maybe do it yourself. – yspreen Jan 25 '18 at 13:49
  • The answer is also in my [1996 paper on Intrinsic losses](https://link.springer.com/article/10.1007/BF00133173). – Xi'an Mar 29 '18 at 20:27
  • Just for reference, this is a special case of https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Multivariate_normal_distributions – flow2k Aug 09 '19 at 21:47
  • @Xi'an - Thanks for providing a reference. A quick clarification question: Can you explain why your equation provides a value 2 times the answer given by the formula here? What is the difference in interpretation? $$\frac{(\mu_1-\mu_2)^2}{\sigma_2^2}+\frac{\sigma_1^2}{\sigma_2^2}-\log\biggl(\frac{\sigma_1^2}{\sigma_2^2}\biggl)-1$$ – YTD May 04 '20 at 15:13
  • Shouldn't the normal distributions be formulated as $p(x) = N(\mu_1, \sigma_1^2)$ and $q(x) = N(\mu_2, \sigma_2^2)$? Seems like this will cause some confusion – FFT Jun 23 '21 at 19:47

2 Answers

84

OK, my bad. The error is in the last equation:

\begin{align} KL(p, q) &= - \int p(x) \log q(x) dx + \int p(x) \log p(x) dx\\\\ &=\frac{1}{2} \log (2 \pi \sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2} (1 + \log 2 \pi \sigma_1^2)\\\\ &= \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2} \end{align}

Note the missing $-\frac{1}{2}$. The last line becomes zero when $\mu_1=\mu_2$ and $\sigma_1=\sigma_2$.
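For an independent check: the corrected expression agrees with PyTorch's built-in analytic KL between two univariate normals (a minimal sketch, assuming torch is installed; the parameter values are arbitrary):

```python
import torch
from torch.distributions import Normal, kl_divergence

mu1, s1, mu2, s2 = 0.3, 1.2, -0.5, 0.8

# the corrected closed form from above
closed = (torch.log(torch.tensor(s2 / s1))
          + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5)

# PyTorch's analytic Normal-to-Normal KL
builtin = kl_divergence(Normal(mu1, s1), Normal(mu2, s2))

print(closed.item(), builtin.item())  # identical up to float precision
```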

mpiktas
  • @mpiktas I meant the question really - bayerj is a well-published researcher and I'm an undergrad. Nice to see that even the smart guys fall back to asking on the internet sometimes :) – N. McA. Apr 05 '16 at 10:19
  • Is $p$ $N(\mu_1, \sigma_1)$ or $N(\mu_2, \sigma_2)$? – Kong Jan 20 '18 at 23:41
  • @Kong $p$ is $N(\mu_1, \sigma_1)$, as noted in the question. – zplizzi Oct 03 '19 at 21:04
41

I did not look at your calculation, but here is mine, with a lot of detail. Suppose $p$ is the density of a normal random variable with mean $\mu_1$ and variance $\sigma^2_1$, and that $q$ is the density of a normal random variable with mean $\mu_2$ and variance $\sigma^2_2$. The Kullback-Leibler divergence from $q$ to $p$ is:

$$\int \left[\log p(x) - \log q(x) \right] p(x) \, dx$$

$$=\int \left[ -\frac{1}{2} \log(2\pi) - \log(\sigma_1) - \frac{1}{2} \left(\frac{x-\mu_1}{\sigma_1}\right)^2 + \frac{1}{2}\log(2\pi) + \log(\sigma_2) + \frac{1}{2} \left(\frac{x-\mu_2}{\sigma_2}\right)^2 \right] \frac{1}{\sqrt{2\pi}\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right] dx$$

$$=\int \left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2} \left[ \left(\frac{x-\mu_2}{\sigma_2}\right)^2 - \left(\frac{x-\mu_1}{\sigma_1}\right)^2 \right] \right\} \frac{1}{\sqrt{2\pi}\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right] dx$$

$$=E_{1} \left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2} \left[ \left(\frac{X-\mu_2}{\sigma_2}\right)^2 - \left(\frac{X-\mu_1}{\sigma_1}\right)^2 \right]\right\}$$

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1 \left\{(X-\mu_2)^2\right\} - \frac{1}{2\sigma_1^2} E_1 \left\{(X-\mu_1)^2\right\}$$

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1 \left\{(X-\mu_2)^2\right\} - \frac{1}{2}$$

(Now note that $(X - \mu_2)^2 = (X-\mu_1+\mu_1-\mu_2)^2 = (X-\mu_1)^2 + 2(X-\mu_1)(\mu_1-\mu_2) + (\mu_1-\mu_2)^2$.)

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} \left[E_1\left\{(X-\mu_1)^2\right\} + 2(\mu_1-\mu_2)E_1\left\{X-\mu_1\right\} + (\mu_1-\mu_2)^2\right] - \frac{1}{2}$$

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$
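The same closed form can also be verified by Monte Carlo, estimating $E_1\left[\log p(X) - \log q(X)\right]$ from samples of $p$. A minimal sketch with numpy/scipy (parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu1, s1, mu2, s2 = 0.3, 1.2, -0.5, 0.8

x = rng.normal(mu1, s1, size=1_000_000)  # draws X ~ p

# Monte Carlo estimate of E_1[ log p(X) - log q(X) ]
mc = np.mean(norm.logpdf(x, mu1, s1) - norm.logpdf(x, mu2, s2))

# closed form from the last line of the derivation
closed = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

print(mc, closed)  # agree to roughly three decimal places
```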

Taylor
ocram