
According to Wikipedia's list of properties of the KL divergence, KL can never be negative. But for texts where the probabilities are very small, I somehow get negative values. For example:

Collection A: word count: 321, doc count: 65888, probA: 0.004871904

Collection B: word count: 1244, doc count: 120344, probB: 0.010337034

KL = $0.004871904 \cdot \ln\frac{0.004871904}{0.010337034} = -0.003664881$


1 Answer


The KL divergence is the sum of $q(i)\ln\frac{q(i)}{p(i)}$ over all values of $i$; your equation contains only a single term. For example, if your model were binomial (only two possible words occur in your documents) and $\Pr(\text{word}_1)$ were 0.005 in document 1 and 0.01 in document 2, then you would have:

\begin{equation} KL = 0.005 \cdot \ln\frac{0.005}{0.01} + 0.995 \cdot \ln\frac{0.995}{0.99} = 0.001547 \geq 0. \end{equation}

This sum (or integral, in the case of continuous random variables) is always non-negative, by Gibbs' inequality (see http://en.wikipedia.org/wiki/Gibbs%27_inequality); it equals zero only when the two distributions are identical.
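If it helps, here is a minimal Python sketch (not part of the original answer; the function name `kl_divergence` is just illustrative) that reproduces the binomial example and shows why dropping the second term gives a negative number:

```python
import numpy as np

def kl_divergence(q, p):
    """KL(q || p) = sum_i q(i) * ln(q(i) / p(i)), summed over the full support."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    return np.sum(q * np.log(q / p))

# Binomial example: two "words" with probabilities 0.005 / 0.995
# in document 1 and 0.01 / 0.99 in document 2.
q = [0.005, 0.995]
p = [0.01, 0.99]
print(kl_divergence(q, p))            # ~0.001547, non-negative

# Keeping only the first term (as in the question) gives a negative value,
# but that single term is not a KL divergence.
print(0.005 * np.log(0.005 / 0.01))   # ~ -0.003466
```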
