
I use Python and the following definition of the KL divergence

import numpy as np

def kl_divergence(p, q):
    return np.sum(np.where(p != 0, p * np.log(p / q), 0))

to calculate the divergence between two normal distributions:


from scipy.stats import norm

# Evaluate both pdfs on a grid between the 1% and 99% quantiles of a standard normal.
x = np.linspace(norm.ppf(0.01, loc=0, scale=1), norm.ppf(0.99, loc=0, scale=1), 100)
a = norm.pdf(x, 0, 2)
b = norm.pdf(x, 2, 2)
kl_divergence(a, b)

The results depend on x and are analytically wrong, because I used the KL divergence for discrete distributions. I believe I could use these results for some practical purposes, but I need the real divergences. My question is: how can I implement the KL divergence in Python such that it yields the analytically correct divergences? Can this be done without integration, by somehow transforming the discrete results? If not, how can I integrate with numpy and scipy? I want to use it for the distributions that scipy includes (normal, Laplace, ...).
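For reference, the closed-form KL divergence between two univariate normals (which I mention using in the comments below) gives the value a correct implementation should reproduce for this setup; here is a minimal sketch, where kl_normal_normal is just an illustrative name, not code from my script:

import numpy as np

def kl_normal_normal(mu1, s1, mu2, s2):
    # Closed-form KL(N(mu1, s1^2) || N(mu2, s2^2))
    #   = log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2 * s2^2) - 1/2
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

kl_normal_normal(0, 2, 2, 2)  # 0.5 for the setup above (mu1=0, mu2=2, s1=s2=2)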

Joe_base
  • Is it for specific distributions? If yes, you could use the closed-form formulas of the KL that exist for the most common distributions, including the ones that scipy uses. – Apprentice Jun 27 '20 at 13:41
  • No, I mean in general. I did it with closed-form formulas for two normals, but already for Laplace vs. normal I am stuck with my math skills. – Joe_base Jun 27 '20 at 13:59
  • You must clarify your question! You use a formula for discrete distributions, on the empirical observations. So you calculate the KL divergence between two empirical distributions. It seems what you want is to **estimate**, not calculate, KL divergence. Then we need to know your model assumptions. See https://stats.stackexchange.com/questions/211175/kullback-leibler-divergence-for-two-samples/248657#248657 (maybe a dup) – kjetil b halvorsen Jun 27 '20 at 14:32
  • How about estimating the KL by discretizing the pdfs by uniform sampling and then using your discrete KL formula? – Apprentice Jun 27 '20 at 14:33
  • Apprentice's comment sounds like what I want or need to do. Can I do this somehow with numpy/scipy? – Joe_base Jun 27 '20 at 15:03
  • @kjetil: I want to calculate it; however, I don't know how I can calculate integrals with Python, and I thought there might be some "trick" to approximate it with the discrete one. I will try to improve my question. – Joe_base Jun 27 '20 at 15:08
  • As you mentioned, this is the wrong approach because KL divergence for continuous distributions requires integration. Solving the integral in closed form is the best approach if possible. Otherwise, you can compute integrals numerically. For example, see `scipy.integrate.quad()`. In this case, keep in mind that $D_{KL}(p \parallel q) = H(p,q) - H(p)$ (the difference of the cross entropy and the entropy). In some cases, an analytical expression may be available for $H(p)$, so only $H(p,q)$ need be computed numerically. – user20160 Jun 27 '20 at 19:41
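A minimal sketch of the two approaches suggested in the comments above, assuming frozen scipy.stats distributions; the helper names kl_riemann and kl_quad and the grid bounds are illustrative choices, not part of the original question:

import numpy as np
from scipy import integrate
from scipy.stats import norm, laplace

def kl_riemann(p, q, lower, upper, n=100_000):
    # Apprentice's suggestion: evaluate both pdfs on a uniform grid and weight the
    # discrete sum by the grid spacing dx (a Riemann-sum approximation of the integral).
    x, dx = np.linspace(lower, upper, n, retstep=True)
    px, qx = p.pdf(x), q.pdf(x)
    return np.sum(np.where(px != 0, px * np.log(px / qx), 0)) * dx

def kl_quad(p, q, lower=-np.inf, upper=np.inf):
    # user20160's suggestion: integrate p(x) * log(p(x) / q(x)) numerically with
    # scipy.integrate.quad; using logpdf avoids underflow in the tails.
    def integrand(x):
        return p.pdf(x) * (p.logpdf(x) - q.logpdf(x))
    value, _abs_error = integrate.quad(integrand, lower, upper)
    return value

# Normal vs. normal: both should reproduce the closed-form value 0.5 for the
# parameters used in the question.
p, q = norm(0, 2), norm(2, 2)
print(kl_riemann(p, q, -30, 30))
print(kl_quad(p, q))

# Normal vs. Laplace, the pair the asker mentions being stuck on analytically.
print(kl_quad(norm(0, 1), laplace(0, 1)))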

0 Answers