If we look at the source, we see that the function is computing `math_ops.reduce_sum(y_true * math_ops.log(y_true / y_pred), axis=-1)` (elements of `y_true` and `y_pred` less than `epsilon` are pinned to `epsilon` to avoid division by zero or taking logarithms of nonpositive numbers). This is the definition of KLD for two discrete distributions. If this isn't what you want to compute, you'll have to use a different function. In particular, normal deviates are not discrete, nor are they themselves probabilities (normal deviates can be negative or exceed 1, which probabilities cannot). These observations strongly suggest that you're using the function incorrectly.
If we read the documentation, we find that the example usage returns a negative value, so apparently the Keras authors are not concerned by negative outputs (even though the KL divergence is non-negative by definition).
- On the one hand, the documentation is perplexing. The example input sums to more than 1, suggesting that it is not a discrete probability distribution (a probability distribution must sum to 1). It's not obvious whether this is merely an oversight or a deliberate choice to illustrate some obscure usage.
- On the other hand, this seems to suggest that your expectations differ from those of the software's authors.
If you're sure that you want to estimate the KLD between the two collections of normal deviates, then you'll want to try something like the method outlined here: Kullback-Leibler Divergence for two samples. Critically, the procedures outlined in these answers are completely different from what is implemented in this Keras function.
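As a rough sketch of the kind of nonparametric approach discussed in that thread, here is a k-nearest-neighbour estimator of the divergence between two samples (the function name and the default k = 1 are my choices, not taken from the linked answers):

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_estimate(x, y, k=1):
    """k-NN estimate of D(P || Q) from samples x ~ P and y ~ Q.

    x, y: arrays of shape (n, d) and (m, d); 1-D inputs are treated as d = 1.
    Assumes continuous distributions with no duplicate points.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, d = x.shape
    m = len(y)
    # distance from each x_i to its k-th nearest neighbour among the other x's
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # distance from each x_i to its k-th nearest neighbour among the y's
    nu = cKDTree(y).query(x, k=k)[0]
    if k > 1:
        nu = nu[:, -1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

# Example: two samples of normal deviates from N(0, 1) and N(1, 1)
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=2000)
y = rng.normal(1.0, 1.0, size=2000)
print(knn_kl_estimate(x, y))   # should land near the true value of 0.5
```

Note how different this is from the Keras loss: it works with the samples' nearest-neighbour distances rather than treating the sample values themselves as probabilities.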
For another example of what you might be interested in, the KL divergence of two normal distributions with known means and variances is given here.
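For reference, that closed form for two univariate normals $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is

$$D_{\mathrm{KL}}\!\left(N(\mu_1,\sigma_1^2)\,\big\|\,N(\mu_2,\sigma_2^2)\right) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2},$$

so if you're willing to assume normality you can simply plug in the sample means and variances instead of using a nonparametric estimator.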