If we look at the source, we see that the function is computing `math_ops.reduce_sum(y_true * math_ops.log(y_true / y_pred), axis=-1)` (elements of `y_true` and `y_pred` less than `epsilon` are pinned to `epsilon` to avoid division by zero or taking logarithms of nonpositive numbers). This is the definition of KLD for two discrete distributions. If this isn't what you want to compute, you'll have to use a different function. In particular, normal deviates are not discrete, nor are they themselves probabilities (normal deviates can be negative or exceed 1, which probabilities cannot). These observations strongly suggest that you're using the function incorrectly.
If we read the documentation, we find that the example usage returns a negative value, so apparently the Keras authors are not concerned by negative outputs (even though the KL divergence is non-negative by definition).
- On the one hand, the documentation is perplexing. The example input sums to more than 1, suggesting that it is not a discrete probability distribution (a probability distribution must sum to 1). It's not obvious whether this is merely an oversight or a deliberate choice to illustrate some obscure usage.
- On the other hand, this seems to suggest that your expectations differ from those of the software's authors.
If you're sure that you want to estimate the KLD between the two collections of normal deviates, then you'll want to try something like the method outlined here: Kullback-Leibler Divergence for two samples. Critically, the procedures outlined in these answers are completely different from what is implemented in this Keras function.
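As a rough sketch of the kind of nonparametric approach discussed in that thread, here is a k-nearest-neighbour estimator of the divergence between two samples (the function name and the default k = 1 are my choices, not taken from the linked answers):

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_estimate(x, y, k=1):
    """k-NN estimate of D(P || Q) from samples x ~ P and y ~ Q.

    x, y: arrays of shape (n, d) and (m, d); 1-D inputs are treated as d = 1.
    Assumes continuous distributions with no duplicate points.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, d = x.shape
    m = len(y)
    # distance from each x_i to its k-th nearest neighbour among the other x's
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # distance from each x_i to its k-th nearest neighbour among the y's
    nu = cKDTree(y).query(x, k=k)[0]
    if k > 1:
        nu = nu[:, -1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

# Example: two samples of normal deviates from N(0, 1) and N(1, 1)
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=2000)
y = rng.normal(1.0, 1.0, size=2000)
print(knn_kl_estimate(x, y))   # should land near the true value of 0.5
```

Note how different this is from the Keras loss: it works with the samples' nearest-neighbour distances rather than treating the sample values themselves as probabilities.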
For another example of what you might be interested in, the KL divergence of two normal distributions with known means and variances is given here.
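For reference, that closed form for two univariate normals $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is

$$D_{\mathrm{KL}}\!\left(N(\mu_1,\sigma_1^2)\,\big\|\,N(\mu_2,\sigma_2^2)\right) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2},$$

so if you're willing to assume normality you can simply plug in the sample means and variances instead of using a nonparametric estimator.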