Questions tagged [kullback-leibler]

An asymmetric measure of distance (or dissimilarity) between probability distributions. It can be interpreted as the expected value of the log-likelihood ratio under the alternative hypothesis.

Kullback–Leibler divergence is an asymmetric measure of distance (or dissimilarity) between probability distributions. If $F(\cdot)$ and $G(\cdot)$ are the two distribution functions, with $F(\cdot)$ absolutely continuous with respect to $G(\cdot)$ (i.e., the support of $F(\cdot)$ is a subset of the support of $G(\cdot)$), then the KL divergence is

$$ D(F,G) = \int \ln\left( \frac{ {\rm d} F}{{\rm d}G}\right) {\rm d} F $$

In the ratio ${\rm d}F/{\rm d}G$, interpret ${\rm d}F$ and ${\rm d}G$ as densities for continuous distributions and as point masses (probability mass functions) for discrete distributions.

It is not a true distance (metric): in general $D(F,G) \neq D(G,F)$, and the triangle inequality does not hold. Nevertheless, it provides an important measure of how dissimilar the two distributions are.
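As a quick sanity check of the definition, here is a minimal sketch in Python/NumPy (the function name `kl_divergence` is chosen here for illustration) that computes the divergence between two discrete distributions given as probability vectors:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D(P || Q) in nats.

    Assumes q > 0 wherever p > 0 (P absolutely continuous w.r.t. Q);
    terms with p == 0 contribute zero by convention.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # small positive number
print(kl_divergence(q, p))  # a different value: the divergence is asymmetric
print(kl_divergence(p, p))  # 0.0 for identical distributions
```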

References

https://en.wikipedia.org/wiki/Kullback–Leibler_divergence

465 questions
117 votes • 2 answers

KL divergence between two univariate Gaussians

I need to determine the KL-divergence between two Gaussians. I am comparing my results to these, but I can't reproduce their result. My result is obviously wrong, because the KL is not 0 for KL(p, p). I wonder where I am making a mistake and ask if…
bayerj • 12,735
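For reference, the closed form for two univariate Gaussians is $KL(p \| q) = \ln\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$. A minimal NumPy sketch (illustrative, not taken from the answers) that also checks the $KL(p, p) = 0$ property mentioned in the question:

```python
import numpy as np

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) in nats."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # 0.0 -- KL(p, p) must vanish
print(kl_gauss(0.0, 1.0, 1.0, 2.0))  # positive
print(kl_gauss(1.0, 2.0, 0.0, 1.0))  # different value: asymmetric
```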
75 votes • 4 answers

What is the difference between cross-entropy and KL divergence?

Both the cross-entropy and the KL divergence are tools to measure the distance between two probability distributions, but what is the difference between them? $$ H(P,Q) = -\sum_x P(x)\log Q(x) $$ $$ KL(P \| Q) = \sum_{x} P(x)\log {\frac{P(x)}{Q(x)}}…
yoyo • 979
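One concise way to see the relationship asked about is the decomposition $H(P,Q) = H(P) + KL(P \| Q)$: the two quantities differ by the entropy of $P$, which is constant whenever $P$ is fixed. A small numeric check (illustrative sketch, not from a particular answer):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

entropy_p = -np.sum(p * np.log(p))        # H(P)
cross_entropy = -np.sum(p * np.log(q))    # H(P, Q)
kl_pq = np.sum(p * np.log(p / q))         # KL(P || Q)

# Cross-entropy decomposes as entropy plus KL divergence.
print(np.isclose(cross_entropy, entropy_p + kl_pq))  # True
```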
72 votes • 5 answers

Intuition on the Kullback–Leibler (KL) Divergence

I have learned about the intuition behind the KL Divergence as how much a model distribution function differs from the theoretical/true distribution of the data. The source I am reading goes on to say that the intuitive understanding of 'distance'…
cgo • 7,445
68 votes • 1 answer

KL divergence between two multivariate Gaussians

I'm having trouble deriving the KL divergence formula assuming two multivariate normal distributions. I've done the univariate case fairly easily. However, it's been quite a while since I took math stats, so I'm having some trouble extending it to…
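For reference, the closed form being derived is $KL\big(\mathcal N(\mu_0,\Sigma_0)\,\|\,\mathcal N(\mu_1,\Sigma_1)\big) = \tfrac12\big[\operatorname{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1-\mu_0)^\top\Sigma_1^{-1}(\mu_1-\mu_0) - k + \ln\tfrac{\det\Sigma_1}{\det\Sigma_0}\big]$ for $k$-dimensional Gaussians. A minimal NumPy sketch (illustrative, not the accepted answer's code):

```python
import numpy as np

def kl_mvn(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for multivariate Gaussians, in nats."""
    k = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

mu0, S0 = np.zeros(2), np.eye(2)
mu1, S1 = np.array([1.0, 0.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
print(kl_mvn(mu0, S0, mu0, S0))  # 0.0
print(kl_mvn(mu0, S0, mu1, S1))  # positive
```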
60 votes • 5 answers

What are the advantages of the Wasserstein metric compared to the Kullback-Leibler divergence?

What is the practical difference between Wasserstein metric and Kullback-Leibler divergence? Wasserstein metric is also referred to as Earth mover's distance. From Wikipedia: Wasserstein (or Vaserstein) metric is a distance function defined between…
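One practical difference is easy to demonstrate: when the supports are disjoint, the KL divergence is infinite no matter how far apart the distributions are, whereas the Wasserstein (Earth mover's) distance grows smoothly with the separation. A hedged sketch using SciPy's `wasserstein_distance` on one-dimensional point masses (illustrative only):

```python
from scipy.stats import wasserstein_distance

# Point mass at 0 versus point mass at d: KL(P || Q) is infinite for every d > 0
# (Q has zero probability where P has mass), while Wasserstein-1 equals |d|.
for d in [0.1, 1.0, 10.0]:
    print(d, wasserstein_distance([0.0], [d]))
```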
57 votes • 1 answer

Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function?

In my mind, the KL divergence from the sample distribution to the true distribution is simply the difference between the cross-entropy and the entropy. Why do we use cross-entropy as the cost function in many machine learning models, but use Kullback-Leibler…
JimSpark • 673
51 votes • 4 answers

Kullback–Leibler vs Kolmogorov-Smirnov distance

I can see that there are a lot of formal differences between the Kullback–Leibler and Kolmogorov-Smirnov distance measures. However, both are used to measure the distance between distributions. Is there a typical situation where one should be used…
43 votes • 2 answers

Differences between Bhattacharyya distance and KL divergence

I'm looking for an intuitive explanation for the following questions: In statistics and information theory, what's the difference between Bhattacharyya distance and KL divergence, as measures of the difference between two discrete probability…
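For a concrete side-by-side on discrete distributions: the Bhattacharyya distance $D_B(P,Q) = -\ln\sum_x \sqrt{P(x)Q(x)}$ is symmetric and zero only when $P = Q$, while KL is asymmetric and can be infinite. A minimal illustrative sketch:

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.4, 0.4, 0.2])

bc = np.sum(np.sqrt(p * q))          # Bhattacharyya coefficient (<= 1)
d_bhattacharyya = -np.log(bc)        # symmetric in p and q
d_kl = np.sum(p * np.log(p / q))     # asymmetric, unbounded

print(d_bhattacharyya, d_kl)
print(np.isclose(-np.log(np.sum(np.sqrt(q * p))), d_bhattacharyya))  # True: symmetric
```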
37 votes • 4 answers

Measures of similarity or distance between two covariance matrices

Are there any measures of similarity or distance between two symmetric covariance matrices (both having the same dimensions)? I am thinking here of analogues to KL divergence of two probability distributions or the Euclidean distance between vectors…
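Two common options hinted at in the question, sketched below with illustrative names: the plain Frobenius (Euclidean) distance between the matrices, and the KL divergence between the zero-mean Gaussians the two covariance matrices define (an asymmetric analogue, closely related to the Stein/LogDet divergence):

```python
import numpy as np

def frobenius_distance(A, B):
    """Euclidean (Frobenius) distance between two covariance matrices."""
    return np.linalg.norm(A - B, ord="fro")

def gauss_kl_zero_mean(S0, S1):
    """KL( N(0, S0) || N(0, S1) ): an asymmetric dissimilarity between covariance matrices."""
    k = S0.shape[0]
    S1_inv = np.linalg.inv(S1)
    return 0.5 * (np.trace(S1_inv @ S0) - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 0.2], [0.2, 1.5]])
print(frobenius_distance(A, B), gauss_kl_zero_mean(A, B))
```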
35 votes • 1 answer

Why is the KL divergence non-negative?

Why is the KL divergence non-negative? From the perspective of information theory, I have the following intuitive understanding: Say there are two ensembles $A$ and $B$ which are composed of the same set of elements labeled by $x$. $p(x)$ and $q(x)$ are…
meTchaikovsky • 1,414
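The standard argument is Jensen's inequality applied to the concave $\ln$ (Gibbs' inequality): $KL(p\|q) = -\sum_x p(x)\ln\frac{q(x)}{p(x)} \ge -\ln\sum_x p(x)\frac{q(x)}{p(x)} = -\ln\sum_{x:\,p(x)>0} q(x) \ge 0$. A quick empirical sanity check over random distributions (illustrative sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    return np.sum(p * np.log(p / q))

# KL between random discrete distributions on 5 points is never negative.
divs = [kl(rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5)))
        for _ in range(10_000)]
print(min(divs) >= 0)  # True: Gibbs' inequality
```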
30 votes • 4 answers

An adaptation of the Kullback-Leibler distance?

Look at this picture: if we draw a sample from the red density, then some values are expected to be less than 0.25, whereas it is impossible to generate such a sample from the blue distribution. As a consequence, the Kullback-Leibler distance from…
ocram • 19,898
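One widely used adaptation for exactly this situation (a hedged sketch, not necessarily what the answers recommend) is the Jensen–Shannon divergence, which compares each distribution to the mixture $M = \tfrac12(P+Q)$ and therefore stays finite (at most $\ln 2$) even when the supports do not overlap:

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and finite even for disjoint supports."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Completely disjoint supports: KL(p || q) would be infinite, JS equals ln 2.
p = np.array([1.0, 0.0])
q = np.array([0.0, 1.0])
print(js_divergence(p, q), np.log(2))  # both ~0.6931
```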
28 votes • 3 answers

Connection between Fisher metric and the relative entropy

Can someone prove the following connection between Fisher information metric and the relative entropy (or KL divergence) in a purely mathematical rigorous way? $$D( p(\cdot , a+da) \parallel p(\cdot,a) ) =\frac{1}{2} g_{i,j} \, da^i \, da^j + (O(…
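A quick numeric illustration of the expansion in the question (not a rigorous proof), using a one-parameter Bernoulli family for which the Fisher information is $g(a) = \frac{1}{a(1-a)}$:

```python
import numpy as np

def kl_bernoulli(t1, t2):
    """KL( Bernoulli(t1) || Bernoulli(t2) ) in nats."""
    return t1 * np.log(t1 / t2) + (1 - t1) * np.log((1 - t1) / (1 - t2))

a, da = 0.3, 1e-3
fisher = 1.0 / (a * (1 - a))         # Fisher information of the Bernoulli family at a

exact = kl_bernoulli(a + da, a)      # D( p(., a + da) || p(., a) )
quadratic = 0.5 * fisher * da**2     # (1/2) g(a) da^2

print(exact, quadratic)              # agree up to higher-order terms in da
```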
26 votes • 4 answers

Kullback-Leibler divergence WITHOUT information theory

After much trawling of Cross Validated, I still don't feel like I'm any closer to understanding KL divergence outside of the realm of information theory. It's rather odd, as somebody with a math background, to find it much easier to understand the…
24 votes • 2 answers

What is the relationship between the Gini score and the log-likelihood ratio?

I am studying classification and regression trees, and one of the measures for the split location is the Gini score. Now I am used to determining the best split location when the log of the likelihood ratio of the same data between two distributions…
20 votes • 1 answer

Deriving the KL divergence loss for VAEs

In a VAE, the encoder learns to output two vectors: $$\mathbf{\mu} \in \mathbb{R}^{z}$$ $$\mathbf{\sigma} \in \mathbb{R}^{z}$$ which are the means and variances for the latent vector $\mathbf{z}$; the latent vector $\mathbf{z}$ is then calculated…
YellowPillow • 1,031
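For reference, the term being derived is the analytic KL between the encoder's diagonal Gaussian $q(\mathbf{z}\mid\mathbf{x}) = \mathcal N(\boldsymbol{\mu}, \operatorname{diag}(\boldsymbol{\sigma}^2))$ and the standard normal prior: $\tfrac12\sum_i\big(\sigma_i^2 + \mu_i^2 - 1 - \ln\sigma_i^2\big)$. A minimal framework-free sketch (illustrative names):

```python
import numpy as np

def vae_kl(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ): the per-sample KL term in the VAE loss."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - np.log(sigma**2))

mu = np.array([0.1, -0.2, 0.0])
sigma = np.array([0.9, 1.1, 1.0])
print(vae_kl(mu, sigma))                # small positive number
print(vae_kl(np.zeros(3), np.ones(3)))  # 0.0 when q equals the prior
```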