
In the book "Deep Learning" of Goodfellow, Bengio and Courville, section 5.5 of maximum likelihood estimation they explain a relation between the maximization of likelihood and minimization of the K-L divergence.

My question is on the formal construction there.

The divergence $KL(p,q)$ between two arbitrary probability measures is defined only when $p$ is absolutely continuous with respect to $q$. See Kullback–Leibler divergence.
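Concretely, for probability measures $P$ and $Q$ the general (measure-theoretic) definition uses the Radon–Nikodym derivative, $$D_{KL}(P\,\|\,Q) = \int \log\frac{dP}{dQ}\, dP,$$ and $dP/dQ$ exists precisely when $P$ is absolutely continuous with respect to $Q$.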

In the book they have an abstract probability measure $p_{data}$ characterizing the underlying data-generating process, which I assume is meant to be absolutely continuous with respect to the Lebesgue measure on some Euclidean space $\mathbb{R}^n$, so that $p_{data}$ is actually the density function of that probability measure.

A sample of $m$ points is drawn from $p_{data}$: $\{x_1, x_2, x_3, \dots, x_m\}$.

In going from 5.58 to 5.59 they convert a quantity of the form $$\frac{1}{m} \sum_{i=1}^{m} f(x_i)$$ into $$E_{x\sim \hat{p}_{data}} \big[f(x) \big],$$ which means that the "empirical distribution" $\hat{p}_{data}$ is defined by $$\hat{p}_{data} = \frac{1}{m} \sum_{i=1}^{m} \delta_{x_i}.$$
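As a quick numerical illustration of that identity (a toy sketch, not from the book; the sample and the test function $f$ here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)      # a made-up sample {x_1, ..., x_m} with m = 5
f = np.exp                  # any test function f

# (1/m) * sum_i f(x_i): the plain sample average
sample_average = f(x).mean()

# E_{x ~ p_hat}[f(x)] with p_hat = (1/m) * sum_i delta_{x_i}:
# each sample point carries probability mass 1/m
weights = np.full(x.size, 1.0 / x.size)
expectation_under_empirical = np.sum(weights * f(x))

print(sample_average, expectation_under_empirical)  # equal up to floating point
```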

The last measure is not absolutely continuous with respect to $p_{data}$, so how are they able to compute the KL divergence there? Or what am I missing?

  • Beginning of https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Definition – Mark L. Stone Jun 07 '18 at 22:29
  • @MarkL.Stone: I think the issue here is that $p_{data}$ is assumed to be continuous. – Alex R. Jun 07 '18 at 22:36
  • @MarkL.Stone what do you mean? The first definition there is used when both measures are discrete, but only one is discrete in this case. – Jorge E. Cardona Jun 08 '18 at 21:12
  • @JorgeE.Cardona If my answer has helped, could you mark it as 'accepted'. – CATALUNA84 Jun 26 '20 at 07:45
  • @CATALUNA84 Thank you for your answer. My question was not on implementation, but on the fact that to compute KL(p,q) one needs p to be absolutely continuous with respect to q, but that is not the case in that section. – Jorge E. Cardona Jun 29 '20 at 14:55
  • What's the source of your assumption that $p_{data}$ is 'absolutely continuous'? Yes, it's well defined from a mathematical standpoint and requires $P$ to be absolutely continuous, but it can be interpreted differently in certain cases. Please refer to https://stats.stackexchange.com/q/69125/269109 for a discussion of KL divergence between distributions of different types. – CATALUNA84 Jun 30 '20 at 13:14
  • @CATALUNA84 The referred answer confirms the impossibility of defining $KL(p,q)$ when $p$ is not absolutely continuous with respect to $q$, it does not present a suitable definition when $p$ is not abs. cont wrt $q$. – Jorge E. Cardona Jun 30 '20 at 20:14

1 Answer


I am giving an implementation of the above theory here: https://haphazardmethods.wordpress.com/2017/06/29/chapter-3-kullback-leibler-divergence/, so that the OP is better able to grasp the idea behind it and implement it in practice.

They take a good range of sample values to explain the concepts and produce a graph similar to the book's Figure 3.6.

What they are doing is computing and minimizing the KL divergence between two distributions, which is what you need if you want to send a piece of encoded information, or train a neural network with a KL term, as in variational autoencoders, multiclass classification, or as a replacement for least-squares minimization.

Refer to this post for more info and for comparisons with related entropy methods, least squares, etc.: Intuition on the Kullback-Leibler (KL) Divergence
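In case the links above go stale, here is a minimal sketch of the kind of computation described, i.e. the KL divergence between two discrete distributions on a common support (the linked notebook may differ in its details; the binomial/Poisson example here is just an illustration):

```python
import numpy as np
from scipy.stats import binom, poisson

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as probability vectors
    over the same support. Requires q > 0 wherever p > 0 (absolute continuity)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                          # terms with p = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Example: Binomial(10, 0.3) vs. a Poisson(3) truncated to the same support {0, ..., 10}
support = np.arange(11)
p = binom.pmf(support, n=10, p=0.3)
q = poisson.pmf(support, mu=3)
q = q / q.sum()                           # renormalize the truncated Poisson
print(kl_divergence(p, q))
```

Note that the computation only makes sense when $q$ puts positive mass wherever $p$ does, which is exactly the absolute-continuity requirement raised in the question.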

CATALUNA84
  • Hi and welcome to Cross Validated. Can you please summarize the content of your links? We are here to build a platform that stays useful even if links break in the future. – Ferdi Feb 25 '20 at 14:40
  • Being an engineer, I am more focused on the program. Should I give a link to the Colab repo here? – CATALUNA84 Feb 25 '20 at 15:05
  • Yes. That is an awesome idea. Maybe you can additionally summarize in one or two sentences what you are doing in the Colab repo. – Ferdi Feb 26 '20 at 13:26
  • GitHub repo [link](https://github.com/mayankbhaskar007/DeepLearningBook/blob/master/kullback_leibler_divergence(Chapter_3).ipynb) with the code snippet for KL divergence from scratch... Let me know if anyone needs a walkthrough :) – CATALUNA84 Feb 27 '20 at 08:24