
In the book "Deep Learning" of Goodfellow, Bengio and Courville, section 5.5 of maximum likelihood estimation they explain a relation between the maximization of likelihood and minimization of the K-L divergence.

My question is on the formal construction there.

The divergence $KL(p,q)$ between two arbitrary probability measures is defined only when $p$ is absolutely continuous with respect to $q$. See Kullback–Leibler divergence.
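Concretely, for probability measures $P$ and $Q$ the general (measure-theoretic) definition uses the Radon–Nikodym derivative, $$D_{KL}(P\,\|\,Q) = \int \log\frac{dP}{dQ}\, dP,$$ and $dP/dQ$ exists precisely when $P$ is absolutely continuous with respect to $Q$.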

In the book they have an abstract probability measure $p_{data}$ characterizing the underlying data-generating process, which I assume is meant to be absolutely continuous with respect to the Lebesgue measure on some Euclidean space $\mathbb{R}^n$, so that $p_{data}$ is actually the density function of that probability measure.

A sample of $m$ points is drawn from $p_{data}$: $\{x_1, x_2, x_3, \dots, x_m\}$.

In going from 5.58 to 5.59 they convert a quantity of the form $$\frac{1}{m} \sum_{i=1}^{m} f(x_i)$$ into $$E_{x\sim \hat{p}_{data}} \big[f(x) \big],$$ which means that the "empirical distribution" $\hat{p}_{data}$ is defined by $$\hat{p}_{data} = \frac{1}{m} \sum_{i=1}^{m} \delta_{x_i}.$$
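As a quick numerical illustration of that identity (a toy sketch, not from the book; the sample and the test function $f$ here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)      # a made-up sample {x_1, ..., x_m} with m = 5
f = np.exp                  # any test function f

# (1/m) * sum_i f(x_i): the plain sample average
sample_average = f(x).mean()

# E_{x ~ p_hat}[f(x)] with p_hat = (1/m) * sum_i delta_{x_i}:
# each sample point carries probability mass 1/m
weights = np.full(x.size, 1.0 / x.size)
expectation_under_empirical = np.sum(weights * f(x))

print(sample_average, expectation_under_empirical)  # equal up to floating point
```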

The last measure is not absolutely continuous with respect to $p_{data}$, so how are they able to compute the KL divergence there? Or what am I missing?

  • Beginning of https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence#Definition – Mark L. Stone Jun 07 '18 at 22:29
  • @MarkL.Stone: I think the issue here is that $p_{data}$ is assumed to be continuous. – Alex R. Jun 07 '18 at 22:36
  • @MarkL.Stone what do you mean? The first definition there is used when both measures are discrete, but only one is discrete in this case. – Jorge E. Cardona Jun 08 '18 at 21:12
  • @JorgeE.Cardona If my answer has helped, could you mark it as 'accepted'. – CATALUNA84 Jun 26 '20 at 07:45
  • @CATALUNA84 Thank you for your answer. My question was not on implementation, but on the fact that to compute KL(p,q) one needs p to be absolutely continuous with respect to q, but that is not the case in that section. – Jorge E. Cardona Jun 29 '20 at 14:55
  • What's the source of your assumption that $p_{data}$ is 'absolutely continuous'? Yes, it's well defined from a mathematical standpoint and requires $P$ to be absolutely continuous, but it can be interpreted differently in certain cases. Please refer to https://stats.stackexchange.com/q/69125/269109 for a discussion of KL divergence between distributions of different types. – CATALUNA84 Jun 30 '20 at 13:14
  • @CATALUNA84 The referred answer confirms the impossibility of defining $KL(p,q)$ when $p$ is not absolutely continuous with respect to $q$, it does not present a suitable definition when $p$ is not abs. cont wrt $q$. – Jorge E. Cardona Jun 30 '20 at 20:14

1 Answer


I am giving an implementation of the above theory here: https://haphazardmethods.wordpress.com/2017/06/29/chapter-3-kullback-leibler-divergence/, so that the OP is better able to grasp the idea behind it and implement it in practice.

They take a good range of sample values to explain the concepts and produce a graph similar to the book's Figure 3.6.

What they are doing is computing and minimizing the KL divergence between two distributions, which is what you need if you want to send a piece of encoded information, or train a neural network with a KL term, as in variational autoencoders, multiclass classification, or as a replacement for least-squares minimization.

Refer to this post for more info and for comparisons with related entropy methods, least squares, etc.: Intuition on the Kullback-Leibler (KL) Divergence
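In case the links above go stale, here is a minimal sketch of the kind of computation described, i.e. the KL divergence between two discrete distributions on a common support (the linked notebook may differ in its details; the binomial/Poisson example here is just an illustration):

```python
import numpy as np
from scipy.stats import binom, poisson

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as probability vectors
    over the same support. Requires q > 0 wherever p > 0 (absolute continuity)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                          # terms with p = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Example: Binomial(10, 0.3) vs. a Poisson(3) truncated to the same support {0, ..., 10}
support = np.arange(11)
p = binom.pmf(support, n=10, p=0.3)
q = poisson.pmf(support, mu=3)
q = q / q.sum()                           # renormalize the truncated Poisson
print(kl_divergence(p, q))
```

Note that the computation only makes sense when $q$ puts positive mass wherever $p$ does, which is exactly the absolute-continuity requirement raised in the question.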

CATALUNA84
  • Hi and welcome to Cross Validated. Can you please summarize the content of your links? We are here to build a platform that stays useful even if links break in the future. – Ferdi Feb 25 '20 at 14:40
  • Being an engineer, I am more focused on the program. Should I give a link to the Colab repo here? – CATALUNA84 Feb 25 '20 at 15:05
  • Yes. That is an awesome idea. Maybe you can additionally summarize in one or two sentences what you are doing in the Colab repo. – Ferdi Feb 26 '20 at 13:26
  • GitHub repo [link](https://github.com/mayankbhaskar007/DeepLearningBook/blob/master/kullback_leibler_divergence(Chapter_3).ipynb) with the code snippet for KL divergence from scratch... Let me know if anyone needs a walkthrough :) – CATALUNA84 Feb 27 '20 at 08:24