
Is the limiting density of discrete points (LDDP), which is a corrected version of differential entropy, equivalent to the negative KL-divergence (or relative entropy) between a density function $m(x)$ and a probability distribution $p(x)$?

What is the mathematics behind this equivalence, and what conditions do the density and probability functions have to meet? First of all, I thought densities were probability functions.

develarist

1 Answer


This is an old question, but I just came across it and thought I'd give it a shot for any future visitors to the post.


First, some intuition:

As you mentioned, LDDP is a "corrected" version of Shannon's proposal for a continuous version of entropy. LDDP is more appropriate than Shannon's differential entropy because LDDP is the limit of discrete entropy (once the divergent $\log N$ term is subtracted) as we approximate a continuous distribution with increasingly dense discrete distributions.

To formalize the above, consider a set $X_N = \{x_i\}_{i=1}^N$, where $|X_N| = N$. If we continue increasing $N$, our points $x_i$ "fill" some space. When $N \rightarrow \infty$, $X_N$ approaches some continuous space.

LDDP is given by: $$H(X) = -\int p(x) \log \frac{p(x)}{m(x)} \, dx$$

where $$\lim_{N \rightarrow \infty} \frac{1}{N}\{\textrm{number of } x_i \textrm{ in } (a,b)\} = \int_a^b m(x) \, dx$$

So $m$ is just the limiting density of the points $x_i$ as they fill the space that $X_N$ approaches.
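As a quick numerical illustration of this definition, here is a minimal sketch (the choice of standard-normal quantiles as the $x_i$ is purely illustrative): the scaled empirical density of the points should converge to their limiting density $m$, here the normal pdf.

```python
import numpy as np
from scipy import stats

# Discretization points: the (i + 0.5)/N quantiles of a standard normal,
# so they cluster where the normal density is high.  By the definition
# above, their limiting density m is the standard normal pdf.
N = 100_000
x = stats.norm.ppf((np.arange(N) + 0.5) / N)

# Empirical density of the points: (1/N) * (count per bin) / (bin width).
counts, edges = np.histogram(x, bins=200)
centers = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]
m_hat = counts / (N * width)

# Maximum deviation from the true m; small, and it shrinks as N grows.
print(np.max(np.abs(m_hat - stats.norm.pdf(centers))))
```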

KL-Divergence is: $$D_{KL}(p(x)\,\|\,q(x)) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$

So clearly, $H(X) = -D_{KL}(p(x)\,\|\,m(x))$.
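Here is a minimal numerical sketch of this identity (the choices $p = \mathrm{Beta}(2,5)$ on $[0,1]$ and a uniform grid, so that $m = U([0,1])$, are assumptions made purely for illustration): the discrete entropy $H_N$ of an $N$-point quantization satisfies $H_N - \log N \rightarrow -D_{KL}(p\,\|\,m)$, which for $m(x) = 1$ is just the differential entropy of $p$.

```python
import numpy as np
from scipy import stats

# p = Beta(2, 5) on [0, 1]; uniform grid points, so m = U([0, 1]) and
# -D_KL(p || m) equals the differential entropy of p.
p = stats.beta(2, 5)
neg_kl = p.entropy()  # scipy's entropy() is differential entropy here

for N in [10**2, 10**4, 10**6]:
    x = (np.arange(N) + 0.5) / N       # uniform grid: limiting density m(x) = 1
    w = p.pdf(x)                       # cell probabilities proportional to p(x_i)
    w /= w.sum()
    H_N = -np.sum(w * np.log(w))       # discrete Shannon entropy of the quantization
    print(N, H_N - np.log(N), neg_kl)  # H_N - log N  ->  -D_KL(p || m)
```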

KL-Divergence (roughly) measures how different two distributions are, so $H(X)$ measures how similar $p$ is to $m$. For many familiar spaces, like $[0,1]^d$, we can fill the space uniformly and take $m$ to be the uniform density, $m(x) = 1$ on $[0,1]^d$; then $H(X) = -\int p(x) \log p(x) \, dx$, the ordinary differential entropy.

Because uniform distributions maximize entropy, it is natural that the continuous version of entropy works out to the negative KL-divergence between the probability distribution in question and a uniform distribution over the same support: $H(X) = -D_{KL}(p\,\|\,m) \le 0$, with equality exactly when $p = m$.
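A minimal sketch of that last point (again with illustrative Beta densities on $[0,1]$ and $m(x) = 1$): since $D_{KL} \ge 0$ with equality iff $p = m$, every density on $[0,1]$ has $H(X) \le 0$, and the maximum is attained only by the uniform one, $\mathrm{Beta}(1,1)$.

```python
import numpy as np
from scipy import stats

# With m(x) = 1 on [0, 1], H(X) = -integral of p log p = -D_KL(p || m) <= 0,
# attained exactly when p is itself uniform.
x = np.linspace(1e-6, 1 - 1e-6, 200_001)
dx = x[1] - x[0]
for a, b in [(1, 1), (2, 5), (5, 5), (0.5, 0.5)]:
    p = stats.beta(a, b).pdf(x)
    H = -np.sum(p * np.log(p)) * dx  # Riemann approximation of H(X)
    print(f"Beta({a},{b}): H(X) = {H:.4f}")  # 0 only for Beta(1,1)
```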

References: see pages 201-202 here

Josh Bone