I am taking a course on information theory in which we discuss the forward KL divergence (KLD) as a way to approximate pdfs. One of the examples is the same as the one in this blog post:
https://towardsdatascience.com/forward-and-reverse-kl-divergence-906625f1df06
($q_\theta(x)$ is the distribution we are fitting, and $p(x)$ is the true distribution.)
In it, the author writes:
In words, wherever p(x) has high probability, q(x) must also have high probability. This is mean-seeking behaviour, because q(x) must cover all the modes and regions of high probability in p(x), but q(x) is not penalized for having high probability masses where p(x) does not.
I simply do not understand the logic behind this.
Taking into account the presence of the log, we have
$$\arg\max_\theta \, \mathbb{E}_p\!\left[\log q_\theta(x)\right] \;=\; \arg\min_\theta \, H(p, q_\theta) \qquad \text{(the cross-entropy)}.$$
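For reference, here is the expansion I am relying on to relate the forward KL to this cross-entropy (my own rewriting of the standard identity, so please flag it if it is off):
$$
D_{\mathrm{KL}}(p \,\|\, q_\theta) \;=\; \mathbb{E}_p\!\left[\log \frac{p(x)}{q_\theta(x)}\right]
\;=\; \underbrace{-\,\mathbb{E}_p[\log q_\theta(x)]}_{H(p,\,q_\theta)} \;-\; \underbrace{\big(-\mathbb{E}_p[\log p(x)]\big)}_{H(p)},
$$
and since $H(p)$ does not depend on $\theta$,
$$
\arg\min_\theta D_{\mathrm{KL}}(p \,\|\, q_\theta) \;=\; \arg\min_\theta H(p, q_\theta) \;=\; \arg\max_\theta \mathbb{E}_p[\log q_\theta(x)].
$$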
To me, this seems to imply that when $p(x)$ is large, maximising this would call for a small $q_\theta(x)$: the logarithm would then be a large negative number, which, once multiplied by $-1$ in the cross-entropy, gives a large positive number.
I am trying to wrap my head around the intuition behind this optimisation and why, in practice, it leads to mean-seeking behaviour.
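To make the question concrete, here is the kind of numerical check I have in mind (a rough sketch of my own: the bimodal $p$, the grid, and the helper names `gauss` and `cross_entropy` are illustrative choices, not taken from the course or the blog). It fits a single Gaussian $q_\theta = \mathcal{N}(\mu, \sigma^2)$ by brute-force minimisation of the cross-entropy $H(p, q_\theta)$ on a grid, which, if the identity above is right, should be the forward-KL optimum:

```python
import numpy as np

# Grid and a bimodal "true" distribution p: an equal mixture of two Gaussians.
# The mode locations (-3 and +3) are my own illustrative choice.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated on the grid."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p = 0.5 * gauss(x, -3.0, 1.0) + 0.5 * gauss(x, 3.0, 1.0)

def cross_entropy(mu, sigma):
    """H(p, q_theta) = -E_p[log q_theta(x)], approximated by a Riemann sum."""
    q = gauss(x, mu, sigma)
    return -np.sum(p * np.log(q + 1e-300)) * dx

# Brute-force search over (mu, sigma) for the cross-entropy / forward-KL optimum.
mus = np.linspace(-5.0, 5.0, 101)
sigmas = np.linspace(0.5, 5.0, 91)
mu_star, sigma_star = min(
    ((m, s) for m in mus for s in sigmas),
    key=lambda ms: cross_entropy(*ms),
)
print(f"forward-KL optimum: mu = {mu_star:.2f}, sigma = {sigma_star:.2f}")
# My expectation: mu close to 0 (between the two modes) and a wide sigma,
# i.e. a single broad Gaussian covering both modes.
```

If the optimum really does land at $\mu \approx 0$ with a wide $\sigma$, that is exactly the "cover all the modes" behaviour described in the quote, and that is the step I would like to understand directly from the formula.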