
Does maximizing the Jensen–Shannon divergence $D_{\mathrm{JS}}(P \parallel Q)$ maximize the Kullback–Leibler divergence $D_{\mathrm{KL}}(P \parallel Q)$? If so, I'd like to be able to show that it does.

I have managed to express $D_{\mathrm{JS}}(P \parallel Q)$ in terms of $D_{\mathrm{KL}}(P \parallel Q)$ as follows:

$$D_{\mathrm{JS}}(P \parallel Q) = \frac{1}{2}D_{\mathrm{KL}}(P \parallel Q) + \frac{1}{2}D_{\mathrm{KL}}(Q \parallel P) - \frac{1}{2}\mathbb{E}_P\Big[ \log \Big(1 + \frac{P}{Q}\Big) \Big] - \frac{1}{2}\mathbb{E}_Q\Big[ \log \Big(1 + \frac{Q}{P}\Big) \Big] + \log2$$

Does the linear relationship between the two in the above expression show that maximizing $D_{\mathrm{JS}}(P \parallel Q)$ also maximizes $D_{\mathrm{KL}}(P \parallel Q)$?
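This identity is easy to sanity-check numerically on a toy discrete example (the distributions below are arbitrary; this is just a sketch):

```python
import numpy as np

# Two strictly positive discrete distributions on the same support.
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.1, 0.5])
m = 0.5 * (p + q)

def kl(a, b):
    """Discrete KL divergence D_KL(a || b) in nats."""
    return np.sum(a * np.log(a / b))

# Definition of the Jensen-Shannon divergence.
js = 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Right-hand side of the identity in the question.
rhs = (0.5 * kl(p, q) + 0.5 * kl(q, p)
       - 0.5 * np.sum(p * np.log(1 + p / q))
       - 0.5 * np.sum(q * np.log(1 + q / p))
       + np.log(2))

assert np.isclose(js, rhs)
```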

Arya McCarthy
seabass09
  • I'm curious how you got that expression of $D_{JS}$ in terms of $D_{KL}$. $D_{JS} = \frac{1}{2} D_{KL}(P||\frac{P+Q}{2}) + \frac{1}{2} D_{KL}(Q||\frac{P+Q}{2})$. We have $D_{KL}(P||\frac{P+Q}{2}) = \sum P \log (\frac{2P}{P+Q}) = \sum P \log 2 + \sum P \log (\frac{P}{P+Q}) = \log 2 - \sum P \log (1 + \frac{Q}{P}) = \log 2 - E_P (\log (1+\frac{Q}{P}))$ But maybe I missed something here? – kajsam Mar 04 '21 at 13:04
  • @kajsam $$D_{KL}\big(P \parallel \frac{1}{2}(P+Q)\big) = \mathbb{E}_P\Big[\log \frac{P}{\frac{1}{2}(P + Q)} \Big]$$ $$ = \mathbb{E}_P\Big[ \log P - \log(\frac{1}{2}(P + Q)) \Big]$$ $$ = \mathbb{E}_P\Big[ \log P - \log(P + Q) - \log\frac{1}{2} \Big]$$ $$ = \mathbb{E}_P\Big[\log P - \log(Q(\frac{P}{Q} + 1)) + \log 2\Big]$$ $$ = \mathbb{E}_P\Big[\log P - \log Q -\log(\frac{P}{Q} + 1) + \log 2 \Big]$$ $$ = \mathbb{E}_P\Big[\log \frac{P}{Q} -\log(\frac{P}{Q} + 1) + \log 2 \Big]$$ $$ = \mathbb{E}_P\Big[\log \frac{P}{Q}\Big] - \mathbb{E}_P\Big[\log(\frac{P}{Q} + 1)\Big] + \log 2 $$ – seabass09 Mar 04 '21 at 13:41
  • @kajsam I then did the same for the $D_{KL}\big(Q \parallel \frac{1}{2}(P+Q)\big)$ term. – seabass09 Mar 04 '21 at 13:46
  • Thank you, now I see. – kajsam Mar 04 '21 at 14:35
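The term-by-term derivation in the comments can likewise be checked numerically on an arbitrary discrete example (a quick sketch):

```python
import numpy as np

p = np.array([0.1, 0.6, 0.3])
q = np.array([0.3, 0.3, 0.4])
m = 0.5 * (p + q)

# Left-hand side: D_KL(P || (P+Q)/2).
lhs = np.sum(p * np.log(p / m))

# Final line of the derivation:
# E_P[log(P/Q)] - E_P[log(P/Q + 1)] + log 2.
rhs = (np.sum(p * np.log(p / q))
       - np.sum(p * np.log(p / q + 1))
       + np.log(2))

assert np.isclose(lhs, rhs)
```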

2 Answers


I'm not sure whether the linear relationship itself plays a role, but you can show that if one divergence attains its maximum, then so does the other, by the following reasoning (it is not a complete proof):

Both $D_{KL}$ and $D_{JS}$ are $f$-divergences, so we know that they attain their maxima when $P$ and $Q$ are orthogonal, i.e. have disjoint supports. Because $\max(D_{KL}) = \infty$, $D_{KL}$ attains its maximum only if $P \perp Q$. The question that remains is whether $D_{JS}$ can attain its maximum $\log 2$ without $P \perp Q$.

Because $D_{KL}(P \parallel Q)$, $D_{KL}(Q \parallel P)$, $E_P[\log(1 +\frac{P}{Q})]$, and $E_Q[\log(1 +\frac{Q}{P})]$ are all nonnegative, and $D_{JS} = \log 2$ would require the first pair to cancel the second pair exactly, it suffices to show the strict inequality

$$ D_{KL}(P \parallel Q) + D_{KL}(Q \parallel P) < E_P\Big[\log\Big(1 +\frac{P}{Q}\Big)\Big] + E_Q\Big[\log\Big(1 +\frac{Q}{P}\Big)\Big], $$ i.e. $$ \sum P \log \frac{P}{Q} + \sum Q \log \frac{Q}{P} < \sum P \log \frac{P+Q}{Q} + \sum Q \log \frac{P+Q}{P}, $$ which holds because $\frac{P}{Q} < \frac{P+Q}{Q}$ pointwise gives $$ \sum P \log \frac{P}{Q} < \sum P \log \frac{P+Q}{Q},$$ with equality only in the limit $P \perp Q$. And we conclude that if $D_{JS} = \log 2$, then $D_{KL} = \infty$.
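To illustrate this conclusion numerically (a small sketch, not part of the proof): let $P$ and $Q$ share only an $\epsilon$ of probability mass and send $\epsilon \to 0$, so $D_{JS}$ approaches $\log 2$ while $D_{KL}$ diverges.

```python
import numpy as np

def kl(a, b):
    """Discrete KL divergence in nats."""
    return np.sum(a * np.log(a / b))

def js(a, b):
    """Jensen-Shannon divergence in nats."""
    m = 0.5 * (a + b)
    return 0.5 * kl(a, m) + 0.5 * kl(b, m)

for eps in [1e-1, 1e-3, 1e-6]:
    # P concentrates on the first outcome, Q on the second;
    # their overlap shrinks as eps -> 0.
    p = np.array([1 - eps, eps])
    q = np.array([eps, 1 - eps])
    print(f"eps={eps:.0e}  JS={js(p, q):.6f}  KL(P||Q)={kl(p, q):.2f}")
# JS climbs toward log 2 while KL grows without bound.
```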

kajsam

Comment:

$D_{JS}(p \| q) =\frac{1}{2} D_{KL}(p \| m)+\frac{1}{2} D_{KL}(q \| m)$

$m=\frac{1}{2}(p+q)$

For the case where you assume a Normal distribution for both $p \sim \mathcal N(\mu_1, \sigma_1^2)$ and $q \sim \mathcal N(\mu_2, \sigma_2^2)$, and approximate the mixture by $m \approx \mathcal N\big(\frac{\mu_1+\mu_2}{2}, \big(\frac{\sigma_1 +\sigma_2}{2}\big)^2\big)$ (note that a mixture of Normals is generally not itself Normal, so this is only an approximation),

you can express $D_{JS}$ in terms of $D_{KL}$ using the closed form

$D_{KL}(p \| q) = \log \frac{\sigma_{2}}{\sigma_{1}}+\frac{\sigma_{1}^{2}+\left(\mu_{1}-\mu_{2}\right)^{2}}{2 \sigma_{2}^{2}}-\frac{1}{2} $

and then the relationship is straightforward to check numerically.
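A numerical sketch of that check: the closed-form Gaussian KL above, plus $D_{JS}$ computed by direct numerical integration (which avoids approximating the mixture $m$ by a Normal, since a Gaussian mixture is generally not Gaussian). The parameter values are arbitrary.

```python
import numpy as np

def kl_normal(mu1, s1, mu2, s2):
    """Closed-form D_KL(N(mu1, s1^2) || N(mu2, s2^2)) in nats."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def js_normal_numeric(mu1, s1, mu2, s2, n=200_001):
    """JS divergence between two Normals via numerical integration."""
    x = np.linspace(-50.0, 50.0, n)
    p = np.exp(-0.5 * ((x - mu1) / s1) ** 2) / (s1 * np.sqrt(2 * np.pi))
    q = np.exp(-0.5 * ((x - mu2) / s2) ** 2) / (s2 * np.sqrt(2 * np.pi))
    m = 0.5 * (p + q)
    dx = x[1] - x[0]
    # Guard against underflow in the far tails, where p or q is exactly 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        kl_pm = np.sum(np.where(p > 0, p * np.log(p / m), 0.0)) * dx
        kl_qm = np.sum(np.where(q > 0, q * np.log(q / m), 0.0)) * dx
    return 0.5 * kl_pm + 0.5 * kl_qm

print(kl_normal(0, 1, 3, 2))          # closed-form KL
print(js_normal_numeric(0, 1, 3, 2))  # numeric JS, bounded above by log 2
```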

Easy Points