We know KL-divergence is sometimes expressed like this:

$$D_{\mathrm{KL}}\big(P(X, Y) \,\|\, P(X)\,P(Y)\big) = \sum_{x, y} P(x, y) \log \frac{P(x, y)}{P(x) \, P(y)}$$

which shows it captures the deviation between the joint distribution of $X$ and $Y$ and the product of their marginals. This suggests KL-divergence is simply the multiplication rule for independent events, reformulated in terms of entropy: if the joint fails to match the product of the marginals, we expect the variables to have some dependence structure.
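
As a quick numerical sketch of that reading (a toy joint I made up, computed with NumPy; `mutual_information` is my own helper, not a library function):

```python
import numpy as np

def mutual_information(joint):
    """KL divergence between a discrete joint P(x, y) and the
    product of its marginals P(x) * P(y), i.e. the first expression."""
    px = joint.sum(axis=1, keepdims=True)   # marginal of X, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, ny)
    mask = joint > 0                        # treat 0 * log(0 / q) as 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask])))

# Independent case: the joint IS the product of marginals, so MI = 0.
independent = np.outer([0.3, 0.7], [0.4, 0.6])
print(mutual_information(independent))   # ~0.0

# Dependent case (made-up numbers): the multiplication rule fails, MI > 0.
dependent = np.array([[0.4, 0.1],
                      [0.1, 0.4]])
print(mutual_information(dependent))     # ~0.193
```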

We also see KL-divergence expressed like this:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$

How does this second expression relate to the first? I cannot see where the joint is being calculated, or the product of marginals for that matter.
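
The only connection I can construct is a mechanical substitution: flatten each $(x, y)$ pair into a single outcome, then set $P$ to the joint and $Q$ to the product of marginals. A sketch of that reading (same made-up joint as above; `kl_divergence` is again my own helper):

```python
import numpy as np

def kl_divergence(p, q):
    """The second expression: sum_x P(x) * log(P(x) / Q(x))."""
    mask = p > 0                            # treat 0 * log(0 / q) as 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])                            # P(x, y)
product = np.outer(joint.sum(axis=1), joint.sum(axis=0))  # P(x) * P(y)

# Each (x, y) pair becomes one outcome of a single "regular" distribution.
print(kl_divergence(joint.ravel(), product.ravel()))      # ~0.193, as above
```

The value matches the first expression computed above, but I don't see what justifies treating the joint and the product of marginals as ordinary, single distributions here.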

Cybernetic
  • The first expression is the mutual information between $X$ and $Y$, which is only a special instance of a KL divergence. Not all KL divergences take the form of a mutual information. – πr8 Aug 24 '20 at 17:53
  • @πr8 But we should still be able to understand the first expression in terms of the second. – Cybernetic Aug 24 '20 at 17:55
  • Yes, the first expression is the KL divergence between the joint $P(x, y)$ and the product of the marginals $P(x) \cdot P(y)$. The first expression is a special case of the second. – πr8 Aug 24 '20 at 17:59
  • that is, you can write $I(X, Y) = \sum_{x, y} P(x, y) \log \frac{P(x, y)}{P(x) \cdot P(y)}$. is this what you mean? – πr8 Aug 24 '20 at 18:00
  • @πr8 Yes this is what I am referring to. Thank you. Simply replacing the single probabilities in the general equation with the joint and product of marginals for the MI version (special case). What makes this move valid? In other words, why is it possible to treat both a joint and a product of marginals as though they were regular, single distributions? – Cybernetic Aug 24 '20 at 18:24
  • what do you mean by "treat both a joint and a product of marginals as though they were regular, single distributions"? they are both valid distributions over $(x, y)$ space, if that answers your question. – πr8 Aug 24 '20 at 18:39
  • It's amazing MI doesn't get taught as the multiplication rule for independent events, when that's literally all it is. – Cybernetic Aug 24 '20 at 19:09
  • how do you mean that? in what sense is it the same? – πr8 Aug 24 '20 at 20:44
  • @πr8 MI is simply showing the deviation between a joint distribution and a product of marginals. That's THE way to determine dependence between variables. – Cybernetic Aug 25 '20 at 00:28
  • right - I agree that MI is a way of testing dependence between variables. when you say that (paraphrasing) "MI is just the multiplication rule for independent events", I don't understand what claim is being made. – πr8 Aug 25 '20 at 13:35
  • I am saying that pedagogically MI should be taught as a reframing of the well-known, already learned, multiplication rule for independence. It’s a natural extension of an age-old probability problem, not some new approach to understanding dependence, as it is taught. – Cybernetic Aug 25 '20 at 14:00
  • I am not sure of what you mean, but I also don't know if I have anything to add which will help to answer your question (if there is anything left to answer). – πr8 Aug 25 '20 at 22:00
  • And I am not sure what you don’t understand so we’ll leave it at that. Unless someone else objects I will say it’s settled that MI is indeed the multiplication rule for independence recast in terms of entropy. – Cybernetic Aug 25 '20 at 22:12

0 Answers