
I have three discrete probability distributions, A, B and C. Each one describes P(X) under different circumstances. I suspect that A is more similar to B than it is to C. I know that I can measure the difference between two distributions with the KL divergence, but how can I test whether the difference between A and B is less than the difference between A and C?

kjetil b halvorsen
  • You have a tough problem, because statistical and mathematical theory will not decide the answer: you are assuming there is a relevant way to compare distributions so that "less than" has meaning. *What* meaning it might have is up to you to decide: that's not something we can tell you, although we can provide some guidance if you would explain how you intend to interpret the result. – whuber Jun 12 '18 at 12:59
  • You seem to answer your own question: by comparing the two KL-divergences. Of course that means you're committing to defining "difference between" as "KL-divergence from". – Mees de Vries Jun 12 '18 at 13:01
  • Rather than comparing the distributions as a whole, can you not compare specific aspects of the distributions, captured by relevant quantiles or functions of quantiles? This might give you more insight into where the distributions differ (e.g., in the tails). – Isabella Ghement Jun 12 '18 at 13:09

1 Answer


You already got some hints in comments, and a request for more information, which you didn't give us. Here are some thoughts:

Observations on three discrete variables, presumably defined on the same categories, can be represented as a contingency table with one row per distribution and one column per category. You can then run a correspondence analysis on that table; see "Interpreting 2D correspondence analysis plots". The results can be presented graphically, and the three distributions can be compared using the so-called chi-square distance between their row profiles. A similar graphical analysis can probably be based on the KL divergence. To say more we need more context.
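As a rough illustration of the two distance measures mentioned here, the sketch below computes the chi-square distance (as used in correspondence analysis, between row profiles weighted by inverse column masses) and the KL divergence for a small contingency table. The counts are made up purely for illustration, and the function names `row_profiles`, `chi_square_distance` and `kl_divergence` are my own, not from any particular library. The KL function assumes the second distribution is positive wherever the first is.

```python
import numpy as np


def row_profiles(table):
    """Convert each row of counts into relative frequencies (a row profile)."""
    table = np.asarray(table, dtype=float)
    return table / table.sum(axis=1, keepdims=True)


def chi_square_distance(table, i, j):
    """Chi-square distance between row profiles i and j of a contingency table.

    Squared profile differences are weighted by the inverse column masses,
    as in correspondence analysis.
    """
    table = np.asarray(table, dtype=float)
    profiles = row_profiles(table)
    col_mass = table.sum(axis=0) / table.sum()
    return np.sqrt(np.sum((profiles[i] - profiles[j]) ** 2 / col_mass))


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p == 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))


# Hypothetical counts: rows are A, B, C; columns are the shared categories.
counts = np.array([
    [30, 40, 20, 10],  # A
    [28, 38, 22, 12],  # B
    [10, 20, 30, 40],  # C
])
profiles = row_profiles(counts)

print("chi-square distance A-B:", chi_square_distance(counts, 0, 1))
print("chi-square distance A-C:", chi_square_distance(counts, 0, 2))
print("KL(A || B):", kl_divergence(profiles[0], profiles[1]))
print("KL(A || C):", kl_divergence(profiles[0], profiles[2]))
```

These are point estimates only. Judging whether one distance is reliably smaller than the other would need some measure of sampling variability on top of this, and which approach is appropriate depends on the context the question leaves out.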

kjetil b halvorsen