
Each distribution is represented as an array of arrays of PMF values.

UPD 1: I have $P=(p_1, ... , p_n)$ where $P$ is a distribution of distributions and $p_i=(p_i^1, ..., p_i^m)$. My task is to compute $D_{KL}(P, Q)$.

UPD 2: Each $p_i$ is a PMF and $\sum_j p_i^j=1$ for each $i$.
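
For concreteness, a minimal sketch of this representation in NumPy (the particular numbers, shapes, and array names below are illustrative only, not part of the original setup):

```python
import numpy as np

# Illustrative example: P holds n = 3 empirical PMFs, each over m = 4 bins.
# Row i is p_i = (p_i^1, ..., p_i^m) and, per UPD 2, sums to 1.
P = np.array([
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
])

# Q has the same structure as P (see UPD 1).
Q = np.array([
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.20, 0.30, 0.40],
    [0.40, 0.20, 0.20, 0.20],
])

# Sanity check: every row is a PMF, i.e. sum_j p_i^j = 1.
assert np.allclose(P.sum(axis=1), 1.0)
assert np.allclose(Q.sum(axis=1), 1.0)
```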

Anton Karazeev
  • https://stats.stackexchange.com/questions/211175/kullback-leibler-divergence/248657#248657 – kjetil b halvorsen Apr 21 '17 at 17:41
  • Is each $p_i$ a PMF? i.e. $\sum_jp_i^j=1$? Or is it only $P$ that is a PMF? i.e. $\sum_i\sum_jp_i^j=1$? Note that KL divergence is an expectation, so it only makes sense on "entire" PMFs (i.e. sum is 1). If you already have PMFs (vs. PDFs) then you can just sum bin probabilities (i.e. the multi-dimensional part would only come in to convert from density to mass, via bin volume). – GeoMatt22 Apr 21 '17 at 22:52
  • @GeoMatt22 Yes, each $p_i$ is a PMF and $\sum_j p_i^j=1$ for each $i$. – Anton Karazeev Apr 22 '17 at 04:13
  • In what sense are the distributions "multidimensional"? What is $Q$? – Juho Kokkala Apr 24 '17 at 16:50
  • @JuhoKokkala $Q$ has the same structure as $P$; it is just a different distribution of distributions. See UPD 1. – Anton Karazeev Apr 24 '17 at 18:48

1 Answer


The KL divergence does not depend on the dimensionality of the distribution, since a PMF always returns a one-dimensional (scalar) probability (i.e., what would it mean if $P(X = k)$ were a vector?).

What I mean is that the integral/summation in the KL divergence is with respect to $\mathbf{x}$, not $\theta$. For two distributions $p(\mathbf{x})$ and $q(\mathbf{x})$, you can write:

$$D_{KL}(p \| q) = \int_\mathcal{X} p(\mathbf{x})\log\frac{p(\mathbf{x})}{q(\mathbf{x})}\,d\mathbf{x}$$
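
For discrete PMFs the integral above becomes a sum over the support, as suggested in the comments below. A minimal sketch of that sum in Python/NumPy (the function name and example values are illustrative, not from the thread), assuming $p$ and $q$ are defined on the same support, each sum to 1, and $q(x) > 0$ wherever $p(x) > 0$:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)).

    p and q are 1-D arrays of PMF values over the same support, each
    summing to 1, with q > 0 wherever p > 0. Terms with p(x) = 0
    contribute 0, following the convention 0 * log 0 = 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Example: two PMFs over 4 bins.
p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.25, 0.25, 0.25, 0.25])
print(kl_divergence(p, q))  # approx. 0.106 nats
```

The result is in nats because of the natural logarithm; use `np.log2` instead if you want bits.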

Tim Atreides
  • He is talking about KL between multivariate distributions. The question is legitimate. Your answer on the other hand is not. – Cagdas Ozgenc Apr 21 '17 at 19:01
  • I don't see any issue with his answer. The formula he referenced uses multivariate distributions. We know this because there are bold letters in them. – jjet Apr 21 '17 at 19:14
  • Yes, a multivariate distribution would imply the variable of interest, $\mathbf{x}$, is a vector. Although I would appreciate OP clarifying his question a bit, if I'm misinterpreting. – Tim Atreides Apr 21 '17 at 19:15
  • @TimAtreides actually a distribution doesn't have to "always be one-dimensional". What if I have some machine learning task where the value of each feature for a given object has its own distribution, as this value is a random variable? How should I compute the KL divergence between two such objects? – Anton Karazeev Apr 21 '17 at 20:03
  • If each feature has its own distribution, then it still has its own one-dimensional probabilities, yes? For each feature, you'd have to find its empirical distribution. It sounds like you're dealing with discrete data, so these would be empirical pmfs. You can thus find the KL divergence between these empirical pmfs. – Tim Atreides Apr 21 '17 at 20:08
  • @TimAtreides I agree with you. My task is to compute the KL divergence between empirical pmfs. But I have an array of such empirical pmfs for one object and another array of different pmfs for the second object. – Anton Karazeev Apr 21 '17 at 20:12
  • Oh, dude! Your problem is much easier, then. Just let $\mathbf{p} = (p_1,...,p_n)$ and $\mathbf{q} = (q_1,...,q_n)$ and compute the following: $$D_{KL} = \sum_i p_i \log\frac{p_i}{q_i}$$ Which I _believe_ is the correct way to compute the empirical KL divergence. – Tim Atreides Apr 21 '17 at 20:47
  • @TimAtreides You are absolutely right for one-dimensional distributions. But I have $P=(p_1, ... , p_n)$ where $p_i=(p_i^1, ..., p_i^m)$ and my task is to compute $D_{KL}(P, Q)$. $P$ is a distribution of distributions (see the row-wise sketch after these comments). – Anton Karazeev Apr 21 '17 at 21:57
  • @CagdasOzgenc You are right. I've just updated my question – Anton Karazeev Apr 21 '17 at 22:32
  • Perhaps update your answer to include the discrete sum? (And maybe a comment on PMF vs. PDF, i.e. bin probability $= p\,dx$, so logically one can sum over a "1D" bin index $i$?) – GeoMatt22 Apr 21 '17 at 23:23
  • @TimAtreides you're wrong – LKM Jan 13 '21 at 18:31
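
Picking up the thread above: since (per UPD 1) $P$ and $Q$ are arrays of PMFs rather than a single pair, one natural reading of the comments is to apply the discrete sum row by row, giving one divergence $D_{KL}(p_i, q_i)$ per feature. Below is a hedged sketch under that assumption (the function name is illustrative, and whether you then report the individual values, their sum, or their mean depends on the application):

```python
import numpy as np

def rowwise_kl(P, Q):
    """KL divergence between corresponding rows of P and Q.

    P and Q are (n, m) arrays whose rows are PMFs (each row sums to 1).
    Returns a length-n array with the divergence of row i of P from row i
    of Q. Assumes Q > 0 wherever P > 0; terms with P = 0 contribute 0.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    terms = np.zeros_like(P)
    mask = P > 0
    terms[mask] = P[mask] * np.log(P[mask] / Q[mask])
    return terms.sum(axis=1)

# Two objects, each described by n = 3 empirical PMFs over m = 4 bins.
P = np.array([[0.10, 0.20, 0.30, 0.40],
              [0.25, 0.25, 0.25, 0.25],
              [0.70, 0.10, 0.10, 0.10]])
Q = np.array([[0.25, 0.25, 0.25, 0.25],
              [0.10, 0.20, 0.30, 0.40],
              [0.40, 0.20, 0.20, 0.20]])

per_feature = rowwise_kl(P, Q)         # one divergence per PMF pair
print(per_feature, per_feature.sum())  # sum only if that matches your use case
```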