
Parameterize the gamma distribution $\Gamma(b,c)$ by the pdf $g(x;b,c) = \frac{1}{\Gamma(c)}\frac{x^{c-1}}{b^c}e^{-x/b}$, so that $b$ is a scale parameter and $c$ a shape parameter. The Kullback-Leibler divergence between $\Gamma(b_q,c_q)$ and $\Gamma(b_p,c_p)$ is given in [1] as

\begin{align} KL_{Ga}(b_q,c_q;b_p,c_p) &= (c_q-1)\Psi(c_q) - \log b_q - c_q - \log\Gamma(c_q) + \log\Gamma(c_p)\\ &\qquad+ c_p\log b_p - (c_p-1)(\Psi(c_q) + \log b_q) + \frac{b_qc_q}{b_p} \end{align}

I'm guessing that $\Psi(x):= \Gamma'(x)/\Gamma(x)$ is the digamma function.
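For what it's worth, here is a direct R transcription of the quoted formula (my own sketch, not from [1]):

# Direct transcription of the quoted formula (a sketch).
# b.q, c.q and b.p, c.p are the scale and shape parameters
# of q and p respectively.
KL.penny <- function(b.q, c.q, b.p, c.p) {
  (c.q - 1) * digamma(c.q) - log(b.q) - c.q - lgamma(c.q) + lgamma(c.p) +
    c.p * log(b.p) - (c.p - 1) * (digamma(c.q) + log(b.q)) + b.q * c.q / b.p
}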

The formula is stated without derivation, and I cannot find any reference that derives it. Any help? A good reference would be sufficient. The difficult part is integrating $\log x$ against a gamma pdf.

[1] W.D. Penny, KL-Divergences of Normal, Gamma, Dirichlet, and Wishart densities, available at: www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps

Ian Langmore
  • Taking the derivative of the pdf with respect to $c$ introduces the factor of $\log(x)$ you are looking for; that's why the digamma shows up. – whuber Jun 07 '11 at 00:11
  • I am looking for a solution to the same problem and found this [one](https://www.researchgate.net/publication/259899864_Computing_the_Kullback-Leibler_Divergence_between_two_Generalized_Gamma_Distributions) useful. – JYY Oct 04 '17 at 23:36
  • If you happen across Pierre Baldi and Laurent Itti (2010) “Of bits and wows: A Bayesian theory of surprise with applications to attention” Neural Networks 23:649-666, you will find Equation 73 gives a KL divergence between two gamma pdfs. Take care, though, it looks like the formula is mis-printed. – Mr Clarinet Aug 30 '12 at 19:54

2 Answers


The KL divergence is a difference of integrals of the form

$$\begin{aligned} I(a,b,c,d)&=\int_0^{\infty} \log\left(\frac{e^{-x/a}x^{b-1}}{a^b\Gamma(b)}\right) \frac{e^{-x/c}x^{d-1}}{c^d \Gamma(d)}\, \mathrm dx \\ &=-\frac{1}{a}\int_0^\infty \frac{x^d e^{-x/c}}{c^d\Gamma(d)}\, \mathrm dx - \log(a^b\Gamma(b))\int_0^\infty \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\, \mathrm dx\\ &\quad+ (b-1)\int_0^\infty \log(x) \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\, \mathrm dx\\ &=-\frac{cd}{a} - \log(a^b\Gamma(b)) + (b-1)\int_0^\infty \log(x) \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\,\mathrm dx \end{aligned}$$

We just have to deal with the right-hand integral, which we evaluate by observing

$$\begin{aligned} \frac{\partial}{\partial d}\Gamma(d) &= \frac{\partial}{\partial d}\int_0^{\infty}e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx\\ &= \frac{\partial}{\partial d} \int_0^\infty e^{-x/c} \frac{(x/c)^{d-1}}{c}\, \mathrm dx\\ &=\int_0^\infty e^{-x/c}\frac{x^{d-1}}{c^d} \log\frac{x}{c} \, \mathrm dx\\ &=\int_0^{\infty}\log(x)e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx - \log(c)\Gamma(d). \end{aligned}$$

Whence

$$\frac{b-1}{\Gamma(d)}\int_0^{\infty} \log(x)e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx = (b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$

Plugging into the preceding yields

$$I(a,b,c,d)=\frac{-cd}{a} -\log(a^b\Gamma(b))+(b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$

The KL divergence between $\Gamma(c,d)$ and $\Gamma(a,b)$ equals $I(c,d,c,d) - I(a,b,c,d)$, which is straightforward to assemble.
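Carrying out that subtraction explicitly (my own assembly of the two evaluations of $I$, writing $\psi = \Gamma'/\Gamma$):

$$KL = (d-b)\,\psi(d) + \log\frac{\Gamma(b)}{\Gamma(d)} + b\log\frac{a}{c} + \frac{d\,(c-a)}{a},$$

which agrees with the formula quoted in the question under $b_q = c$, $c_q = d$, $b_p = a$, $c_p = b$.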


Implementation Details

Gamma functions grow rapidly, so to avoid overflow, don't compute $\Gamma$ and then take its logarithm: instead use the log-Gamma function that will be found in any statistical computing platform (including Excel, for that matter).

The ratio $\Gamma^\prime(d)/\Gamma(d)$ is the logarithmic derivative of $\Gamma,$ generally called $\psi,$ the digamma function. If it's not available to you, there are relatively simple ways to approximate it, as described in the Wikipedia article.
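If you do have to roll your own, here is a minimal R sketch (my addition) of the standard approach: push the argument up with the recurrence $\psi(x) = \psi(x+1) - 1/x$, then apply the leading terms of the asymptotic expansion $\psi(x) \approx \log x - \frac{1}{2x} - \frac{1}{12x^2} + \frac{1}{120x^4}$:

# Rough digamma approximation (a sketch; prefer digamma/psigamma when available).
# Shifts x above 6 via psi(x) = psi(x+1) - 1/x, then uses the
# leading terms of the asymptotic series.
digamma.approx <- function(x) {
  r <- 0
  while (x < 6) { r <- r - 1 / x; x <- x + 1 }
  r + log(x) - 1 / (2 * x) - 1 / (12 * x^2) + 1 / (120 * x^4)
}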

Here, to illustrate, is a direct R implementation of the formula in terms of $I$. This does not exploit an opportunity to simplify the result algebraically, which would make it a little more efficient (by eliminating a redundant calculation of $\psi$).

#
# `b` and `d` are Gamma shape parameters and
# `a` and `c` are scale parameters.
# (All, therefore, must be positive.)
#
KL.gamma <- function(a,b,c,d) {
  i <- function(a,b,c,d)
    - c * d / a - b * log(a) - lgamma(b) + (b-1)*(psigamma(d) + log(c))
  i(c,d,c,d) - i(a,b,c,d)
}
print(KL.gamma(1/114186.3, 202, 1/119237.3, 195), digits=12)
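As a quick sanity check (my addition, not part of the original answer), the divergence of a gamma density from itself should be exactly zero, and the result should never be negative:

stopifnot(isTRUE(all.equal(KL.gamma(2, 3, 2, 3), 0)))  # KL(q, q) = 0
stopifnot(KL.gamma(1, 2, 3, 4) >= 0)                   # KL is nonnegative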
whuber

The Gamma distribution is in the exponential family because its density can be expressed as:

\begin{align} \newcommand{\mbx}{\mathbf{x}} \newcommand{\btheta}{\boldsymbol{\theta}} f(\mbx \mid \btheta) &= \exp\bigl(\eta(\btheta) \cdot T(\mbx) - g(\btheta) + h(\mbx)\bigr) \end{align}

Writing the gamma density as $\exp\bigl((c-1)\log x - \frac{x}{b} - \log\Gamma(c) - c\log b\bigr)$, its log-normalizer is $$g(\btheta) = \log(\Gamma(c)) + c\log(b)$$ with natural parameters and sufficient statistics $$\btheta = \left[\begin{matrix}c-1\\-\frac1 b\end{matrix}\right], \qquad T(\mbx) = \left[\begin{matrix}\log x\\x\end{matrix}\right].$$

All distributions in the exponential family have KL divergence:

\begin{align} KL(q; p) &= g(\btheta_p) - g(\btheta_q) - (\btheta_p-\btheta_q) \cdot \nabla g(\btheta_q). \end{align}

There's a really nice proof of that in:

Frank Nielsen and Richard Nock, Entropies and cross-entropies of exponential families.
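To make this concrete, here is a minimal R sketch (my addition, not from the paper) specializing the formula to the gamma parameterization above; the gradient of $g$ equals the expected sufficient statistics, $\mathbb{E}[\log x] = \psi(c) + \log b$ and $\mathbb{E}[x] = bc$:

# Exponential-family form of the gamma KL divergence (a sketch).
# Uses theta = (c - 1, -1/b) and g = lgamma(c) + c*log(b) as above.
KL.gamma.ef <- function(b.q, c.q, b.p, c.p) {
  g <- function(b, s) lgamma(s) + s * log(b)          # log-normalizer, s = shape
  theta.q <- c(c.q - 1, -1 / b.q)                     # natural parameters of q
  theta.p <- c(c.p - 1, -1 / b.p)
  grad.g.q <- c(digamma(c.q) + log(b.q), b.q * c.q)   # E[log x], E[x] under q
  g(b.p, c.p) - g(b.q, c.q) - sum((theta.p - theta.q) * grad.g.q)
}

Expanding the right-hand side reproduces the formula quoted in the question term for term.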

Neil G
  • Didn't know this. Just a quick question - the $g(.)$ function, does it have to be the same for $\theta_p$ as for $\theta_q$? So for example, would the above formula be valid for KL divergence of normal pdf from gamma pdf? – probabilityislogic Dec 21 '11 at 12:24
  • 1
    Yes, this formula is for two distributions in the same exponential family. – Neil G Dec 21 '11 at 12:27