1

I have 2 models

$P \sim Ga(115,1329.914) \\ Q \sim Ga(140,650.6775)$

and I'm looking to calculate the K-L divergence of these 2.

$D_{KL}(P||Q) = \int_\infty ^\infty p(x) log \frac{p(x)}{q(x)}\,dx$

where $p(x)$ and $q(x)$ are the probability densities.

So for my Gamma models I get

$D_{KL}(P||Q) = \int_0 ^\infty f_1(x|a_1,d_1,p_1) log \frac{f_1(x|a_1,d_1,p_1)}{f_2(x|a_2,d_2,p_2)} \,dx$

The 3 parameter gamma distribution is given by

$f(x | a,d,p) = \frac{p}{a^d} \frac{x^{d-1}}{\Gamma(d/p)} exp\Big\{-(\frac{x}{a})^p \Big\}$

I just dont know how to apply this to my problem, as the values I have are just a standard $Ga(\alpha, \beta)$

A further point is that

$\int_0 ^\infty f_1(x|a_1,d_1,p_1) log \frac{f_1(x|a_1,d_1,p_1)}{f_2(x|a_2,d_2,p_2)} \,dx \\ = log \frac{p_1d_2^{d_2}\Gamma(d_2/p_2)}{p_2d_1^{d_1}\Gamma(d_1/p_1)} + \Bigg\{\frac{\psi(d_1/p_1)}{p_1} + log a_1\Bigg\}(d_1-d_2)+\frac{\Gamma(\frac{d_1+p_2}{p_1})}{\Gamma(\frac{d_1}{p_1})} (\frac{a_1}{a_2})^{p_2}-\frac{d_1}{p_1}$

Where $\psi(\cdot)$ is the digamma function

But once again I don't know how to apply this to my original problem as I don't know how to parameterize correctly.

I was wondering if there is an easier way which I am overlooking as this seems very complex.

Edit: Just on a side note, I know that

$D_{KL}(P||Q) \neq D_{KL}(Q||P)$

But are they both still as valid as each other for KL divergence??

Second Edit: From here Kullback–Leibler divergence between two gamma distributions

One of the answers says that KL divergence is a difference in integrals of the form ..... but how do you know that?

Edit:

KL = function(a,b,c,d){
  return(((a-c)/c)*b + log((lgamma(d)*(c^d))/(lgamma(b)*(a^b)))+(b-d)*(log(a)+digamma(b)))
}

KL(202,114186.3,195,119237.3)

Gives answer NaN when it should be close to 1?

user162934
  • 223
  • 3
  • 8
  • What are $a$, $d$ and $p$? – Carlos Campos Mar 26 '18 at 17:31
  • @CarlosCampos I've edited the question to make it more clear. However, if you look at my second edit. Could you possibly tell me how he know the KL divergence is a difference in integrals of the form given. If I know how to show that then I dont need to undestand the above question. – user162934 Mar 26 '18 at 17:36
  • I think the key to my question is that if I set $p=1$ then I get the gamma distribution that I want! – user162934 Mar 26 '18 at 17:36
  • 1
    You will find an answer on page 8 of http://www.mast.queensu.ca/~linder/pdf/GiAlLi13.pdf – kjetil b halvorsen Mar 26 '18 at 17:38
  • @kjetil b halvorsen Nice link ... all it's missing is a saddlepoint approximation. – Mark L. Stone Mar 26 '18 at 17:44
  • @Mark L. Stone Saddlepoint approximation? Why? – kjetil b halvorsen Mar 26 '18 at 17:45
  • I was being sincere about nice link. But mentioning saddlepoint approximation was meant to be a joke in honor of your seminal board postings in that obscure topic. – Mark L. Stone Mar 26 '18 at 17:52
  • @kjetilbhalvorsen May be a stupid question, but is there an online calculator to solve this lol. – user162934 Mar 26 '18 at 18:00
  • Not that I know about! But the referenced paper has a nice formula that you can program i a minute! – kjetil b halvorsen Mar 26 '18 at 18:05
  • I'm having the issue that my numbers are too large. I dont know how to get around this. The distributions I give at the start are the posteriors of my data, but I am getting an error when trying to compute the KL distance – user162934 Mar 26 '18 at 18:15
  • 1
    The online calculator is at https://www.wolframalpha.com/input/?i=Integrate%5BLog%5Bx%5D+x%5E2+Exp%5B-x%5D,+%7Bx,+0,+Infinity%7D%5D (for example). – whuber Mar 27 '18 at 12:38
  • @whuber I am getting a vlaue of -15000, would you suggest using log gamma then? If so, what implications does this have on my digamma function – user162934 Mar 27 '18 at 15:02
  • *Of course* you would use a log Gamma function. Gammas get so large so quickly that it's rarely useful to compute with them directly. Digamma doesn't have the same behavior: it is the derivative of the log Gamma as it is. – whuber Mar 27 '18 at 15:42
  • @whuber Ok cool, I'll make this adaptation and get back to you :) – user162934 Mar 27 '18 at 15:46

1 Answers1

1

The comments by @kjetil b halvorse address the theoretical formula you need. But as you desxcribe, you are having numerical difficulties. That should be resolved as follows:

Use what I generically call "log gamma" function, whatever it may be called in your computing environment. For instance, gammaln in MATLAB https://www.mathworks.com/help/matlab/ref/gammaln.html . In Wolfram , it's called LogGamma http://reference.wolfram.com/language/ref/LogGamma.html .

log gamma function computes log(gamma(x)), but does it in a better way than first computing gamma function, then taking log. This function should avoid overflow (or underflow) from excessively large gamma before taking log. The algebraic rearrangement to use this should be straightforward.

Edit: Given that you want to do the calculations in R per your comment: In R, use lgamma(x) for log(gamma(x)). That should solve your numerical problems. Technically, lgamma is the natural log of the absolute value of the gamma function, but I don't think you need to worry about the abs in your usage.

Edit 2: Here is the complete R function (I also re-wrote the powers in the log to avoid problems).

KL = function(a,b,c,d){
((a-c)/c)*b + lgamma(d) - lgamma(b) + d*log(c) - b*log(a) + (b-d)*(log(a) + digamma(b)) 
}

KL(202,114186.3,195,119237.3) = 9.6433, as computed by the MATLAB analog of this formula (gammaln instead of lgamma, and psi instead of digamma).

Mark L. Stone
  • 12,546
  • 1
  • 31
  • 51
  • Per @whuber 's comment, you can do the calculation at the Wolfram link he provided, using LogGamma . – Mark L. Stone Mar 27 '18 at 12:49
  • I am using R rather than MatLab or any other. So what your saying is to just take the log of all my gamma's? What about the Digamma function then? – user162934 Mar 27 '18 at 13:14
  • In R, use lgamma(x) for log(gamma(x))..That should solve your numerical problems. Technically, lgamma is the natural log of the absolute value of the gamma function, but I don't think you need to worry about the abs in your usage. digamma is the logarithmic derivative of the gamma function, and not what you want for this. – Mark L. Stone Mar 27 '18 at 23:05
  • For some reason the values are returning NaN, which I just dont get – user162934 Apr 02 '18 at 00:02
  • I'm going to edit the question with my R code, I was wondering if you could take a look. Because I'm getting an astronomical KL divergence for similar posteriors – user162934 Apr 02 '18 at 00:10
  • You're double taking log. lgamma is already log(gamma). So rewrite the term in question as the difference of two lgamma 's + ln(term involving thetas). – Mark L. Stone Apr 02 '18 at 00:52
  • Sorry, Im not quite following you. I am using the formula given just replacing gamma with lgamma – user162934 Apr 02 '18 at 01:23
  • No. lgamma(x) = ln(gamma(x)) . lgamma replace ln(gamma), not gamma. The point of using lgamma is to avoid ever computing gamma by itself. – Mark L. Stone Apr 02 '18 at 01:42
  • Yeah, but that's what my code at the bottom says? thats the formula – user162934 Apr 02 '18 at 14:43
  • No, you are taking log of lgamma. So you are doing log(log(gamma))), not log(gamma). There is no log(log(gamma)) in the formula, but there is in your code. – Mark L. Stone Apr 02 '18 at 15:19
  • If you look here (https://gyazo.com/976e9ecdb0a8b273c2cdab8b5c923c2a) the formula says to take the log of the fraction of gammas. So what you're saying is to get rid of this log value, and replace it with 2 lgamma for each gamma function respectively?? – user162934 Apr 02 '18 at 15:58
  • Rewrite as lgamma(d) - lgamma(b) + log(c^d)(a^b)) – Mark L. Stone Apr 02 '18 at 16:30
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/75411/discussion-between-user162934-and-mark-l-stone). – user162934 Apr 02 '18 at 16:54
  • 1
    I think there may be some confusion concerning the meanings of `a,b,c,d`. Using the formula I derived at https://stats.stackexchange.com/a/11668/919 and (with the notation of that answer) setting $a=1/114186.3$, $b=202$, $c=1/119237.3$, and $d=195$, I obtain a value of $0.60753126\ldots$ rather than $9.6433.$ – whuber Apr 02 '18 at 20:28
  • @whuber Yes, thanks. It appears the O.P. has likely interchanged shape and scale parameters. As to whether there should be a reciprocal on the scale parameter, that is for the O.P to say, since O.P. provided parameters. I relied on O.P. correctly mapping parameters to formula in the link.http://www.mast.queensu.ca/~linder/pdf/GiAlLi13.pdf . – Mark L. Stone Apr 02 '18 at 20:43