5

I have a Dirichlet distribution from which I'm maximizing $\alpha$ by calculating the log likelihood with the following equation

$p\left(s|\alpha\right) = \sum\limits_{i=1}^O\log\left(\frac{n_i!}{\prod\limits_{j=1}^kn_{ij}!}\frac{\Gamma\left(\alpha_0\right)}{\Gamma\left(\sum\limits_{j=1}^k\left(n_{ij}+a_j\right)\right)} \prod\limits_{j=1}^k \frac{\Gamma\left(n_{ij}+\alpha_j\right)}{\Gamma\left(\alpha_j\right)}\right)$

I have a python script to calculate the log likelihood. It works for small values, but when $n_{ij}$ gets high $\Gamma\left(\sum\limits_{j=1}^k\left(n_{ij}+a_j\right)\right)$ and $\Gamma\left(n_{ij}+\alpha_j\right)$ python runs into overflow error (the max for math.gamma is between 171 and 172).

Is there a way to change the likelihood function so that $\Gamma\left(\sum\limits_{j=1}^k\left(n_{ij}+a_j\right)\right)$ and $\Gamma\left(n_{ij}+\alpha_j\right)$ don't run into overflow errors. Can I just divide the values in $\Gamma$ by for example 100, if I then compare the likelihood between $p(s|\alpha)$ with different $\alpha$ values do the best $\alpha$ values still have the highest likelihood?

Glen_b
  • 257,508
  • 32
  • 553
  • 939
Niek
  • 287
  • 2
  • 9
  • 2
    A common solution is to use a function which computes the log of gamma. I'd be surprised if there weren't already one in Python. Gamma is not a linear function - you can't just scale the *arguments* to gamma by the same constant and hope it cancels out. – Glen_b Oct 07 '13 at 22:03
  • Looking up the digamma function might be your best bet: http://en.wikipedia.org/wiki/Digamma_function – bdeonovic Oct 07 '13 at 22:09
  • 2
    @Benjamin Could you explain how that solves the OP's problem? – Glen_b Oct 07 '13 at 22:10
  • @Glen_b python does have a log gamma function. I don't have the mathematical knowledge to know how changing the $\Gamma$ to log $\Gamma$ will affect the likelihood though. Can I just substitute them and still use the likelihood to maximimize $\alpha$? – Niek Oct 07 '13 at 22:11
  • log is a monotone increasing function so it should be fine. – bdeonovic Oct 07 '13 at 22:14
  • log of a product is sum of the log. Use log factorials too. – Henry Oct 08 '13 at 00:37
  • 3
    This question is asked in a different guise and answered at http://stats.stackexchange.com/questions/71030/performing-calculation-of-a-product-of-a-sequence. For complex calculations of this sort some caution concerning loss of precision is called for; that too is addressed in the other thread. – whuber Oct 08 '13 at 03:17

1 Answers1

6

You suggest in comments that you don't know how the log-gamma function will work with the likelihood.

You need some basic facts:

1) log of a product of terms is the sum of the logs ($\log(ab) = \log(a) + \log(b)$)

2) log of a reciprocal is the negative of the log ($\log(1/c) = -\log(c)$)

3) The relationship between factorials and Gamma functions ($x! = \Gamma(x+1)$)

Using those three facts you just take the log inside that product (making it a sum of log-terms via rule 1), and apply rules 1,2, and 3 repeatedly until you have nothing but sums and differences of log-gamma functions inside the summation.

(This is not really a statistical issue at all, but a mathematical one, and the problem being solved is a computational one.)

If Python also has a function for the log of a binomial coefficient, you may be able to shave some time off your calculation.

Glen_b
  • 257,508
  • 32
  • 553
  • 939