
I'm trying to follow the derivation for the MLE parameters of the gamma distribution in [1].

The standard approach is to derive an expression for the log-likelihood, differentiate with respect to each parameter of the distribution, set the result equal to zero and solve. In the case of the gamma distribution the derivative with respect to $a$ involves the digamma function alongside logarithms, so a closed-form expression for the parameter is not available.

The expression for the log-likelihood is [1]:

$\log p(D, a, b) = n(a - 1)\overline{\log x} - n\log\Gamma(a) - na\log\overline x+na\log a - na$
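For reference, this is the log-likelihood with the scale parameter already profiled out. With the shape-scale parameterization $p(x \mid a, b) = x^{a-1} e^{-x/b} / (\Gamma(a)\, b^a)$, the log-likelihood of $n$ i.i.d. samples is

$$\log p(D \mid a, b) = n(a - 1)\overline{\log x} - \frac{n\overline{x}}{b} - n\log\Gamma(a) - na\log b,$$

setting $\partial/\partial b = 0$ gives $\hat b = \overline{x}/a$, and substituting this back in yields the expression above (which is why $b$ no longer appears in it).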

The reference argues that we get a lower bound on $a\log a$ by Taylor expanding to first order around some point $a_0$, which is clearly true since $a\log a$ is convex and therefore lies above its tangent lines. Substituting this bound gives a lower bound on $\log p(D, a, b)$.
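Spelling that bound out (the first-order Taylor expansion of $a\log a$ at $a_0$):

$$a\log a \ge a(1 + \log a_0) - a_0,$$

with equality at $a = a_0$, so replacing $na\log a$ by $n$ times the right-hand side yields a lower bound on the log-likelihood that touches it at $a_0$.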

Differentiating with respect to $a$ and setting the result equal to zero gives:

$\Psi(a) = \overline{\log x} - \log\overline x + \log a$,

where $\Psi$ is the digamma function. (Differentiating the lower bound instead gives the same equation with $\log a$ replaced by $\log a_0$, and that is the update which gets iterated.)
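
For concreteness, here is a minimal sketch of that fixed-point iteration in Python (my own illustration, not code from [1]; the helper names `inverse_digamma` and `fit_gamma_shape` are made up). The digamma inversion uses a few Newton steps, since $\Psi$ is increasing on $(0, \infty)$, with a rough starting guess along the lines of the approximation in the appendix of [1]:

```python
import numpy as np
from scipy.special import digamma, polygamma

def inverse_digamma(y, iters=5):
    # Rough starting guess (cf. the inverse-digamma approximation in [1]),
    # then a few Newton steps; Psi is monotone, so this converges quickly.
    x = np.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y + 0.5772156649)
    for _ in range(iters):
        x -= (digamma(x) - y) / polygamma(1, x)
    return x

def fit_gamma_shape(data, a=1.0, iters=20):
    mean_log = np.mean(np.log(data))
    log_mean = np.log(np.mean(data))
    for _ in range(iters):
        # Fixed-point update: Psi(a_new) = mean(log x) - log(mean x) + log(a_old)
        a = inverse_digamma(mean_log - log_mean + np.log(a))
    return a  # the scale is then recovered as mean(data) / a

# Quick check on synthetic data with shape 3 and scale 2
rng = np.random.default_rng(0)
data = rng.gamma(shape=3.0, scale=2.0, size=10_000)
a_hat = fit_gamma_shape(data)
print(a_hat, np.mean(data) / a_hat)  # roughly 3.0 and 2.0
```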

The Question: At this point Minka argues that evaluating the RHS at the current parameter value and then "inverting the $\Psi$ function" to get the updated parameter is guaranteed to converge to $a_{MLE}$. This is not at all clear to me; could someone explain this argument in more detail?

[1] T. Minka, "Estimating a Gamma distribution". https://tminka.github.io/papers/minka-gamma.pdf

JMzance

1 Answer


The argument is that you are repeatedly performing two steps:

  1. Replace the objective with a lower bound that makes contact at the current point.
  2. Go to the unique maximum of the lower bound.

This algorithm can only stop at a stationary point of the objective. For a gamma distribution, the only stationary point is at $a_{MLE}$. Note this is the same approach used by the Expectation-Maximization (EM) algorithm. For a nice picture, see Figure 1 of "Expectation-Maximization as lower bound maximization".
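
To make this concrete, here is a small numerical illustration of the two-step argument (my own sketch, not code from the paper; `loglik`, `bound` and `inv_digamma` are made-up names). At each iterate the tangent bound on $a\log a$ makes the surrogate touch the true profile log-likelihood, and moving to the surrogate's maximizer never decreases the objective:

```python
import numpy as np
from scipy.special import digamma, gammaln, polygamma

rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=5_000)
n, log_xbar, mean_logx = len(x), np.log(x.mean()), np.log(x).mean()

def loglik(a):
    # Profile log-likelihood from the question (b already maximized out)
    return n * ((a - 1) * mean_logx - gammaln(a) - a * log_xbar + a * np.log(a) - a)

def bound(a, a0):
    # Same expression with a*log(a) replaced by its tangent at a0
    # (a lower bound, by convexity)
    return n * ((a - 1) * mean_logx - gammaln(a) - a * log_xbar
                + a * (np.log(a0) + 1) - a0 - a)

def inv_digamma(y, iters=5):
    z = np.exp(y) + 0.5
    for _ in range(iters):
        z -= (digamma(z) - y) / polygamma(1, z)  # Newton step
    return z

a = 0.5  # deliberately poor starting point
for _ in range(8):
    assert np.isclose(bound(a, a), loglik(a))              # step 1: bound touches at a
    a_new = inv_digamma(mean_logx - log_xbar + np.log(a))  # step 2: maximize the bound
    assert loglik(a_new) >= loglik(a) - 1e-6               # objective never decreases
    a = a_new
print(a)  # converges to the shape MLE (true shape was 3.0)
```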

Tom Minka