
I have read that, in the context of matrix factorization, performing maximum likelihood estimation under the assumption that the entries are Poisson-generated is equivalent to minimizing the generalized Kullback-Leibler divergence between the matrix and the product of its factors.

Does anyone know how to show this?

(For an example of this claim, see, e.g., page 2 of Lee and Seung's paper on NMF in Nature.)

benjaminwilson

2 Answers


I worked it out in the end; in case anyone else is interested, I wrote it up here: http://building-babylon.net/2015/02/17/maximum-likelihood-estimation-for-non-negative-matrix-factorisation-and-the-generalised-kullback-leibler-divergence/

benjaminwilson

We want to prove that \begin{align} \operatorname*{arg\,min}_{\boldsymbol{W},\boldsymbol{H}} \; D_{KL}(\boldsymbol{V} \,|\, \boldsymbol{W}\boldsymbol{H}) \quad = \quad \operatorname*{arg\,max}_{\boldsymbol{W},\boldsymbol{H}} \; p(\boldsymbol{V} \,|\, \boldsymbol{W}\boldsymbol{H}) \end{align} under the assumption that the entries of $\boldsymbol{V}$ are independently Poisson distributed with means given by the corresponding entries of $\boldsymbol{W}\boldsymbol{H}$, i.e. $v_{fn} \sim \text{Poisson}\left(\sum_k w_{fk}h_{kn}\right)$.

The KL divergence (more precisely, the generalized KL or I-divergence) is defined as \begin{align} D_{KL}(\boldsymbol{V} | \boldsymbol{W}\boldsymbol{H}) = \sum_f \sum_n \left[ v_{fn} \ln \frac{v_{fn}}{\sum_k w_{fk}h_{kn}} + \sum_k w_{fk}h_{kn} - v_{fn} \right] \end{align} The log-likelihood can be expressed in terms of this divergence: \begin{align} \ln p(\boldsymbol{V} | \boldsymbol{W}\boldsymbol{H}) &= \sum_{f,n} \ln \left[ \exp\left\lbrace -\sum_k w_{fk}h_{kn}\right\rbrace \frac{\left(\sum_k w_{fk}h_{kn}\right)^{v_{fn}}}{v_{fn}!} \right] \\ &= \sum_{f,n} \left[ v_{fn} \ln\sum_k w_{fk}h_{kn} -\sum_k w_{fk}h_{kn} - \ln v_{fn}! \right] \\ &= \sum_{f,n} \left[ v_{fn} \ln\frac{\sum_k w_{fk}h_{kn}}{v_{fn}} -\sum_k w_{fk}h_{kn} - \ln v_{fn}! + v_{fn}\ln v_{fn} \right] \\ &= -D_{KL}(\boldsymbol{V} | \boldsymbol{W}\boldsymbol{H}) + \sum_{f,n} \left[ v_{fn}\ln v_{fn} - \ln v_{fn}! - v_{fn} \right] \end{align} The remaining sum does not depend on $\boldsymbol{W}$ or $\boldsymbol{H}$, so maximizing the likelihood and minimizing the KL divergence yield the same optimum with respect to $\boldsymbol{W}, \boldsymbol{H}$.
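
As a sanity check, here is a minimal numerical sketch of the identity above, assuming NumPy and SciPy; the array names (`V`, `W`, `H`) and dimensions (`F`, `N`, `K`) are illustrative, not from the original. It verifies that $\ln p(\boldsymbol{V} | \boldsymbol{W}\boldsymbol{H}) + D_{KL}(\boldsymbol{V} | \boldsymbol{W}\boldsymbol{H})$ equals $\sum_{f,n}\left[v_{fn}\ln v_{fn} - \ln v_{fn}! - v_{fn}\right]$ for any positive $\boldsymbol{W}, \boldsymbol{H}$:

```python
# Numerical check: the Poisson log-likelihood and the generalized KL divergence
# differ only by a term that does not depend on W or H, so their optima coincide.
import numpy as np
from scipy.special import gammaln  # gammaln(v + 1) = ln(v!)

rng = np.random.default_rng(0)
F, N, K = 5, 4, 3                                   # illustrative sizes
V = rng.poisson(lam=3.0, size=(F, N)).astype(float)  # non-negative data matrix

def gen_kl(V, WH):
    # Generalized KL (I-divergence), with the convention 0 * ln 0 = 0.
    term = np.where(V > 0, V * np.log(V / WH), 0.0)
    return np.sum(term + WH - V)

def poisson_loglik(V, WH):
    # ln p(V | WH) with independent Poisson entries of mean [WH]_{fn}.
    return np.sum(V * np.log(WH) - WH - gammaln(V + 1))

# The constant gap: sum over f,n of (v ln v - ln v! - v), independent of W, H.
const = np.sum(np.where(V > 0, V * np.log(V), 0.0) - gammaln(V + 1) - V)

for _ in range(3):
    W = rng.uniform(0.1, 2.0, size=(F, K))
    H = rng.uniform(0.1, 2.0, size=(K, N))
    WH = W @ H
    gap = poisson_loglik(V, WH) + gen_kl(V, WH)
    print(np.isclose(gap, const))  # True for every random W, H
```

(`gammaln(v + 1)` is used for $\ln v_{fn}!$ so the check also works for larger counts without overflow.)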

alberto