
I'm having a hard time checking an equality in https://arxiv.org/pdf/1511.01707.pdf. It is unnumbered, immediately before (24), on page 13. Any help would be appreciated. Here is some simplified notation and a simplified question.

Let

  1. $v^{(i)} > 0$ be the unnormalized weights
  2. $v_{\text{max}} = \max_i\{\log v^{(i)}\}$
  3. $\tilde{v}^{(i)} = \log v^{(i)} - v_{\text{max}}$ the shifted log-weights
  4. $i=1,\ldots,N$

Usually we want to calculate $\frac{1}{N}\sum_i v^{(i)}$. But they're saying it's more numerically stable to calculate $$ \log \left[ \frac{1}{N}\sum_i v^{(i)} \right] = v_{\text{max}} + \left[\sum_i \tilde{v}^{(i)} \right] - \log N. $$ Problem is I'm not getting that.

\begin{align*}
\log \left[ \frac{1}{N}\sum_i v^{(i)} \right]
&= - \log N + \log\left[\sum_i v^{(i)} \right] \\
&= - \log N + \log\left[\sum_i \exp \log v^{(i)} \right] \\
&= - \log N + \log\left[\sum_i \exp \left\{\log v^{(i)} + v_{\text{max}} - v_{\text{max}} \right\}\right] \\
&= - \log N + \log\left[\sum_i \exp \left\{ \tilde{v}^{(i)} + v_{\text{max}} \right\}\right] \\
&= - \log N + \log\left[\exp\left( v_{\text{max}}\right) \sum_i \exp \tilde{v}^{(i)} \right] \\
&= - \log N + v_{\text{max}} + \log\left[ \sum_i \exp \tilde{v}^{(i)} \right] \\
&\neq - \log N + v_{\text{max}} + \log\left[ \exp \sum_i \tilde{v}^{(i)} \right] \tag{?}\\
&= v_{\text{max}} + \left[\sum_i \tilde{v}^{(i)} \right] - \log N.
\end{align*}
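As a quick numeric sanity check (a sketch with made-up weights, not from the paper): the expression $-\log N + v_{\text{max}} + \log\left[\sum_i \exp \tilde{v}^{(i)}\right]$ agrees with the direct computation, while the paper's stated formula with a bare $\sum_i \tilde{v}^{(i)}$ does not.

```python
import math

# Made-up unnormalized weights (illustrative only, not from the paper)
v = [0.5, 2.0, 3.5, 0.25]
N = len(v)

# Direct computation: log of the average weight
direct = math.log(sum(v) / N)

# Shifted log-weights, as defined in the question
log_v = [math.log(x) for x in v]
v_max = max(log_v)
v_tilde = [lv - v_max for lv in log_v]

# Identity with the log-sum-exp: -log N + v_max + log(sum_i exp(v_tilde_i))
lse = -math.log(N) + v_max + math.log(sum(math.exp(t) for t in v_tilde))

# The paper's stated formula: -log N + v_max + sum_i v_tilde_i
paper = -math.log(N) + v_max + sum(v_tilde)

print(abs(direct - lse))    # essentially zero
print(abs(direct - paper))  # clearly nonzero
```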

  1. What am I missing here?
  2. Why is it more numerically stable? Why are floating-point numbers better approximations to real numbers when the real numbers are not extremely small?
Taylor

1 Answer


If what is needed is $\log(\frac1N\sum v^{(i)})$, and computing each $v^{(i)}$ directly would result in numerical underflow ($v^{(i)}$ becomes numerically equal to zero), a trick like this is needed. Your expression $- \log N + v_{\text{max}} + \log\left[ \sum_i \exp \tilde{v}^{(i)} \right]$ would then solve the problem because $\exp \tilde v^{(i)}$ will not underflow (and the expression given by the authors indeed looks wrong). But if these numbers are sufficiently far from underflowing (greater than about $2^{-1022 + 52}$), I agree with you that nothing would be gained.
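To see the underflow concretely, here is a small sketch (illustrative log-weights, not taken from the paper): exponentiating the raw log-weights gives exactly 0.0 in double precision, while shifting by the max first keeps the largest term at $\exp(0)=1$.

```python
import math

# Illustrative log-weights, small enough that exp() underflows to 0.0
log_v = [-800.0, -801.0, -805.0]
N = len(log_v)

# Naive route: every exp(log_v_i) underflows, so the mean is exactly 0.0
# and taking its log would fail
naive_mean = sum(math.exp(lv) for lv in log_v) / N

# Log-sum-exp route: shift by the max so the largest term is exp(0) = 1
v_max = max(log_v)
stable = -math.log(N) + v_max + math.log(sum(math.exp(lv - v_max) for lv in log_v))

print(naive_mean)  # 0.0
print(stable)      # a finite value near -800.78
```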

Apparently, the above numerical approach is known as the log-sum-exp trick for log-domain calculations, see https://en.wikipedia.org/wiki/LogSumExp and What to do when your likelihood function has a double product with small values near zero - log transform doesn't work?

Jarle Tufto
  • ah okay, now that I know what I'm looking for, it makes more sense to exponentiate $\tilde{v}^{(i)}$. Couldn't we also run into problems if all $\{ v^{(i)}\}$, including the maximum, are too close to zero? Then the $\tilde{v}^{(i)}$ would still be too close to zero, only negative this time. – Taylor Aug 29 '17 at 19:04
  • @Taylor I haven't looked at the paper on arxiv, but I guess the idea is that this trick applies if you can work with everything in terms of logs. So then you don't run into problems with underflow. I added a link to a wikipedia page describing the trick in my answer. – Jarle Tufto Aug 29 '17 at 20:47