
I'm having a hard time checking an equality in https://arxiv.org/pdf/1511.01707.pdf. It is unnumbered, immediately before (24), on page 13. Any help would be appreciated. Here is some simplified notation and a simplified question.

Let

  1. $v^{(i)} > 0$ be the unnormalized weights
  2. $v_{\text{max}} = \max_i\{\log v^{(i)}\}$
  3. $\tilde{v}^{(i)} = \log v^{(i)} - v_{\text{max}}$ the shifted log-weights
  4. $i=1,\ldots,N$

Usually we want to calculate $\frac{1}{N}\sum_i v^{(i)}$. But they're saying it's more numerically stable to calculate $$ \log \left[ \frac{1}{N}\sum_i v^{(i)} \right] = v_{\text{max}} + \left[\sum_i \tilde{v}^{(i)} \right] - \log N. $$ Problem is I'm not getting that.

\begin{align*}
\log \left[ \frac{1}{N}\sum_i v^{(i)} \right]
&= - \log N + \log\left[\sum_i v^{(i)} \right] \\
&= - \log N + \log\left[\sum_i \exp \log v^{(i)} \right] \\
&= - \log N + \log\left[\sum_i \exp \left\{\log v^{(i)} + v_{\text{max}} - v_{\text{max}} \right\}\right] \\
&= - \log N + \log\left[\sum_i \exp \left\{ \tilde{v}^{(i)} + v_{\text{max}} \right\}\right] \\
&= - \log N + \log\left[\exp\left( v_{\text{max}}\right) \sum_i \exp \tilde{v}^{(i)} \right] \\
&= - \log N + v_{\text{max}} + \log\left[ \sum_i \exp \tilde{v}^{(i)} \right] \\
&\neq - \log N + v_{\text{max}} + \log\left[ \exp \sum_i \tilde{v}^{(i)} \right] \tag{?}\\
&= v_{\text{max}} + \left[\sum_i \tilde{v}^{(i)} \right] - \log N.
\end{align*}
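As a quick numeric sanity check (a sketch with made-up weights, not from the paper): the expression $-\log N + v_{\text{max}} + \log\left[\sum_i \exp \tilde{v}^{(i)}\right]$ agrees with the direct computation, while the paper's stated formula with a bare $\sum_i \tilde{v}^{(i)}$ does not.

```python
import math

# Made-up unnormalized weights (illustrative only, not from the paper)
v = [0.5, 2.0, 3.5, 0.25]
N = len(v)

# Direct computation: log of the average weight
direct = math.log(sum(v) / N)

# Shifted log-weights, as defined in the question
log_v = [math.log(x) for x in v]
v_max = max(log_v)
v_tilde = [lv - v_max for lv in log_v]

# Identity with the log-sum-exp: -log N + v_max + log(sum_i exp(v_tilde_i))
lse = -math.log(N) + v_max + math.log(sum(math.exp(t) for t in v_tilde))

# The paper's stated formula: -log N + v_max + sum_i v_tilde_i
paper = -math.log(N) + v_max + sum(v_tilde)

print(abs(direct - lse))    # essentially zero
print(abs(direct - paper))  # clearly nonzero
```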

  1. What am I missing here?
  2. Why is it more numerically stable? Why are floating-point numbers better approximations to real numbers when the real numbers are not extremely small?
Taylor

1 Answer


If what is needed is $\log(\frac1N\sum v^{(i)})$, and computing each $v^{(i)}$ directly would result in numerical underflow ($v^{(i)}$ becomes numerically equal to zero), a trick like this is needed. Your expression $- \log N + v_{\text{max}} + \log\left[ \sum_i \exp \tilde{v}^{(i)} \right]$ would then solve the problem because $\exp \tilde v^{(i)}$ will not underflow (and the expression given by the authors indeed looks wrong). But if these numbers are sufficiently far from underflowing (greater than about $2^{-1022 + 52}$), I agree with you that nothing would be gained.
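To see the underflow concretely, here is a small sketch (illustrative log-weights, not taken from the paper): exponentiating the raw log-weights gives exactly 0.0 in double precision, while shifting by the max first keeps the largest term at $\exp(0)=1$.

```python
import math

# Illustrative log-weights, small enough that exp() underflows to 0.0
log_v = [-800.0, -801.0, -805.0]
N = len(log_v)

# Naive route: every exp(log_v_i) underflows, so the mean is exactly 0.0
# and taking its log would fail
naive_mean = sum(math.exp(lv) for lv in log_v) / N

# Log-sum-exp route: shift by the max so the largest term is exp(0) = 1
v_max = max(log_v)
stable = -math.log(N) + v_max + math.log(sum(math.exp(lv - v_max) for lv in log_v))

print(naive_mean)  # 0.0
print(stable)      # a finite value near -800.78
```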

Apparently, the above numerical approach is known as the log-sum-exp trick for log-domain calculations, see https://en.wikipedia.org/wiki/LogSumExp and What to do when your likelihood function has a double product with small values near zero - log transform doesn't work?

Jarle Tufto
  • ah okay, now that I know what I'm looking for, it makes more sense to exponentiate $\tilde{v}^{(i)}$. Couldn't we also run into problems if all $\{ v^{(i)}\}$, including the maximum, are too close to zero? Then the $\tilde{v}^{(i)}$ would still be too close to zero, only negative this time. – Taylor Aug 29 '17 at 19:04
  • @Taylor I haven't looked at the paper on arxiv, but I guess the idea is that this trick applies if you can work with everything in terms of logs. So then you don't run into problems with underflow. I added a link to a wikipedia page describing the trick in my answer. – Jarle Tufto Aug 29 '17 at 20:47