
I want to understand why somebody would use the log-sum-exp trick. I am reading this blog.

But I don't really understand the first paragraph.

It says

Let's say we have an n-dimensional vector and want to calculate:

$$y=\sum_{i=1}^n e^{x_i}$$

But my question is why would you want to compute that value in the first place?

Funzo
    We have many questions relating to this. Some can be found with this search: https://stats.stackexchange.com/search?q=log+sum+exp (many of which explain the motivation relates to avoiding overflow or underflow), and others which don't contain the phrase log-sum-exp but which also discuss overflow and/or underflow – Glen_b May 25 '18 at 14:03
  • A common use is to avoid underflow problems when you multiply numbers <<1 (such as probabilities), as explained here: http://wittawat.com/posts/log-sum_exp_underflow.html But @Glen_b is right, there are many similar questions so you could do a bit of searching. – DimP May 25 '18 at 14:15
  • Since all positive numbers $y_i$ can uniquely be represented $y_i=e^{x_i}$ (where, of course, $x_i=\log y_i$), your question concerns why anyone would want to compute the logarithm of a sum of positive numbers. That sounds awfully broad! Do you have a specific concern or application in mind? – whuber May 25 '18 at 15:42
  • where in machine learning is log sum exponent trick even used? – develarist Nov 17 '20 at 00:21
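As the comments above note, the usual motivation is numerical: the individual $e^{x_i}$ can overflow (or underflow) in floating point even when $\log y$ itself is a perfectly ordinary number. A minimal numpy sketch of the shift-by-the-max trick, with made-up illustrative values:

```python
import numpy as np

x = np.array([1000.0, 1001.0, 1002.0])  # large values, e.g. unnormalized log-probabilities

# Naive evaluation: exp(1000) overflows to inf in double precision.
naive = np.log(np.sum(np.exp(x)))           # -> inf (with an overflow warning)

# Log-sum-exp trick: shift by the max before exponentiating.
m = np.max(x)
stable = m + np.log(np.sum(np.exp(x - m)))  # -> ~1002.408
print(naive, stable)
```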

1 Answer


The comments address why the numerical evaluation of logsumexp, as it's known, needs special treatment, but not why the quantity arises in the first place.

First consider the special case $x_1 = 0, x_2 = x$. Then $\text{logsumexp} = \log(1 + e^x)$, known as the softplus function, which is a differentiable approximation to the rectifier $\max(0, x)$; $\max(0, x)$ itself is not differentiable (at $x = 0$).
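A small numerical sketch of this special case (plain numpy, illustrative values only); it shows softplus tracking $\max(0, x)$ to within $\log 2$, with the largest gap at $x = 0$:

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 11)

relu = np.maximum(0.0, x)        # rectifier: not differentiable at x = 0
softplus = np.log1p(np.exp(x))   # log(1 + e^x): smooth everywhere

# The two agree to within log(2) ~= 0.693, with the largest gap at x = 0.
print(np.max(np.abs(softplus - relu)))   # ~0.693
```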

Now consider the general version, $\text{logsumexp}(x_1,\dots,x_n) = \log\left(\sum_{i=1}^n e^{x_i}\right)$. It is readily apparent that $\max(x_1,\dots,x_n) \le \text{logsumexp}(x_1,\dots,x_n) \le \max(x_1,\dots,x_n) + \log(n)$. Therefore, logsumexp serves as a differentiable approximation to the max of several numbers. Note that max is not differentiable.
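The sandwich bounds are easy to check numerically. A short sketch, assuming scipy's `logsumexp` is available (any stable implementation, such as the shift-by-the-max form above, would do):

```python
import numpy as np
from scipy.special import logsumexp   # numerically stable log-sum-exp

x = np.array([-2.0, 0.5, 3.0, 3.1])   # arbitrary illustrative values
n = len(x)

lse = logsumexp(x)
print(np.max(x) <= lse <= np.max(x) + np.log(n))   # True
print(np.max(x), lse, np.max(x) + np.log(n))       # ~3.1  ~3.79  ~4.49
```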

It also turns out that logsumexp is jointly convex in its arguments, although this may not be readily apparent. Moreover, it can be expressed via an epigraph formulation in terms of the product of n exponential cones, which are convex. This facilitates representation of logsumexp as a high-level building block that can be used to formulate convex optimization problems in conic form, enabling the use of specialized conic optimization solvers that handle problems involving logsumexp more efficiently and robustly than general-purpose solvers can, presuming that the rest of the problem also admits a conic formulation. As an example, the CVX convex optimization package has the function log_sum_exp for just this purpose.
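As a hedged illustration: CVX itself is a MATLAB package, but its Python analogue CVXPY exposes the same atom as `cp.log_sum_exp`. The data `A`, `b` and the norm constraint below are invented purely for illustration:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # made-up problem data
b = rng.standard_normal(5)

x = cp.Variable(3)
# log_sum_exp of an affine expression is convex, so this is a valid convex program.
objective = cp.Minimize(cp.log_sum_exp(A @ x + b))
constraints = [cp.norm(x, 2) <= 1]

problem = cp.Problem(objective, constraints)
problem.solve()   # needs an exponential-cone-capable solver; CVXPY's default install typically includes one
print(problem.value, x.value)
```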

There's a whole heck of a lot more which can be added, but this ought to get you started.

Mark L. Stone
  • Hey Mark, thanks that is a brilliant response but in it you say it is readily apparent that logsumexp is lte max + log(n). I don't see how that is apparent at all. Would you be able to elaborate :) Thanks – Funzo May 26 '18 at 18:15
  • $\text{logsumexp}(x_1,\dots,x_n) \le \text{logsumexp}$ with all $x_i$'s replaced by $\max(x_1,\dots,x_n)$. The latter $= \log(\sum_{i=1}^n e^{\max(x_1,\dots,x_n)}) = \log(n \cdot e^{\max(x_1,\dots,x_n)}) = \max(x_1,\dots,x_n) + \log(n)$. Also note that I stated inequalities, not equalities. logsumexp is sandwiched between $\max(x_1,\dots,x_n)$ and $\max(x_1,\dots,x_n) + \log(n)$ – Mark L. Stone May 26 '18 at 18:30
  • $\max(x_1,\dots,x_n) \le \text{logsumexp}(x_1,\dots,x_n)$ is obtained by setting all $x_i$'s in the argument of logsumexp, other than the (an) $x_i$ which equals $\max(x_1,\dots,x_n)$, equal to $-\infty$, resulting in $\log(\exp(\max(x_1,\dots,x_n))) = \max(x_1,\dots,x_n)$ – Mark L. Stone May 26 '18 at 19:36