
The LogSumExp trick is often introduced in the context of Naive Bayes, where the computation of the class posteriors would lead to underflow.

Specifically, we need to compute $$ p(y=c|\mathbf{x}) = \frac{p(\mathbf{x}|y=c)p(y=c)}{p(\mathbf{x})}. $$ The numerator is easy to handle in log space, but since $p(\mathbf{x})=\sum_i p(\mathbf{x}|y=i)p(y=i) = \sum_i p(y=i)\prod_j p(x_j|y=i)$ is a sum of products, the log cannot be pushed inside the sum, so taking logs alone does not solve the underflow problem. Hence the necessity of the LogSumExp trick.
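To illustrate the problem, here is a minimal sketch (the per-class log-joint values are made up for illustration): exponentiating very negative log-joints underflows to zero, while factoring out the maximum first, which is the LogSumExp trick, keeps the result finite.

```python
import numpy as np

# Hypothetical per-class log-joints log p(x, y=i) = log p(y=i) + sum_j log p(x_j | y=i).
# With many features these are typically very negative:
log_joint = np.array([-1050.0, -1052.0, -1060.0])

# Naive computation of log p(x): exp() underflows to 0, so the log is -inf.
naive_log_px = np.log(np.sum(np.exp(log_joint)))               # -> -inf

# LogSumExp trick: subtract the maximum before exponentiating.
m = np.max(log_joint)
stable_log_px = m + np.log(np.sum(np.exp(log_joint - m)))      # finite, approx -1049.87

print(naive_log_px, stable_log_px)
```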

But since $p(\mathbf{x})$ is constant for all classes, in practice we don't really need to compute it in order to figure out the most likely class. Numerical concerns are all about practice, so I don't see why this is used as an example.

We would need it if we actually wanted to know the probabilities, for instance in order to know the confidence of the prediction, but it looks like mere classification does not need this trick.
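To make that distinction concrete, a small sketch (again with hypothetical log-joint values): picking the class only needs an argmax over the log-joints, whereas reporting the posterior probabilities requires the normalizer and therefore the trick.

```python
import numpy as np
from scipy.special import logsumexp

log_joint = np.array([-1050.0, -1052.0, -1060.0])  # hypothetical log p(x, y=i)

# Classification alone: log p(x) is the same for every class,
# so the argmax of the log-joints already gives the most likely class.
predicted_class = np.argmax(log_joint)

# Posterior probabilities (prediction confidence) do need the normalizer,
# and this is where LogSumExp is required:
log_posterior = log_joint - logsumexp(log_joint)   # log p(y=i | x)
posterior = np.exp(log_posterior)                  # sums to 1, no underflow

print(predicted_class, posterior)
```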

Am I missing something?

cangrejo

0 Answers