
While waiting for Andrew Ng's next course on Coursera, I'm trying to program a classifier in Python with a softmax function on the last layer to obtain the class probabilities. However, when I try it on the CIFAR-10 dataset (input: (3072, 10000)), I encounter an overflow when computing the exponentials, because the scores are numbers like 5000, 10000, or 25000.

I have already tried two things:

  • subtracting a constant from the matrix before computing the exponentials, but the differences between the numbers are so large that this didn't help;
  • computing the exponential of the log of the matrix, but it still overflows.
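To make the failure mode concrete, here is a minimal illustration (an assumption about the setup, using NumPy float64) of what happens with scores of that magnitude:

```python
import numpy as np

x = np.array([5000.0, 10000.0, 25000.0])  # scores of the magnitude described

with np.errstate(over='ignore', invalid='ignore'):
    e = np.exp(x)     # exp overflows float64 beyond ~709, producing inf
    p = e / e.sum()   # inf / inf -> nan: the probabilities are lost
```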

Can somebody help me avoid this problem?

Thank you

EDIT : https://github.com/Kentena/softmax/

Taylor
Dlmss
  • Thousands of people have played around with the CIFAR-10 dataset and NNs. You may have messed up in your code. Can you show it? – tagoma Sep 24 '17 at 20:42
  • Yes, I have updated my post and posted a GitHub repository. However, my code is not commented :/ – Dlmss Sep 24 '17 at 21:21

1 Answer


Observe that $$ \frac{e^{x_i}}{\sum_j e^{x_j}} = \frac{e^{-m}}{e^{-m}}\frac{e^{x_i}}{\sum_j e^{x_j}}= \frac{e^{x_i-m}}{\sum_j e^{x_j-m}} $$ for any constant $m$.

Obviously it is not true that $e^{x_i} = e^{x_i-m}$, but the normalized versions are the same. Your problem is that the $x_i$s are too big, so subtract the same number $m$ from all of them before you take the softmax. Sometimes people set $m$ to be the maximum of all the $x_i$s.
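In NumPy terms, a minimal sketch of this max-subtraction trick might look like the following (the `(classes, examples)` layout and the example scores are assumptions matching the question's setup):

```python
import numpy as np

def softmax(x):
    # Subtract each column's maximum before exponentiating; the ratio
    # is mathematically unchanged, but np.exp never sees a huge argument.
    m = np.max(x, axis=0, keepdims=True)
    e = np.exp(x - m)
    return e / np.sum(e, axis=0, keepdims=True)

# Scores of the magnitude mentioned in the question, one column per example:
scores = np.array([[ 5000.0, 3.0],
                   [10000.0, 1.0],
                   [25000.0, 2.0]])
probs = softmax(scores)   # finite, each column sums to 1
```

Note that with scores this far apart, the first column comes out numerically as one-hot (the smaller terms underflow to zero), which is the correct limit of the softmax, not an error.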

Taylor