I am implementing multilayer perceptrons with the softmax activation function in Theano. In some extreme cases the values going into the softmax are so high/low that the resulting distribution contains exact zeros in some places.
When I then compute the logarithm of these distributions I get -inf, and the error propagates through the rest of the code.
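A stripped-down illustration of the underflow itself (a toy example of my own, not the actual model):

    import numpy
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    softmax = theano.function([x], T.nnet.softmax(x))

    # with activations this far apart one probability underflows to exactly 0
    p = softmax(numpy.array([[0.0, 1000.0]], dtype=theano.config.floatX))
    print(p)             # roughly [[ 0.  1.]]
    print(numpy.log(p))  # the zero entry becomes -inf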
My simple solution was to add a small constant to the distribution, like this:

    # pad the probabilities with a tiny epsilon so none of them is exactly zero
    self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b) + 0.0000001
I have already googled around and found plenty of solutions that are more elegant than mine (and exact), but the nature of Theano demands something different, since the log-likelihood will be symbolically differentiated to obtain the gradients for the algorithm.
Also, I find it strange that this problem is not commonly addressed for neural networks, logistic regression or the like. Are values this extreme a sign that something is actually wrong in another part of my system? Am I doing something wrong here, or missing some point?
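For reference, the exact trick I keep running into elsewhere is the log-sum-exp formulation of the log-softmax, which can also be written symbolically in Theano (a minimal sketch of my own, not taken from my actual model):

    import numpy
    import theano
    import theano.tensor as T

    x = T.matrix('x')  # pre-softmax activations, one row per example

    # subtract the row-wise maximum so the exponentials cannot overflow
    x_shifted = x - x.max(axis=1, keepdims=True)
    # log-softmax = shifted activations minus the log of the (now safe) normalizer
    log_p_y_given_x = x_shifted - T.log(T.exp(x_shifted).sum(axis=1, keepdims=True))

    f = theano.function([x], log_p_y_given_x)
    extreme = numpy.array([[1000.0, 0.0], [-1000.0, 0.0]], dtype=theano.config.floatX)
    print(f(extreme))  # finite values instead of -inf

Since this stays symbolic, T.grad can still differentiate through it.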
Update 1: Theano can give you very different results depending on which compilation mode you use. Here I think I was using mode = FAST_COMPILE, and apparently that deactivates the numerical optimizations and stabilizations the compiler normally applies to the function graphs. If you're doing this, try changing it to mode = FAST_RUN.
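For what it's worth, the mode can be set per compiled function or globally; a minimal sketch (the expression is just a placeholder):

    import theano
    import theano.tensor as T

    x = T.matrix('x')
    expr = T.log(T.nnet.softmax(x))  # the kind of graph that benefits from stabilization

    # per function: request the fully optimized graph
    f = theano.function([x], expr, mode='FAST_RUN')

    # or globally, before compiling anything:
    # theano.config.mode = 'FAST_RUN'
    # or via the environment: THEANO_FLAGS=mode=FAST_RUN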
Update 2: This page lists some of the optimizations performed by Theano, including a specific one for the softmax: local_log_softmax.
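As far as I understand it, that rewrite can only fire if the graph literally contains log(softmax(...)), so leaving the expression in that form (instead of my epsilon hack) and compiling with the optimizations enabled should give finite values. A rough sketch of how I am testing this (variable names are mine):

    import numpy
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    y = T.ivector('y')  # target class indices

    # keep the log(softmax(...)) pattern intact so the stabilizing rewrite can match it
    p_y_given_x = T.nnet.softmax(x)
    nll = -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])

    f = theano.function([x, y], nll, mode='FAST_RUN')

    extreme = numpy.array([[1000.0, 0.0, -1000.0]], dtype=theano.config.floatX)
    print(f(extreme, numpy.array([2], dtype='int32')))  # finite instead of inf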