I need to predict something using a neural network. The output values are non-negative by nature, with no hard upper bound, although in practice the output never exceeds a certain level. The expected output should span all values between $0$ and that maximum.
So, which output activation function should I use? Sigmoid seems wrong, as it saturates near the maximum, so the gradient there becomes tiny and high target values are hard to fit. Unless I scaled my data so that the maximum value I ever encounter sits around 0.6, keeping the outputs in the part of the sigmoid that behaves roughly linearly rather than in its saturated tail. Linear doesn't seem right, as it allows negative outputs. ReLU by definition gives me an output in the correct range... but it's not really well behaved: the gradient is exactly zero whenever the pre-activation is negative, so units can get stuck outputting $0$.
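For concreteness, here is a minimal PyTorch sketch of the scaled-sigmoid idea above (`Y_MAX` and the layer sizes are placeholders of mine; multiplying the sigmoid output by `y_max / 0.6` is equivalent to rescaling the targets so their practical maximum lands around 0.6):

```python
import torch
import torch.nn as nn

# Hypothetical value: the largest target I ever see in practice.
Y_MAX = 100.0

class ScaledSigmoidHead(nn.Module):
    """Sigmoid output rescaled so targets in [0, Y_MAX] fall in the
    near-linear part of the sigmoid (the practical maximum maps to
    ~0.6 of the sigmoid's range instead of its saturated tail)."""
    def __init__(self, y_max: float, frac: float = 0.6):
        super().__init__()
        self.scale = y_max / frac  # outputs cover (0, y_max / frac)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(z) * self.scale

model = nn.Sequential(
    nn.Linear(10, 32),   # input/hidden sizes are arbitrary here
    nn.ReLU(),
    nn.Linear(32, 1),
    ScaledSigmoidHead(Y_MAX),  # non-negative, roughly linear up to Y_MAX
)

x = torch.randn(4, 10)
print(model(x))  # all outputs lie in (0, Y_MAX / 0.6)
```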
Any suggestions?