Most of the discussion I can find online centers on the benefits of ReLU activations over SoftPlus. The general consensus seems to be that SoftPlus is discouraged because computing it and its gradient is less efficient than for ReLU (SoftPlus needs exponentials, whereas ReLU only needs a comparison with zero).
However, I have not found much discussion of the benefits of SoftPlus over ReLU, other than that SoftPlus is smoother: it is differentiable everywhere, whereas ReLU is not differentiable at x = 0.
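Just to make that point concrete, here is my own small sketch (not taken from any of those discussions) of the two activations and their first derivatives:

```python
import numpy as np

x = np.linspace(-3, 3, 7)

# SoftPlus: log(1 + e^x), computed in a numerically stable way.
softplus = np.logaddexp(0.0, x)
# Its derivative is the sigmoid, which is smooth everywhere.
softplus_grad = 1.0 / (1.0 + np.exp(-x))

# ReLU: max(0, x).
relu = np.maximum(0.0, x)
# Its derivative is a step function: it jumps at x = 0 and is undefined there.
relu_grad = (x > 0).astype(float)
```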
I am using a novel loss function that itself contains gradient terms, so training requires differentiating through those gradients (i.e. second derivatives of the activation come into play). Would SoftPlus therefore be a better option than ReLU for such a use case?
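For context, here is a minimal sketch of the kind of setup I mean (the toy network, the penalty weight `lam`, and the exact loss are placeholders, not my actual loss): the loss includes dy/dx, so `backward()` has to differentiate through that gradient, which involves the second derivative of the activation. ReLU's second derivative is zero almost everywhere, while SoftPlus's is not, which is what prompts the question.

```python
import torch

def make_net(act):
    # Toy 2-layer network; the activation is the only thing I vary.
    return torch.nn.Sequential(
        torch.nn.Linear(2, 16),
        act,
        torch.nn.Linear(16, 1),
    )

def gradient_loss(net, x, lam=1.0):
    x = x.clone().requires_grad_(True)
    y = net(x)
    # dy/dx, kept in the graph (create_graph=True) so the loss itself contains gradients.
    dydx, = torch.autograd.grad(y.sum(), x, create_graph=True)
    return y.pow(2).mean() + lam * dydx.pow(2).mean()

x = torch.randn(32, 2)
for act in (torch.nn.ReLU(), torch.nn.Softplus()):
    net = make_net(act)
    loss = gradient_loss(net, x)
    loss.backward()  # needs d/dW of dy/dx, i.e. second derivatives through the activation
    print(type(act).__name__, loss.item())
```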