Best activation and loss function for regression problem where outputs are from 0 to 1

Question

I'm currently working on a regression problem, where the targets are from 0 to 1. Which would be the best pair of activation and loss function for these kinds of problems?

The ones that I have considered are:

Linear and L2 loss: L2 loss may lead to vanishing problems when the targets are small (like smaller than 0.1).
Sigmoid and L1 loss: Should I use sigmoid for a regression problem? I'm afraid it is only suitable when outputs are either 0 or 1.
Linear and L1 loss: L1 loss may not be able to deal with small differences between outputs and targets. I have also heard that models trained with L1 loss have difficulty converging.

Is there any other activation and loss function that I may use? My experience is limited.

Tim · Answer 1 · 2019-06-14T13:16:10.950

Why would the sigmoid activation function be suitable only for values that are either $0$ or $1$? The sigmoid function maps real numbers to numbers in the unit interval. In fact, the logistic function is the inverse of the default (logit) link function in beta regression, i.e. the regression model for target values in the unit interval. The sigmoid is not the only choice: you can use other functions such as the inverse probit or inverse cloglog, or, if you transform the targets to $y\times2 - 1$, you can even use tanh. If you are talking about neural networks, any function that maps the values to the desired interval will suit, but sigmoid is usually the natural choice.
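To make the mapping concrete, here is a minimal sketch in plain Python (standard library only). The helper names `sigmoid`, `to_tanh_range` and `from_tanh_range` are made up for illustration, and the raw outputs are arbitrary numbers, not results from any model.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def to_tanh_range(y):
    # Rescale a target from [0, 1] to [-1, 1] so it matches tanh's range.
    return 2.0 * y - 1.0

def from_tanh_range(t):
    # Map a tanh output in [-1, 1] back to [0, 1].
    return (t + 1.0) / 2.0

# Arbitrary pre-activation values (hypothetical, for illustration only).
raw_outputs = [-5.0, -1.0, 0.0, 1.0, 5.0]

sigmoid_preds = [sigmoid(z) for z in raw_outputs]
tanh_preds = [from_tanh_range(math.tanh(z)) for z in raw_outputs]
```

Either way, every prediction ends up inside the unit interval, whatever the raw output was. If you train with tanh, you would compare `math.tanh(z)` against `to_tanh_range(y)` and only map back to $[0, 1]$ when reporting predictions.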

Using a linear activation function (no activation) is a bad idea, because it would allow your model to predict values outside the desired interval. In the vast majority of cases this would be undesirable.

As for the loss, you can minimize any loss that is suitable for continuous values. Squared or absolute loss are popular choices. Often people just stick to the squared error, and it works well enough for them that they do not consider other losses. You can also use cross-entropy or Kullback–Leibler divergence; for predicting things like probabilities, those are the natural choices and can have computational advantages. Finally, you can also maximize the likelihood (or minimize the negative log-likelihood), where the beta distribution is used as the likelihood function, as in beta regression.
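As a rough sketch of what these losses look like for a single target `y` and prediction `p` in the unit interval, here they are in plain Python. The function names are made up for illustration. The beta loss assumes the mean/precision parameterization (shape parameters `mu * phi` and `(1 - mu) * phi`), with the precision `phi` fixed at an arbitrary value; in practice it would typically be estimated as well.

```python
import math

def squared_loss(y, p):
    return (y - p) ** 2

def absolute_loss(y, p):
    return abs(y - p)

def cross_entropy_loss(y, p, eps=1e-12):
    # Cross-entropy with a continuous target y in [0, 1].
    # The prediction is clipped away from 0 and 1 to keep the logs finite.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def beta_nll(y, mu, phi=10.0, eps=1e-12):
    # Negative log-likelihood of y under Beta(mu * phi, (1 - mu) * phi).
    # mu is the predicted mean; phi is an assumed, fixed precision.
    y = min(max(y, eps), 1.0 - eps)
    mu = min(max(mu, eps), 1.0 - eps)
    a = mu * phi
    b = (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    log_density = log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)
    return -log_density
```

For a fixed target, the cross-entropy above is smallest when `p` equals `y`, which is why it still makes sense when the targets are not exactly $0$ or $1$.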
