
I have the following problem: I'm training a neural network against a set of output values (a regression problem). Those values range from -inf to inf and I can't normalize them, because they arrive continuously from a stream of data. I'm currently using MSE, but it sometimes produces very high losses and disrupts training. To avoid this I wanted to limit the value of the loss somehow, so I came up with the following loss:

$$ L(\text{true}, \text{pred}) = \frac{1}{N}\sum_{i=1}^N \log\big((\text{true}_i - \text{pred}_i)^2 + 1\big) $$

The question is whether a loss like this exists in the literature, because I can't find anything like it. If not, would it have properties similar to the squared loss, or something different?
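For concreteness, here is a minimal sketch of this loss in PyTorch (my own illustration; the function name and the toy tensors are made up, not from any library):

```python
import torch

def log_squared_error(y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    # mean over the batch of log((true - pred)^2 + 1), i.e. the formula above
    return torch.mean(torch.log((y_true - y_pred) ** 2 + 1.0))

# toy comparison with MSE: a single huge residual no longer dominates the loss
y_true = torch.tensor([0.0, 0.0, 0.0])
y_pred = torch.tensor([0.1, 10.0, 1000.0])
print(log_squared_error(y_pred, y_true))   # ~6.1
print(torch.mean((y_true - y_pred) ** 2))  # ~333366.7
```

Since everything here is a standard differentiable op, it can be dropped in wherever MSE would otherwise be used.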

Daniel Wiczew
  • Mean squared logarithmic error? Some info [sklearn](https://scikit-learn.org/stable/modules/model_evaluation.html#mean-squared-log-error) – user2974951 Dec 17 '21 at 10:45
  • @user2974951 MSLE works only if the compared values differ from -1; if a value reaches -1, the loss function explodes – Daniel Wiczew Dec 17 '21 at 10:46
  • If the issue is extreme values then maybe try some robust metrics, such as MAE or MdAE. – user2974951 Dec 17 '21 at 10:53
  • @user2974951 MAE has the same issue here, giving losses of about 10^6 between the true value and the prediction. MdAE – can you expand the acronym? – Daniel Wiczew Dec 17 '21 at 10:55
  • M = mean, Md = median. If your loss truly varies that much from one episode to the next, then maybe your model is not well specified and very prone to overfitting? – user2974951 Dec 17 '21 at 11:00
  • Also, I completely missed that your values are unnormalized because you are training the model online. This is not an issue: you can always use whatever data you have at any given moment to estimate some metric (such as min/max) and use these to scale the new data (see the sketch after this comment thread). – user2974951 Dec 17 '21 at 11:05
  • @user2974951 The problem is that the incoming data are highly variable with respect to both input and output during online learning, so the loss often explodes when a sample arrives that is far different from everything seen before. – Daniel Wiczew Dec 17 '21 at 11:07
  • @user2974951 Re "use whatever data you have at any given moment to estimate some metric (such as min/max) and use these to scale the new data": the additional estimate adds more noise to the model than it helps; as with batch normalization in online learning, it does more harm than good. – Daniel Wiczew Dec 17 '21 at 11:11
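Regarding the scaling suggestion in the comments above, here is a rough sketch of what online scaling of the targets could look like, using a running mean/variance (Welford's algorithm) rather than min/max; all names are illustrative, not code from the thread:

```python
class RunningStandardizer:
    """Online mean/variance estimate (Welford's algorithm), used to scale
    each new target with the statistics of the data seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def scale(self, x: float) -> float:
        std = (self.m2 / self.n) ** 0.5 if self.n > 1 else 1.0
        return (x - self.mean) / (std + 1e-8)

# usage on a stream of raw target values
scaler = RunningStandardizer()
for y in [3.0, -250.0, 12.0, 1e4]:
    scaler.update(y)
    print(y, "->", scaler.scale(y))
```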

1 Answer


I am not aware of a loss function like this, but I'm looking forward to other answers giving pointers to literature.

Your other question is about the properties of your loss function. The inner part is a squared loss, which is minimized in expectation by the conditional mean of your observations. You then apply a monotonic transformation $x\mapsto \log(x+1)$ to this inner expression, so the minimizer stays the same. Overall, your loss function incentivizes your model to yield the conditional mean (unlike, say, the mean absolute error, which draws your predictions towards the conditional median). If this is the functional you want to elicit, go for it.
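A quick numerical illustration of the per-observation part of this argument (the target value and prediction grid below are mine, purely for illustration): because $\log$ is increasing, the squared loss and the transformed loss are minimized by the same prediction for any fixed target.

```python
import numpy as np

y = 3.7                                   # a fixed observed target
f_grid = np.linspace(-10.0, 10.0, 2001)   # candidate predictions

squared = (y - f_grid) ** 2
logged = np.log((y - f_grid) ** 2 + 1.0)

print(f_grid[np.argmin(squared)])  # ~3.7 (up to grid resolution)
print(f_grid[np.argmin(logged)])   # ~3.7, the same minimizer
```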

You may find a paper of mine (Kolassa, 2020, International Journal of Forecasting) helpful.

Stephan Kolassa