5

I'm doing some simulations and I would like to estimate a real number that is uniformly distributed between minValue and maxValue, for instance between 20 and 30 (it's not an angle, so estimating its sine isn't appropriate). So far I have used the MSE loss, but after plotting a histogram of the estimated samples, I see that they follow a Gaussian distribution.

After some research on the Internet, I saw claims that using the L2 norm assumes the target is normally distributed (unrelated question: what is the mathematical reason for that?). However, my target follows a uniform distribution.

So, what would be a good loss function to improve the distribution of the estimates? Could the problem be solved by using a bigger network? My network is composed of 9 conv layers imitating the ResNet architecture, followed by a fully-connected layer that estimates the target.

Finally, since the target data is being simulated, I have access to infinite data.

kjetil b halvorsen
Josemi
    A fundamental problem expressed in this post is the confusion between (1) the (uniform) distribution of the response variable and (2) its *conditional* distribution. You make an assumption about (1), but regression is concerned about (2) only. – whuber Dec 04 '21 at 14:43
  • If all you’re doing is simulating a uniform variable that is unrelated to the predictors, something like **y – Dave Dec 04 '21 at 14:57
  • The predictors are related to the uniform random variable by some matrix multiplications, so there is a mapping y – Josemi Dec 06 '21 at 10:35

1 Answer

6

using the L2 norm assumes that the target is normally distributed

Sorry, but this is nonsense. (There is a lot of nonsense on the internet.)

Your choice of error measure or loss function assumes nothing about the (conditional or unconditional) distribution of the target variable. Rather, different loss functions elicit different functionals of the target variable. The MSE will be minimized in expectation by the conditional mean, whether the conditional distribution is normal or Poisson. (Assuming this expectation exists, and we are not dealing with a Cauchy.) The MAE will be minimized in expectation by the median. If your distribution is indeed symmetric, like the normal, both MSE and MAE will tend towards the same point prediction, but if the distribution is asymmetric, like the Poisson, the two minimizers will be different.
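A quick numerical sketch of this point (the log-normal target and NumPy setup are my choice for illustration): minimizing the MSE over candidate point predictions lands near the distribution's mean, while minimizing the MAE lands near its median, with no normality assumption anywhere.

```python
import numpy as np

rng = np.random.default_rng(42)
# A right-skewed target: log-normal(0, 1) has mean e^0.5 ~ 1.65 and median 1.
y = rng.lognormal(mean=0.0, sigma=1.0, size=50_000)

# Brute-force search over constant predictions.
candidates = np.linspace(0.5, 2.5, 401)
mse = np.array([np.mean((y - c) ** 2) for c in candidates])
mae = np.array([np.mean(np.abs(y - c)) for c in candidates])

print(candidates[np.argmin(mse)])  # should land near 1.65, the mean
print(candidates[np.argmin(mae)])  # should land near 1.00, the median
```

The two minimizers differ precisely because the distribution is asymmetric; for a symmetric target they would coincide.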

You may find a paper of mine (Kolassa, 2020, IJF) useful. Or this thread: What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?

Thus, your strategy should be to first decide which functional of your target distribution you are looking for: the mean, the median, a quantile, or something else. Then, and only then, can you choose an error measure that elicits this functional.
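As an illustration of this strategy (a minimal sketch; the uniform(20, 30) target mirrors the question's setup): if the functional you need is, say, the 90% quantile, the quantile ("pinball") loss elicits it, whereas the MSE would pull your prediction towards the mean of 25.

```python
import numpy as np

def pinball_loss(y, pred, q):
    """Quantile (pinball) loss: minimized in expectation by the q-quantile of y."""
    diff = y - pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

rng = np.random.default_rng(0)
y = rng.uniform(20, 30, size=50_000)  # uniform target, as in the question

# Brute-force search over constant predictions.
candidates = np.linspace(20, 30, 501)
losses = [pinball_loss(y, c, q=0.9) for c in candidates]

print(candidates[np.argmin(losses)])  # should land near 29, the 90% quantile of U(20, 30)
```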

Stephan Kolassa
  • +1 though I think your final paragraph may be backwards: I would have thought your loss function should determine your estimator rather than the other way round – Henry Dec 04 '21 at 12:11
  • @Henry: your loss function indeed determines which functional is optimal *for that loss function*. If your bonus is tied to a low MAPE, you output a different prediction than if your bonus is tied to a low MSE. But you will *use* your prediction for some subsequent process! And presumably there is some functional that is better suited to your use than another one. For instance, for inventory control, you need a high quantile prediction. So you should use a quantile loss, not MAPE, which will give you a prediction that is useless in setting safety amounts. – Stephan Kolassa Dec 04 '21 at 12:16
  • But in that case, it is the loss or regret for taking the wrong decision in the subsequent process that you want to apply, and tracing that back determines the loss function in the original process, and thus the functional. If your bonus is giving you the wrong incentives, your managers need to think about why they are doing this – Henry Dec 04 '21 at 12:35
  • @Henry: exactly. It's just that I *very* frequently see exactly this mismatch. My forecasts are evaluated on MAPE, but then a *different* forecast is used for the inventory decisions. Of course, you could evaluate "the whole package" using inventory metrics - but there are so many other influences on inventory beyond the forecasts (logistical rounding, delivery schedules, ...) that it's hard to say whether the forecast improved. So we still need forecast/prediction accuracy metrics; it's just unintuitive how to pick ones that conform with the subsequent process. – Stephan Kolassa Dec 04 '21 at 13:45