
I'm working on a regression model where I have to predict times. These range from a few seconds up to 30 minutes and more.

I calculated the sMAPE within one-minute bins of the target (a sketch of this computation follows the list below) and noticed that:

  • Target 0-1 minutes: up to 200% sMAPE (unstable)
  • Target 1-2 minutes: ~50% sMAPE
  • Target 2-20 minutes: ~25% sMAPE
  • Target >20 minutes: ~35% sMAPE
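
For concreteness, this is roughly how I compute the per-bin sMAPE. It's only a sketch: the variable names, and the grouping of the one-minute bins into the four summary ranges, are illustrative.

    import numpy as np
    import pandas as pd

    def smape(y_true, y_pred):
        # Symmetric MAPE in percent: mean of 200 * |error| / (|actual| + |forecast|).
        return 200.0 * np.mean(np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))

    def smape_by_bin(y_true, y_pred):
        # Targets and predictions in seconds; edges mark 0-1 min, 1-2 min, 2-20 min, >20 min.
        df = pd.DataFrame({"y": y_true, "yhat": y_pred})
        edges = [0, 60, 120, 1200, np.inf]
        labels = ["0-1 min", "1-2 min", "2-20 min", ">20 min"]
        df["bin"] = pd.cut(df["y"], bins=edges, labels=labels)
        return pd.Series({
            label: smape(g["y"].to_numpy(), g["yhat"].to_numpy())
            for label, g in df.groupby("bin", observed=True)
        })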

Most of my data is in the 0-2 min bins and very little data is >20 min.

I figured that the sMAPE might be large for small targets because tiny absolute errors translate into huge percentages: predicting 60 s for a 20 s target, for instance, already gives a sMAPE of 200 * |60 - 20| / (60 + 20) = 100%. For targets above 20 minutes I assumed the errors were due to the lack of training data.

My next approach was to collect more data for the 0-2 minute and/or >20 minute ranges and see the results.

The extra data helped the errors in those bins but substantially degraded them in the others. For example, depending on the type of oversampling, I was able to get the 0-1 minute sMAPE down to 50%, or to widen the range of targets with ~25% sMAPE; but in every case the results for the remaining bins got worse.
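
As an illustration of the kind of oversampling I mean (in other runs I collected genuinely new observations instead), a sketch that duplicates training rows from a chosen target range while leaving the validation set untouched. The frame `train` and the column name `target_s` (target in seconds) are hypothetical.

    import pandas as pd

    def oversample_range(train: pd.DataFrame, lo_s: float, hi_s: float,
                         factor: int = 3) -> pd.DataFrame:
        # Return `train` with rows whose target lies in [lo_s, hi_s)
        # present `factor` times in total.
        if factor < 2:
            return train.copy()
        in_range = train[(train["target_s"] >= lo_s) & (train["target_s"] < hi_s)]
        extra = pd.concat([in_range] * (factor - 1), ignore_index=True)
        return pd.concat([train, extra], ignore_index=True)

    # e.g. make the >20 min examples count three times:
    # train_os = oversample_range(train, 1200, float("inf"), factor=3)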

I have some intuition, but I'm not very confident: I believe that when I add the new data, the model optimizes more strongly for that range of values, degrading its performance on the other ranges.

I thought about maybe creating three or four different models to cover the different ranges. Perhaps first a general model that finds the correct range to search (0-2 minutes / 2-20 minutes / >20 minutes) and then hands the prediction off to the matching specialized model; or an ensemble of three models trained on three different datasets. I don't know if this makes sense, and I feel like it should not be necessary.
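
A sketch of the two-stage idea (a "router" classifier picks the range, a per-range expert predicts); everything here is illustrative and untuned, with X and y assumed to be NumPy arrays:

    import numpy as np
    from lightgbm import LGBMClassifier, LGBMRegressor

    EDGES = [0, 120, 1200, np.inf]  # 0-2 min / 2-20 min / >20 min, in seconds

    def to_range(y):
        # Map each target to a range index 0, 1 or 2.
        return np.digitize(y, EDGES[1:-1])

    def fit_two_stage(X, y):
        router = LGBMClassifier().fit(X, to_range(y))
        experts = {}
        for r in range(len(EDGES) - 1):
            mask = to_range(y) == r
            experts[r] = LGBMRegressor(objective="regression_l1").fit(X[mask], y[mask])
        return router, experts

    def predict_two_stage(router, experts, X):
        ranges = router.predict(X)
        out = np.empty(len(X))
        for r, model in experts.items():
            mask = ranges == r
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out

One caveat I can already see: misclassifications by the router near the bin edges would propagate directly into the final prediction.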

For now I've been working with LightGBM with the Mean Absolute Error objective.
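
For reference, a minimal sketch of that baseline; the data is synthetic filler just to make the snippet self-contained:

    import numpy as np
    from lightgbm import LGBMRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))               # placeholder features
    y = np.exp(rng.normal(4.0, 1.2, size=500))  # skewed "seconds"-like target

    # "regression_l1" is LightGBM's name for the MAE objective.
    model = LGBMRegressor(objective="regression_l1").fit(X, y)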

jcp

1 Answer


A model trained to minimize the MAE will not lead to the minimal sMAPE. These are two different functionals of the underlying unknown future density (or densities).
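
For intuition: the MAE is minimized by the median of the predictive distribution, while the sMAPE-optimal point forecast is in general a different quantity. A small simulation (the lognormal is just an arbitrary skewed example) makes the gap visible:

    import numpy as np

    rng = np.random.default_rng(42)
    y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

    # Evaluate constant forecasts on a grid and find the minimizer of each criterion.
    candidates = np.linspace(0.05, 3.0, 600)
    mae = [np.mean(np.abs(y - c)) for c in candidates]
    smape = [np.mean(200 * np.abs(y - c) / (np.abs(y) + c)) for c in candidates]

    print("MAE-optimal forecast:  ", candidates[np.argmin(mae)])    # close to the median, ~1.0
    print("sMAPE-optimal forecast:", candidates[np.argmin(smape)])  # differs from the MAE-optimal one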

I recommend "What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?", in particular the longer explanation in the last bullet point of my answer, and, relatedly, "Why use a certain measure of forecast error (e.g. MAD) as opposed to another (e.g. MSE)?"


I suggest that you think long and hard about whether minimizing sMAPE is truly what you should be doing. Does sMAPE really reflect your loss function? What decisions will you take based on a point forecast? The decision should inform your loss function. Alternatively, you may want to decouple the decision from the modeling part, which you can do by predicting a full predictive density, which contains all the information you need for a subsequent decision and which can be evaluated on its own, using proper scoring rules.
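
One generic route to a predictive density is quantile regression: fit one model per quantile level and read off the predictive distribution at each point. A sketch, using LightGBM's built-in quantile objective purely for illustration (placeholder data; this is one option among many, not a specific recommendation):

    import numpy as np
    from lightgbm import LGBMRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    y = np.exp(1.0 + X[:, 0] + rng.normal(0.0, 0.5, size=2000))  # skewed, feature-dependent target

    # One model per quantile level; together they trace out the predictive distribution.
    quantile_models = {
        q: LGBMRegressor(objective="quantile", alpha=q).fit(X, y)
        for q in (0.1, 0.25, 0.5, 0.75, 0.9)
    }
    preds = {q: m.predict(X[:5]) for q, m in quantile_models.items()}

Each quantile forecast can then be evaluated with the corresponding pinball (quantile) loss, which is a proper scoring rule for that quantile.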

Stephan Kolassa
  • Thanks, lots of good stuff in your posts. I would just like to add that even the MAE increases substantially depending on the kind of additional sampling that I add (I keep the validation set unchanged). Would you recommend methods to calculate a predictive density instead of point estimates? – jcp Apr 16 '19 at 14:03
  • It sounds like you are selectively sampling observations with high or low values. Since this is not a random or representative sample, it's not surprising that your MAE will increase. If you purposely select samples that your model will likely struggle with (because they are atypically large or small), then of course your model will struggle with them, leading to larger errors. Yes, I would always recommend trying to get at predictive densities. Unfortunately, I am not familiar with LightGBM, so I can't help you with these. Good luck! – Stephan Kolassa Apr 16 '19 at 14:10