
I have used several datasets and made predictions on them with many algorithms (ARIMA, Theta, smoothing, etc.). Until now, the actual outcomes as well as the predictions for these datasets were strictly positive (always greater than 0). To compare forecast quality across models, I used the sMAPE and the RMSE.

However, I have a new dataset that contains both positive and negative values. To be more specific, these are the returns of a company (positive if the company makes a profit, negative if it makes a loss).

Therefore, is sMAPE suitable for this type of dataset or should I use another measure such as the Root Mean Squared Error (RMSE)?

I ask because the sMAPEs I get for this new dataset are very large, typically between 120 and 160, whereas those for the datasets with positive values are between 1 and 12. However, the difference between the RMSEs of the positive-valued datasets and that of the new dataset is not nearly as large.

S12000

1 Answer


The sMAPE is a percentage error, which expresses the absolute error as a percentage of the average of the forecast and the actual. Percentage errors appear easy to understand and interpret.

However, I would be a bit skeptical about interpreting percentage errors of values that can take positive and negative values. You could even get undefined values, e.g., if you forecast 10 and the actual is -10, and you would try to divide the absolute error of 20 by the average of the actual and the forecast, which is zero. After all, the sMAPE was originally thought up to mitigate the problem of the "ordinary" MAPE with zero actuals, where we would have this exact problem of dividing by zero.
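To make the division-by-zero case concrete, here is a minimal Python sketch. It assumes NumPy; the function names are hypothetical and chosen only for illustration. The first function follows the definition used above (absolute error divided by the plain average of actual and forecast). The second is another commonly used variant that averages the absolute values instead; it is an assumption on my part that software in use may implement this one, so check which definition yours uses.

```python
import numpy as np


def smape_signed_denominator(actual, forecast):
    """sMAPE in percent, dividing by the plain average (actual + forecast) / 2.

    Hypothetical helper following the definition described in this answer.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (actual + forecast) / 2.0
    # Suppress NumPy's warnings so the infinite/undefined result is returned quietly.
    with np.errstate(divide="ignore", invalid="ignore"):
        return 100.0 * np.mean(np.abs(forecast - actual) / denom)


def smape_abs_denominator(actual, forecast):
    """sMAPE variant in percent, dividing by (|actual| + |forecast|) / 2.

    Hypothetical helper; this is an assumed alternative definition.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    with np.errstate(divide="ignore", invalid="ignore"):
        return 100.0 * np.mean(np.abs(forecast - actual) / denom)


# Forecast 10, actual -10: absolute error 20, plain average 0.
print(smape_signed_denominator([-10.0], [10.0]))  # inf
# With absolute values in the denominator the result is finite, but at its maximum.
print(smape_abs_denominator([-10.0], [10.0]))     # 200.0
```

Under the absolute-value variant, any single point where the forecast and the actual have opposite signs contributes exactly 200%, regardless of how close the two numbers are, because the absolute error then equals the sum of the two absolute values. That may be part of why sMAPE values on mixed-sign data come out so much larger than on strictly positive data.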

I would not use the sMAPE, or any percentage error, with values that can take negative and positive values.

Shameless piece of self-promotion: I would also suggest you read "What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?" and think about what functional of the unknown future density you want to elicit using the sMAPE.

Stephan Kolassa
  • Hello Stephan, thanks for your answer. I understand that you suggest not using the sMAPE for the case I mention. The reasons are clearly stated. However, it would be great if you could recommend which other metric to use instead. [Please forgive me if the answer is in the link you provided; I have not read your other article/post yet. I will do so asap.] – S12000 Aug 13 '19 at 16:13
  • I personally am a big fan of the MSE, which is [the only metric that will be minimized in expectation by an unbiased prediction](https://stats.stackexchange.com/a/210857/1352). (I'll assume you want an unbiased prediction. If not, e.g., if you want a quantile forecast, then use an appropriate hinge loss.) You can take the root, or scale the RMSE by the in-sample mean to get a percentage (which may again blow up if you mix positive and negative actuals - again an effect of percentages not being very helpful for mixed actuals). – Stephan Kolassa Aug 13 '19 at 16:48
  • Hello Stephan, yes, I want an unbiased prediction. Thanks a lot for your advice. – S12000 Aug 13 '19 at 17:50
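
The RMSE, and the RMSE scaled by the in-sample mean as mentioned in the comments, could be sketched roughly as follows. This assumes NumPy; the function names and all numbers are made-up toy values for illustration, not data from the question.

```python
import numpy as np


def rmse(actual, forecast):
    """Root mean squared error between actuals and forecasts."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((forecast - actual) ** 2)))


def rmse_pct_of_mean(actual, forecast, in_sample):
    """RMSE expressed as a percentage of the in-sample mean (hypothetical helper)."""
    return 100.0 * rmse(actual, forecast) / float(np.mean(in_sample))


# Toy holdout data: every forecast is off by exactly 1, so the RMSE is 1.
actual = [1.0, -2.0]
forecast = [2.0, -1.0]
print(rmse(actual, forecast))  # 1.0

# Strictly positive history: the mean is 10, so the scaled RMSE is easy to read.
positive_history = [10.0, 12.0, 8.0, 10.0]
print(rmse_pct_of_mean(actual, forecast, positive_history))  # 10.0

# Mixed-sign history: the mean is close to zero, so the percentage blows up.
mixed_history = [5.0, -4.0, 3.0, -3.9]
print(rmse_pct_of_mean(actual, forecast, mixed_history))
```

The unscaled RMSE is identical in both cases; only the percentage version becomes hard to interpret once positive and negative values cancel in the mean.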