Combination of hierarchical time series forecasts with different methods - setting weights

Question

I am trying to forecast the number of orders for different products of a product group. I have the time series for each product. One of the problems is that most of the time series are intermittent and very spiky. Since ARIMA does not handle that type of time series well, I tried to implement other methods. I found that TSB yields somewhat usable results. Another method is a naive forecast that uses the mean of the previous week as the forecast for the next week.

Let's say I have one year of daily data. I forecast 7 days. For "training" I split the time series into about 40 different sequences, each one week shorter than the previous one. Since I have the historic data, I can calculate the MSE and MAPE for each forecast. In the end I take the mean MSE and MAPE for each product over the 40 forecasts. I think this is a form of cross-validation, but correct me if I am wrong. For example (now averaged), product 1 has arima_mape=30, mean_mape=40 and tsb_mape=25; product 2 has arima_mape=50, mean_mape=20 and tsb_mape=15.
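The splitting scheme described above is commonly called rolling-origin evaluation (or time series cross-validation). A minimal sketch of how the splits could be generated, assuming 365 daily observations, a 7-day horizon and 40 forecast origins; the function name `rolling_origin_splits` is hypothetical:

```python
def rolling_origin_splits(n_obs, horizon=7, n_folds=40):
    """Return (train_end, test_end) index pairs, latest origin first.

    For each pair, the model is fitted on series[:train_end] and
    evaluated on series[train_end:test_end].
    """
    splits = []
    for k in range(n_folds):
        test_end = n_obs - k * horizon
        train_end = test_end - horizon
        if train_end <= 0:
            break  # not enough history left for another fold
        splits.append((train_end, test_end))
    return splits


# Assumed setup: one year of daily data, 7-day forecasts, 40 origins.
splits = rolling_origin_splits(n_obs=365, horizon=7, n_folds=40)
for train_end, test_end in splits[:3]:
    print(f"train on [0, {train_end}), test on [{train_end}, {test_end})")
```

Each fold only uses data before its forecast origin, so the resulting errors are out-of-sample.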

Then, for setting the weights, I also average over the different products. Using the example, I have arima_mape=40, mean_mape=30 and tsb_mape=20. I would then assign each weight as the inverse of the error divided by the sum of the inverses: w_arima=(1/arima_mape)/(1/arima_mape + 1/mean_mape + 1/tsb_mape)

I then get: w_arima=0.231 w_mean=0.308 w_tsb=0.461

and my final forecast now is F_final = w_arima*F_arima + w_mean*F_mean + w_tsb*F_tsb

I am now applying that for every product of my product group for future forecasts
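The weighting scheme above can be sketched in a few lines. This is only an illustration using the averaged MAPE values from the example; the function names and the forecast values in `forecasts` are hypothetical:

```python
def inverse_error_weights(errors):
    """Weight each method by 1/error, normalised so the weights sum to 1."""
    inverse = {name: 1.0 / err for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(forecasts, weights):
    """Weighted sum of the individual forecasts."""
    return sum(weights[name] * forecasts[name] for name in weights)


# Averaged MAPE values from the example above.
mape = {"arima": 40.0, "mean": 30.0, "tsb": 20.0}
weights = inverse_error_weights(mape)

# Hypothetical weekly forecasts from the three methods, for illustration only.
forecasts = {"arima": 100.0, "mean": 120.0, "tsb": 90.0}
f_final = combine(forecasts, weights)

print({name: round(w, 4) for name, w in weights.items()})
print(round(f_final, 2))
```

A method with a MAPE of exactly zero would cause a division by zero here, so a real implementation would need to guard against that case.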

My questions are:

1. I know that there is rarely a "right" way to do forecasting. But is my approach justifiable in general so far? Am I doing something that is mathematically wrong?
2. Or should I rather do this on the product group dimension, i.e. sum over all products, calculate the MAPE and then set the weights?
3. I also looked into some machine learning methods. I initially wanted to use an LSTM for time series forecasting, with limited success so far. My question here is: can I somehow use ensemble learning to set the weights? In the end I just have three different predictors with some measure of accuracy. Or is that not applicable to time series forecasts?
4. Last but not least: am I maybe trying too hard to set weights? I found a paper which argued that the simple mean, F_final = 1/3*F1 + 1/3*F2 + 1/3*F3, usually performs almost identically to complex combinations.

Again, any help is much appreciated. If someone has some good book suggestions about forecasting or the statistical foundations in general, I would gladly take them too. Right now, I am still more or less floating and just use stuff I can find without really knowing if I should.

EDIT: I forgot to mention that in order to use the MAPE I sum over the 7-day forecast and compare it with the sum of the historic week. Am I doing something I should not do here? I am mostly interested in the amount for the next week, not for each day. Should I maybe resample to weekly data in the first place? I felt that I would lose too much information. My data has weekly seasonality, and I can use m=7 that way.
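The weekly aggregation described in the edit can be sketched as follows. This is a minimal illustration with made-up daily values; the function name `weekly_ape` is hypothetical, and returning `None` for an all-zero week is just one possible convention:

```python
def weekly_ape(actual_daily, forecast_daily):
    """Absolute percentage error of the weekly total, in percent.

    Returns None when the actual weekly total is zero, because the
    percentage error is undefined in that case.
    """
    actual_total = sum(actual_daily)
    forecast_total = sum(forecast_daily)
    if actual_total == 0:
        return None
    return 100.0 * abs(forecast_total - actual_total) / actual_total


# Made-up intermittent week: many zero days, a few spikes.
actual = [0, 3, 0, 0, 5, 0, 2]
forecast = [1.2] * 7  # a flat daily forecast, for illustration

ape = weekly_ape(actual, forecast)
print(ape)
```

Summing over the week avoids dividing by the zero days of an intermittent series, but a week with no orders at all still leaves the percentage error undefined.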

score 1 · Accepted Answer · edited Jun 11 '20 at 14:32

There are various ways of optimizing combination weights, but using the inverse of out-of-sample forecast errors is an intuitive way to go about it. (You are using out-of-sample errors, not in-sample ones, right?)

1. One question is of course how you calculate MAPEs for intermittent data without dividing by zero. I assume you do some kind of aggregation.
2. Just try it! You know your data better than we do. If your series behave very similarly, then pooling them makes sense. Without knowing your data, there is little we here can say.
3. LSTMs typically need lots more data than a demand forecaster has available. One year of daily data is very little for an LSTM to shine. Or other neural network architectures, for that matter. Then again, in a combination approach, adding ML methods to the mix should usually not break anything, so if you can implement a simple ML method with low effort (e.g., by just running nnetar() from the forecast package in R), then go for it.
4. The fact that combinations with constant equal weights often outperform combinations with "optimally set" weights has been called the "forecast combination puzzle". One explanation is that "optimally estimating" weights adds variance to the entire procedure, which may outweigh the lower bias in the bias-variance tradeoff (Claeskens et al., 2016, IJF). I'd recommend you test your weighting scheme against a very simple scheme with equal weights. It may indeed turn out that the simple scheme performs better.
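A comparison of the two schemes on held-out periods could look like the sketch below. All numbers are made up for illustration, MAE is used instead of MAPE to sidestep zero actuals, and the helper names are hypothetical. The weights are estimated on the first half of the evaluation periods and both schemes are then scored on the second half, so that neither is judged on the data used to set the weights:

```python
def mae(actual, forecast):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalised to sum to 1."""
    inverse = {name: 1.0 / err for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(forecasts, weights):
    """Weighted combination, computed period by period."""
    n = len(next(iter(forecasts.values())))
    return [sum(weights[m] * forecasts[m][t] for m in weights) for t in range(n)]


# Made-up weekly totals and forecasts from three methods.
actual = [10, 12, 8, 15, 11, 9, 14, 10]
forecasts = {
    "arima": [14, 9, 12, 10, 15, 6, 18, 13],
    "mean": [11, 10, 11, 9, 14, 11, 10, 13],
    "tsb": [10, 11, 10, 12, 12, 10, 12, 11],
}

split = 4  # first 4 periods estimate the weights, last 4 evaluate them
errors = {m: mae(actual[:split], f[:split]) for m, f in forecasts.items()}
estimated = inverse_error_weights(errors)
equal = {m: 1.0 / len(forecasts) for m in forecasts}

holdout_actual = actual[split:]
holdout_forecasts = {m: f[split:] for m, f in forecasts.items()}

mae_estimated = mae(holdout_actual, combine(holdout_forecasts, estimated))
mae_equal = mae(holdout_actual, combine(holdout_forecasts, equal))
print(f"inverse-error weights: {mae_estimated:.3f}, equal weights: {mae_equal:.3f}")
```

With real data, the same comparison would be repeated over many products and forecast origins before drawing any conclusion about which scheme to use.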

Since you mention the MAPE above, which kind of raises a red flag to me, you may be interested in "Why use a certain measure of forecast error (e.g. MAD) as opposed to another (e.g. MSE)?" and "What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?"

In terms of literature, I recommend the excellent free online book Forecasting: Principles and Practice (2nd ed.) by Athanasopoulos & Hyndman, though this is not focused on intermittent demand forecasting. I know that John Boylan and Aris Syntetos are writing a textbook on intermittent demand forecasting, but this does not seem to have been published yet. If you are explicitly interested in model combination, then combination across frequencies might be enlightening (Petropoulos & Kourentzes, 2015, EJOR). You could also look through previous questions in the intermittent-time-series tag.

Yes, I am using out-of-sample forecasts. Say I have the whole year of data. Then my first forecast uses the data up to 24.12, the next forecast uses the data up to 17.12. I am aware of the caveats of using MAPE. I found sMAPE but wasn't able to implement it yet. Another problem is that my time series is sometimes disconnected too, say because the product was not listed in the store. Just setting 0 would not be right, I think. When a break in the data happens, I disregard that segment. I would say I am in a bit over my head :-). I have seen the online book mentioned a lot. I will work through it; it looks like a good foundation. — Folanir, Oct 26 '18 at 13:33
