I am trying to forecast the number of orders for different products in a product group. I have a time series for each product. One of the problems is that most of the time series are intermittent and very spiky. Since ARIMA does not handle that type of series well, I tried other methods. I found that TSB yields somewhat usable results. Another method is a naive forecast that uses the mean of the previous week as the forecast for the next week.
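To make the TSB method concrete, here is a minimal sketch of how I understand it: one exponentially smoothed estimate for the probability of a non-zero order and one for the order size, multiplied together. The function name, the default smoothing parameters and the initialization from the first observation are my own assumptions, not taken from any particular library.

```python
def tsb_forecast(demand, alpha=0.1, beta=0.1):
    """Flat TSB forecast for the next period(s).

    alpha smooths the order size, beta smooths the order probability.
    Initialization (an assumption): probability from the first observation,
    size from the first non-zero observation.
    """
    if not demand:
        raise ValueError("demand must not be empty")
    nonzero = [y for y in demand if y > 0]
    p = 1.0 if demand[0] > 0 else 0.0          # probability of an order
    z = float(nonzero[0]) if nonzero else 0.0  # size of an order, given one occurs
    for y in demand[1:]:
        if y > 0:
            p += beta * (1.0 - p)   # an order occurred: pull probability towards 1
            z += alpha * (y - z)    # update the size only when there is an order
        else:
            p *= (1.0 - beta)       # no order: decay probability towards 0
    return p * z


daily_orders = [0, 0, 5, 0, 0, 0, 12, 0, 3, 0, 0, 0, 0, 8]
print(tsb_forecast(daily_orders, alpha=0.2, beta=0.1))
```

Unlike Croston's method, the probability is updated in every period, so the forecast decays during long runs of zeros.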
Let's say I have one year of daily data and I forecast 7 days ahead. For "training" I split the time series into about 40 different sequences, each one week shorter than the previous one. Since I have the historic data, I can calculate the MSE and MAPE for each forecast. In the end I take the mean MSE and MAPE for each product over the 40 forecasts. I think this is a form of cross-validation, but correct me if I am wrong. For example, after averaging, product 1 has arima_mape=30, mean_mape=40 and tsb_mape=25, and product 2 has arima_mape=50, mean_mape=20 and tsb_mape=15.
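The evaluation scheme described here can be sketched roughly as follows. This is a simplified illustration, not my actual code: the function names are made up, the forecaster shown is the naive previous-week mean, and only the MSE is computed.

```python
from statistics import mean


def naive_week_mean(history, horizon):
    """Naive forecast: mean of the last 7 observed days, repeated over the horizon."""
    level = mean(history[-7:])
    return [level] * horizon


def rolling_origin_mse(series, forecaster, horizon=7, n_folds=40):
    """Average MSE over n_folds forecasts, each trained on one week less data."""
    n = len(series)
    if n - n_folds * horizon < 7:
        raise ValueError("series too short for this many folds")
    fold_mse = []
    for k in range(n_folds, 0, -1):
        cut = n - k * horizon                   # training data ends here
        train = series[:cut]
        actual = series[cut:cut + horizon]      # the week that is held out
        forecast = forecaster(train, horizon)
        fold_mse.append(mean([(f - a) ** 2 for f, a in zip(forecast, actual)]))
    return mean(fold_mse)


# One year of made-up daily data with a peak every 7th day.
series = [10 if day % 7 == 0 else 0 for day in range(365)]
print(rolling_origin_mse(series, naive_week_mean))
```

Each fold only sees data before its cut point, so no future values leak into the forecast being scored.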
Then, to set the weights, I also average over the different products. In the example above this gives arima_mape=40, mean_mape=30 and tsb_mape=20. I then assign each weight as the inverse error divided by the sum of the inverse errors: w_arima = (1/arima_mape) / (1/arima_mape + 1/mean_mape + 1/tsb_mape)
I then get: w_arima=0.231, w_mean=0.308, w_tsb=0.462
and my final forecast is now F_final = w_arima*F_arima + w_mean*F_mean + w_tsb*F_tsb
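For reference, the weighting and combination step written out as a small sketch. The function names are just illustrative, and the three forecast values at the end are made-up numbers.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalized so that they sum to 1."""
    if any(e <= 0 for e in errors.values()):
        raise ValueError("errors must be strictly positive")
    inverse = {name: 1.0 / e for name, e in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(forecasts, weights):
    """Weighted sum of the individual forecasts."""
    return sum(weights[name] * forecasts[name] for name in weights)


# MAPE values averaged over products, as in the example above.
weights = inverse_error_weights({"arima": 40, "mean": 30, "tsb": 20})
print({name: round(w, 3) for name, w in weights.items()})

# Hypothetical weekly forecasts from the three methods.
print(combine({"arima": 120.0, "mean": 100.0, "tsb": 90.0}, weights))
```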
I am now applying these weights to every product in my product group for future forecasts.
My questions are:
I know that there is rarely a "right" way to do forecasting. But is my approach generally justifiable so far? Am I doing something that is mathematically wrong?
Or should I rather do this at the product group level, i.e. sum over all products, calculate the MAPE and then set the weights?
I also looked into some machine learning methods. I initially wanted to use an LSTM for time series forecasting, with limited success so far. My question here is: can I somehow use ensemble learning to set the weights? In the end I just have three different predictors, each with some measure of accuracy. Or is that not applicable to time series forecasts?
Last but not least: am I maybe trying too hard to set weights? I found a paper which argued that the simple mean, F_final = 1/3*F1 + 1/3*F2 + 1/3*F3, usually performs almost identically to complex combinations.
Again, any help is much appreciated. If someone has some good book suggestions about forecasting or the statistical foundations in general, I would gladly take them too. Right now, I am still more or less floating and just use stuff I can find without really knowing if I should.
EDIT: I forgot to mention that, in order to use the MAPE, I sum over the 7-day forecast and compare it with the sum of the corresponding historic week. Am I doing something I should not do here? I am mostly interested in the total for the next week, not in each individual day. Should I maybe resample to weekly data in the first place? I felt that I would lose too much information. My data has weekly seasonality, and with daily data I can use m=7.
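To be explicit about what I mean, here is a rough sketch of the weekly-total percentage error and of the alternative of resampling to weeks up front. The function names are illustrative, and returning None for a week with zero total orders is just one possible convention.

```python
def weekly_total_ape(forecast_days, actual_days):
    """Absolute percentage error between the forecast total and the actual total.

    Returns None when the actual total is zero, because the percentage
    error is undefined in that case.
    """
    actual_total = sum(actual_days)
    if actual_total == 0:
        return None
    return 100 * abs(sum(forecast_days) - actual_total) / abs(actual_total)


def to_weekly_totals(daily):
    """Aggregate daily values into weekly sums, dropping the oldest incomplete week."""
    start = len(daily) % 7
    return [sum(daily[i:i + 7]) for i in range(start, len(daily), 7)]


# A flat daily forecast of 1 against a week with a single spike of 10.
print(weekly_total_ape([1] * 7, [0, 0, 10, 0, 0, 0, 0]))
print(to_weekly_totals(list(range(15))))
```

Summing before computing the error avoids dividing by the many zero days, which is what makes the daily MAPE unusable for intermittent series.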