Training one model to work for many time series

Question

I have been working with time series data to try and make multi-step demand forecast for products. There are thousands of products and it is computationally very expensive and labour intensive to tune a single model for each product.

As far as I can see there are a couple of realistic options at my disposal:

Try and group 'similar' products together. Based on their time series they do not look correlated but perhaps there is some way to cluster time series data of varying lengths? I tried something using dynamic time warping, but when I had a manageable number of clusters (10-20) the series looked very disimilar. I don't know if there is a standard way to cluster time series data or whether there is some kind of guideline on when a cluster becomes to disimilar? If this works then manually tune a model for each cluster.
Train a model (maybe a neural network or LSTM) on all the different time series at the same time, with the hope that this model would then be capable of producing 'good' predictions per time series fed to it.

Is there some sort of methodology for training a model to make predictions on many (seemingly unrelated) time series data? Most of the literature I have read concerns itself with producing a model for one time series instead of a more general model. I understand when forecasting the assumption is made that the model is able to "mimic" the function that generated the existing data so it is very difficult to have a multipurpose model. But there must be some kind of resolution or general accepted way of working with many different time series data?

score 5 · Answer 1 · answered Oct 12 '18 at 17:25

Is there some sort of methodology for training a model to make predictions on many (seemingly unrelated) time series data?

The closest thing to an actual methodology for this is hierarchical forecasting.

On my team (I work in demand forecasting) we use a type of hierarchical forecasting to generate forecasts for product/location groups (for example for an entire class of products across an individual region). However we don't do any sort of clustering or scientific similarity analysis, instead we have a pre-defined product similarity matrix defined by the business (according to product type, supplier, etc...). The approach is similar in spirit to the paper that Dr. Kolassa mentioned, in the sense that the group level forecast provides the seasonality and the shape of the forecast - and then individual product histories are used simply to adjust the height of the signal.

Train a model (maybe a neural network or LSTM) on all the different time series at the same time, with the hope that this model would then be capable of producing 'good' predictions per time series fed to it.

On the other hand, the approach you describe in (2) is what Amazon uses with their DeepAR model. It is a gigantic LSTM that takes in all products at the same time, and then tries to learn the correlations between the different products to give one big model that is used for all the products. Although even with DeepAR, you still have to provide with product attribute features so that it correctly estimate product similarity.

score 2 · Accepted Answer · answered Oct 11 '18 at 08:37

2

Yes, there are ways of doing this. You could apply some kind of meta learning to adapt the learning process to each separate time series, or use transfer learning to transfer the knowledge learned from one series to another. I don't have pointers, since this is certainly not the first thing I would do, see below.

You could also try calculating seasonal indices to groups of products and deseasonalize them all together, then apply simpler non-seasonal models to the deseasonalized series. A simple paper on this is "The Application of Product-Group Seasonal Indexes to Individual Products" by Mohammadipour, Boylan & Syntetos, Foresight, 2012. A similar process should also work for other drivers, like trend, calendar events or promotions.

Alternatively, do consider fitting simple models to all your series, e.g., exponential smoothing. This will fit extremely quickly. Alternatively, invest a little time in some feature engineering and consider a very simple linear model - see Varmerdam's PyData presentation on the benefits of simple models; he even discusses time series models. If nothing else, the simpler models will serve as a useful benchmark. After you have invested one day in training the simple models and two weeks in meta and transfer learning the more complex ones, you may very well find that the simple models outperformed the more complex ones. (And that they are easier to interpret and communicate, and to maintain in production.)

answered Oct 11 '18 at 08:37

Stephan Kolassa

95,027
13
197
357

Thank you. It will take some time to work through these resources. I fully expect that simple models will on the whole probably outperfrom more complicated ones. I am trying to create an analysis of the applicability of ML to such forecasting problems. Whilst I can grid search parameters for simpler methods like ARIMA or exponential smoothing, I cannot use this process for a NN. Which is why I wanted to have a way of grouping time series together, so that I can make a good comparision between the effectiveness of the two different approaches. Thanks for the detail in your answer! – Aesir Oct 11 '18 at 08:44
Unfortunately I haven't been able to access the papers you linked. So if anybody does know where it can be found that would be helpful! – Aesir Nov 06 '18 at 08:54
1

@Aesir: I can get you the Mohammadipour paper if you'll just send me an email ([find my address here](https://forecasters.org/about/board-of-directors/)). Varmerdam's PyData presentation has unfortunately been removed, which is a pity. The rest should be accessible. – Stephan Kolassa Nov 06 '18 at 13:26
Luckily I watched the video while it was still up and it was very insightful! Thanks for that. – Aesir Nov 06 '18 at 13:51
Most likely you are busy and didn't have time yet but I sent you an email a few days ago, just checking it didn't go into junk or spam by accident! – Aesir Nov 09 '18 at 08:49

Training one model to work for many time series

2 Answers2