15

First of all, I realise that my question is very broad and that it may be hard to answer because of it.

Do you have any advice on how to approach a 'problem' where you need to make forecasts/predictions for 2000+ different products? In other words, each product requires its own forecast/prediction. I have 2 years of historical data at weekly level (i.e., demand per week per product).

I need to do this in a short time period: I have about a week, so I am looking for ways to quickly build relatively good prediction models. Creating a model for each product and inspecting its performance closely, one by one, would be too time-consuming.

I thought of segmenting the products based on the variance, so that I can employ simple models for products that have a low variance. While this is probably not ideal, it would be a quick way to narrow down the number of models I need to create.

It would be greatly appreciated if you have any practical advice for me on approaching this problem.

Amonet
  • Are those similar products? You might benefit from searching this site for hierarchical forecasting – kjetil b halvorsen Jan 26 '19 at 17:32
  • Check the [Elo ratings](https://microprediction.github.io/timeseries-elo-ratings/html_leaderboards/univariate-k_003.html) for time-series prediction methods, then decide if you want to go with Prophet, NeuralProphet, TSA or other methods. – Peter Cotton Apr 16 '21 at 19:15

3 Answers

14

A follow-up to @StephanKolassa's answer:

  • I concur with Stephan that ets() from the forecast package in R is probably your best and fastest choice. If ETS doesn't give good results, you might also want to try Facebook's Prophet package (auto.arima() is easy to use, but two years of weekly data borders on not enough for an ARIMA model, in my experience). Personally, I have found Prophet easier to use when you have promotion and holiday event data available; otherwise ets() might work better. Your real challenge is more of a coding challenge: how to efficiently iterate your forecasting algorithm over a large number of time series (a minimal loop is sketched after this list). You can check this response for more details on how to automate forecast generation.

  • In demand forecasting, some form of hierarchical forecasting is frequently performed, i.e., you have 2000 products and need a separate forecast for each product, but there are similarities between products that might help with the forecasting. You want to find some way of grouping the products together along a product hierarchy and then use hierarchical forecasting to improve accuracy. Since you are looking for forecasts at the individual product level, look at trying the top-down hierarchical approach (see the hts sketch after this list).

  • Something a little more far-fetched, but I would like to call it out: Amazon and Uber use neural networks for this type of problem. Instead of having a separate forecast for each product/time series, they use one gigantic recurrent neural network to forecast all the time series in bulk. Note that they still end up with individual forecasts for each product (in Uber's case it is traffic/demand per city as opposed to products); they just use a large model (an LSTM deep learning model) to do it all at once. The idea is similar in spirit to hierarchical forecasting, in the sense that the neural network learns from the similarities between the histories of different products to come up with better forecasts. The Uber team has made some of their code available (through the M4 competition GitHub repositories), however it is C++ code (not exactly the favorite language of the stats crowd). Amazon's approach is not open source, and you have to use their paid Amazon Forecast service to get the forecasts.
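A minimal sketch of the iteration mentioned in the first bullet, looping a forecasting function over all series (the `demand_list` input, a named list of weekly `ts` objects, is a hypothetical placeholder for however you store your 2000 series):

```r
library(forecast)

# demand_list: named list of weekly ts objects, one per product
# (hypothetical input, e.g. built with ts(x, frequency = 52)).
# Note: ets() ignores seasonality when frequency > 24; for weekly
# seasonal patterns, consider stlf(), which applies ETS to the
# STL-seasonally-adjusted series instead.
all_forecasts <- lapply(demand_list, function(y) {
  fit <- ets(y)          # automatic ETS model selection
  forecast(fit, h = 13)  # forecast 13 weeks ahead
})

all_forecasts[["product_0001"]]$mean  # point forecasts for one product
```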
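And a sketch of the top-down hierarchical approach from the second bullet, using the hts package in R (the `sales_matrix` input and its column-naming scheme, which encodes the product hierarchy, are assumptions):

```r
library(hts)

# sales_matrix: a ts matrix whose columns are the bottom-level product
# series, with names like "AAA0001" = family "AAA", product "0001"
# (hypothetical naming scheme; characters gives the segment lengths).
product_hts <- hts(sales_matrix, characters = c(3, 4))

# "tdfp" = top-down using forecast proportions, as suggested above;
# Stephan's comments below recommend also trying method = "comb"
# (optimal reconciliation).
fc <- forecast(product_hts, h = 13, method = "tdfp", fmethod = "ets")
```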


With regards to your second comment: you need to differentiate between forecasting sales and forecasting demand. Demand is unconstrained: if an item is suddenly popular and your customers want 200 units, it doesn't matter that you have only 50 units on hand; your demand is still going to be 200 units.

In practice it is very difficult to observe demand directly, so we use sales as a proxy for demand. This is problematic because it doesn't account for situations where a customer wanted to purchase a product but it was unavailable. To address this, along with the historical sales data, information about inventory levels and stockouts is either included directly in a model or used to preprocess the time series before a forecasting model is fitted.
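As a rough illustration of the preprocessing route (the `sales` series and the `stocked_out` indicator are hypothetical inputs), one can treat stocked-out weeks as missing and interpolate them before fitting:

```r
library(forecast)

# sales: weekly ts of observed sales; stocked_out: logical vector,
# TRUE in weeks where the product was unavailable (hypothetical inputs).
censored <- sales
censored[stocked_out] <- NA          # sales understate demand in these weeks
demand_proxy <- na.interp(censored)  # interpolate the censored weeks
fit <- ets(demand_proxy)             # fit to the approximated demand
```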

Typically an unconstrained forecast is generated first by a forecast engine and then passed on to a planning system, which adds the constraints you mention (i.e., demand is 500 units but only 300 units are available) along with other constraints (safety stock, presentation stock, budgetary constraints, plans for promotions or introductions of new products, etc.). However, this falls under the general rubric of planning and inventory management, not forecasting per se.

Skander H.
  • In reply to your second point, I have product families/categories for each of the products. If I understand your suggestion correctly, I would make a forecast at product family level and then disaggregate to product level, correct? – Amonet Jan 26 '19 at 22:16
  • In addition to the above, would it be possible to take constraints into account? For example: if there is an inventory of 300 products, the demand (read: sales) cannot exceed 300 products. Or is this perhaps a shortcoming of hierarchical forecasting (top-down), as in my case this constraint information would be at an even lower level than the product level (i.e., the inventory level of a location)? – Amonet Jan 26 '19 at 22:53
  • @Amonet "I would make a forecast at product family level and then disaggregate to product level, correct?" Yes. – Skander H. Jan 26 '19 at 23:13
  • @Amonet Regarding your second point: See edit added to answer. – Skander H. Jan 26 '19 at 23:36
  • +1, all extremely good points. Regarding hierarchical forecasting, I am a big fan of [optimal reconciliation](https://otexts.com/fpp2/reconciliation.html), which I and others have repeatedly found to outperform top-down and bottom-up on *all* levels of the hierarchy. Plus, it's at heart an optimization algorithm, so one can take constraints into account. (For instance, if some series have low volume, the unconstrained reconciliation can lead to negative forecasts.) I agree, though, that one should aim at uncensored demand forecasts... – Stephan Kolassa Jan 27 '19 at 05:48
  • ... Also, Amazon have published some research on how they forecast, e.g., [here](https://arxiv.org/abs/1704.04110) and [here](https://www.researchgate.net/publication/319594287_Probabilistic_demand_forecasting_at_scale) and [here](https://papers.nips.cc/paper/6313-bayesian-intermittent-demand-forecasting-for-large-inventories). It's not for the faint of heart, though, and... – Stephan Kolassa Jan 27 '19 at 05:53
  • ... I would always recommend [starting with simple forecasting methods first](https://stats.stackexchange.com/a/124956/1352), which can be surprisingly hard to beat. [See also here.](https://www.sciencedirect.com/journal/journal-of-business-research/vol/68/issue/8) – Stephan Kolassa Jan 27 '19 at 05:53
  • Thank you both for providing such helpful information! I would like to ask one additional question. The products I need to make forecasts for are sold at different locations. For some locations I have 2 years of weekly data, for others only 6 months, where the 6 months fully overlap with the 2 years of data (i.e., 18 months in, another location starts selling the product). My question is: how could I integrate this into my model, so that it understands that a potential peak in sales is probably due to another location selling the product? Can this also be done with the simpler models? – Amonet Jan 28 '19 at 13:03
  • I am a bit surprised that two years of weekly data were not enough for an ARIMA... Could you please suggest what forecast horizon you were considering? In addition, @StephanKolassa: While I agree that reconciliation of forecasts can often be beneficial, the dynamics have to be mostly stable. For example, if new products come into a market, that can really mess up the `hts` proportion estimates, as the proportions are solved using the historical data. (Which is actually the use case the OP describes.) So I would argue that one needs to be careful about readily reconciling. – usεr11852 Jan 28 '19 at 23:48
  • @usεr11852: two years are just two cycles. In seasonal differencing, we lose one cycle. So seasonal ARIMA loses half its data just through the differencing. I would not use seasonal ARIMA with less than five cycles' worth of data. ... – Stephan Kolassa Jan 29 '19 at 14:44
  • @usεr11852: ... I'm a bit unclear on what you mean by "the proportions are solved using the historical data". Yes, `hts` offers multiple ways of reconciling hierarchies, and top-down in particular requires a forecast for proportions (typically, a naive forecast is used, by just carrying past proportions forward). But the "optimal combination" approach, which is what I recommend, does not use proportions - only information on which forecasts need to be sum consistent. Are you maybe referring only to top-down? – Stephan Kolassa Jan 29 '19 at 14:46
  • @StephanKolassa: Ooff... You are right about the ARIMA; myopically I was not considering SARIMA models. Apologies. Regarding optimal reconciliation: I was using `method = "comb"`; from what I remember, bottom series of smaller magnitude that had recently experienced a transition were "fine" in the individual forecasts, but when employing reconciliation they were brought up (or down) towards their historical averages. I strongly suspect what happened was that the higher-up series were generally stable, the bigger-magnitude bottom series were OK, and the lower-volume series had experienced a lot of distortion (cont.) – usεr11852 Jan 29 '19 at 19:11
  • because the "re-distribution" of RMSE messed them up "equally" with the other "regular volume" bottom series. I will try to find those analyses in the next few days and potentially comment in the discussion room. (And for the record, I had upvoted both posts when I made my comment.) – usεr11852 Jan 29 '19 at 19:15
  • @Amonet "For some locations I have 2 years of weekly data, for others only 6 months where the 6 months fully overlap with the 2 years data" - This is one of the situations where hierarchical forecasting helps - although now you need to think of both a product hierarchy and location hierarchy (to group similar stores). – Skander H. Jan 30 '19 at 08:36
  • @usεr11852: this reminds me of some analyses I did where the bottom series were adjusted "too much", relatively speaking, because adjustments are more-or-less balanced in absolute terms, not in percentage terms. I then used `mgcv::pcls()` for the reconciliation, feeding the summation matrix in by hand. This had two advantages: (1) it allows you to set box constraints, e.g., to ensure reconciled forecasts are non-negative; (2) it allows you to weight the adjustments, so I just used the inverse of each series' historical average as a weight, which addressed the adjustment problem. – Stephan Kolassa Jan 30 '19 at 16:53
  • @StephanKolassa Thank you; I will look this up in the future. – usεr11852 Jan 30 '19 at 22:56
12

We will only be able to give you very general advice.

  • Are there any strong drivers, like promotions, calendar events, seasonality, trends or lifecycles? If so, include them in your models. For instance, you could regress sales on promotions, then model the residuals (using exponential smoothing or ARIMA); a sketch follows this list.
  • There are software packages that do a reasonably good job of fitting multiple time series models to a series and selecting among them. You can then simply iterate over your 2000 series, which should not take much more runtime than a cup of coffee. I particularly recommend the ets() function in the forecast package in R (less so the auto.arima() function, for weekly data).
  • At least skim a forecasting textbook, e.g., this one. It uses the forecast package I recommend above.
  • What is your final objective? Do you want an unbiased forecast? Then assess point forecasts using the MSE. Will your bonus depend on the MAPE? Then this list of the problems of the MAPE may be helpful. Do you need forecasts to set safety amounts? Then you need quantile forecasts, not mean predictions. (The functions in the forecast package can give you those; see the sketch after this list.)
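A minimal sketch of the regress-then-model-the-residuals idea from the first bullet (the `promo` dummy and the planned future promotions are hypothetical inputs):

```r
library(forecast)

# sales: weekly ts; promo: 0/1 dummy marking promotion weeks (hypothetical).
reg <- lm(as.numeric(sales) ~ promo)
resid_ts <- ts(residuals(reg), frequency = frequency(sales))

fit <- ets(resid_ts)           # exponential smoothing on the residuals
future_promo <- c(1, 0, 0, 0)  # hypothetical promotion plan, next 4 weeks
base <- predict(reg, newdata = data.frame(promo = future_promo))
fc <- base + forecast(fit, h = 4)$mean  # recombine regression and ETS parts
```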
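And for the quantile forecasts mentioned in the last bullet: the `level` argument of `forecast()` yields prediction intervals whose bounds can serve as quantile forecasts (a sketch; `fit` stands for any model fitted as above):

```r
library(forecast)

# fit: an ets() or auto.arima() model fitted to one product's series.
fc <- forecast(fit, h = 13, level = 90)

# The upper bound of a two-sided 90% interval is the 95% quantile,
# e.g. for setting safety stock at a 95% service level.
q95 <- fc$upper[, 1]
```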

If you have more specific questions, do post them at CV.

Stephan Kolassa
  • I'm sorry for this little off-topic question, but what material on forecasting (in general) would you recommend after studying the 'Forecasting: Principles and Practice' book you mention? :-) – Łukasz Grad Jan 26 '19 at 16:37
  • @ŁukaszGrad: if you have worked your way through FPP2, [our book](http://www.businessexpertpress.com/books/demand-forecasting-managers/) won't tell you much new. Ord et al.'s *Principles of Business Forecasting* (2nd ed.) goes into more depth ([I reviewed it here](https://doi.org/10.1016/j.ijforecast.2017.10.003) if you have access). ... – Stephan Kolassa Jan 26 '19 at 16:47
  • ... You might profit from looking at [the IIF](https://forecasters.org/), maybe read its publication [*Foresight*](https://foresight.forecasters.org/) or attend one of its conferences, either the [ISF](https://isf.forecasters.org/), which will take place this year in June in Thessaloniki, or the Foresight Practitioner Conference, this year in November at the SAS campus in Cary, NC, depending on where you are. The ISF is somewhat more academically oriented, but recently, I'd say about 33% of attendees came from industry, and there usually is a practitioner track. – Stephan Kolassa Jan 26 '19 at 16:49
  • (Full disclosure: I am involved with all of these, so take my recommendations with a large grain of salt. If you do attend one of the conferences, find me and say hi!) – Stephan Kolassa Jan 26 '19 at 16:50
  • Thank you for all this detailed information! I must say I discovered FPP2 through one of your other posts here on CV and I think it is a great book. Forecasting is something I neglected in the past :-) – Łukasz Grad Jan 26 '19 at 17:01
  • @StephanKolassa how do you get quantile forecasts using the forecast package? – Skander H. Jan 26 '19 at 21:04
  • @SkanderH: use the `forecast()` command on your fitted model (i.e., the output of `ets()` or `auto.arima()`), and specify the `level` parameter. See `?forecast.ets` and `?forecast.Arima` (note the capitalization). – Stephan Kolassa Jan 26 '19 at 21:18
  • @StephanKolassa I accepted the other answer, as it's a follow-up on your answer and people are therefore more inclined to read your helpful advice also. – Amonet Jan 30 '19 at 07:44
  • There are many great and useful answers to this question here. I would like to add that you can try the `fable`, `fable.prophet` and `modeltime` packages in `R`, and the `scikit-hts` library in `Python`. The fable group of packages also has additional functionality for reconciliation, cross-validation, etc. The `sugrrants` and `feasts` packages might also be useful tools while you are working with the above R packages. There is a whole ecosystem in R for time series called `tidyverts`. – cube Oct 04 '21 at 10:46
1

Segmenting based on the variance of the original series makes no sense to me, as the best model should be invariant to scale. Consider a series: model it, then multiply each value in the time series by 1000. The variance changes by a factor of a million, but the appropriate model does not.
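A quick sanity check of the scale-invariance claim with simulated data (any reasonable automatic model selector should pick the same model class before and after rescaling):

```r
library(forecast)

set.seed(1)
y <- ts(100 + 10 * rnorm(104))  # two years of simulated weekly data

ets(y)$method          # e.g. "ETS(A,N,N)"
ets(1000 * y)$method   # same model class: selection is invariant to scale
```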

In terms of mass-producing equations that may have both deterministic structure (pulses, level shifts, local time trends) and autoregressive seasonal or ARIMA structure, you have to run a computer-based script. Beware of simple auto-ARIMA solutions that assume no deterministic structure, or that make fixed assumptions about it.

IrishStat