
I need to forecast a univariate time series of sales data with the following characteristics:

  • It is a daily time-series
  • On around 70-80 % of the days nothing is sold ($x_t = 0$)
  • On the remaining 20-30 % of the days a positive integer number of units is sold
  • The days on which nothing is sold do not always fall on the same day of the week

So far I have tried Croston's method (croston() from the forecast package in R).

Is Croston's method appropriate? Are there any suitable alternatives?

I would also be grateful for R code.

Edit:

My data looks similar to the data below:

0,0,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0

Ferdi

  • Please post your data and I will try to help you. AUTOBOX has a number of special control files which might be useful. Please specify the starting date and country. – IrishStat Dec 05 '17 at 09:32
  • Thank you. In the end I had problems running AUTOBOX on my OS. I added the data. – Ferdi Dec 05 '17 at 15:46
  • Is that all you have, 6 weeks and 42 values? If you have a longer series, please post it as one column. – IrishStat Dec 05 '17 at 15:58
  • Thank you for your answer. I have 14 weeks, but the original data is confidential. – Ferdi Dec 05 '17 at 16:13
  • Why not try a zero-inflated Poisson model with autocorrelated errors? Maybe I am doing a gross over-simplification here, but you can model the "hurdle mechanism" with a binomial and then the amplitude mechanism with a truncated Poisson/Negative Binomial. – usεr11852 Dec 05 '17 at 19:32
  • @usεr11852 Great idea, but what about the time series component of the data? – Ferdi Dec 05 '17 at 19:36
  • `sin(2*pi*t/T) + cos(2*pi*t/T) + ...`? For something "less crass" you can use something like a `gam` and define cyclic splines. That would ensure that your estimates will not explode outside the range of your existing data. – usεr11852 Dec 05 '17 at 19:41
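A minimal sketch of the hurdle idea from these comments, written in Python rather than R and assuming constant parameters (no autocorrelated errors, no weekly sin/cos or cyclic-spline terms): a Bernoulli probability for "any sale today" and a zero-truncated Poisson for the number of units on days with a sale. The function names `fit_hurdle`, `mean_size` and `pmf` are hypothetical. In a fuller model the seasonal terms suggested above would enter as covariates of both parts, which this sketch does not do.

```python
import math

# Sample series posted in the question, one row per week (42 days)
sales = [0, 0, 1, 0, 0, 0, 0,
         2, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 1, 0, 0, 0,
         1, 0, 1, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0,
         1, 0, 0, 2, 0, 0, 0]

def mean_size(lam):
    """E[size | sale] for a zero-truncated Poisson: lam / (1 - exp(-lam)); tends to 1 as lam -> 0."""
    return lam / -math.expm1(-lam) if lam > 0 else 1.0

def fit_hurdle(y, tol=1e-12):
    """Constant-parameter hurdle model: Bernoulli occurrence + zero-truncated Poisson size."""
    positives = [v for v in y if v > 0]
    p = len(positives) / len(y)                 # MLE of P(sale on a given day)
    ybar = sum(positives) / len(positives)      # mean units on days with a sale
    if ybar <= 1.0:
        return p, 0.0                           # degenerate case: every sale day sold exactly 1 unit
    # The truncated-Poisson MLE solves mean_size(lam) = ybar; mean_size is increasing in lam
    lo, hi = tol, ybar                          # mean_size(lo) ~ 1 < ybar and mean_size(ybar) > ybar
    while hi - lo > tol:                        # bisection
        mid = 0.5 * (lo + hi)
        if mean_size(mid) < ybar:
            lo = mid
        else:
            hi = mid
    return p, 0.5 * (lo + hi)

def pmf(k, p, lam):
    """P(Y = k) under the fitted hurdle model."""
    if k == 0:
        return 1.0 - p
    if lam == 0.0:
        return p if k == 1 else 0.0
    return p * math.exp(-lam) * lam ** k / (math.factorial(k) * -math.expm1(-lam))

p, lam = fit_hurdle(sales)
expected = p * mean_size(lam)                   # expected daily sales = P(sale) * E[size | sale]
print(f"P(sale)={p:.3f}, lambda={lam:.3f}, expected daily sales={expected:.3f}")
print("P(Y=0..3):", [round(pmf(k, p, lam), 3) for k in range(4)])
```

With constant parameters the expected daily sales simply reproduce the sample mean; the value of the hurdle form lies in the full distribution `pmf` gives and in the covariates it could be extended with.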

2 Answers


(This answer is based on experience with the business side of sales forecasting, more so than on rigorous statistical/mathematical knowledge)

Looking at your data, it makes more sense to forecast it at a weekly level than at a daily level. At a daily level it is too sparse, but at a weekly level you would have a more meaningful time series.

week 1: 0,0,1,0,0,0,0

week 2: 2,0,0,0,0,0,0

week 3: 0,0,0,1,0,0,0

week 4: 1,0,1,0,0,0,0

week 5: 0,0,0,0,0,0,0

week 6: 1,0,0,2,0,0,0

Any forecasting method you use at a daily level will give a fractional value per day. This doesn't really help, since these are sales units: a forecast value of ~0.14 doesn't mean much unless you interpret it as a probability (and I don't know enough math to help in that case, but others might know better how to treat that).
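One common way to turn a fractional daily forecast into a probability is to assume (an added assumption, not part of this answer) that daily sales follow a Poisson distribution with that mean, so that P(at least one sale) = 1 - exp(-rate). A small Python sketch; `prob_at_least_one` is a hypothetical helper name:

```python
import math

# Sample series posted in the question (42 days)
sales = [0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0]

rate = sum(sales) / len(sales)   # average units per day

def prob_at_least_one(rate):
    """P(at least one unit sold on a day), assuming daily sales ~ Poisson(rate)."""
    return 1.0 - math.exp(-rate)

print(f"daily rate = {rate:.3f}, P(at least one sale) = {prob_at_least_one(rate):.3f}")
print(f"for a daily forecast of 0.14: P(at least one sale) = {prob_at_least_one(0.14):.3f}")
```

Whether the Poisson assumption is reasonable should be checked against the data, for example by comparing 1 - exp(-rate) with the observed share of days with a sale.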

If you aggregate the data by week, you get:

week 1: 1

week 2: 2

week 3: 1

week 4: 2

week 5: 0

week 6: 3

You can then simply average these weekly totals over all the weeks you have, or use a moving average. Here that gives 9 units over 6 weeks, i.e. 1.5 units per week or 3 units every two weeks.
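The aggregation and averaging steps above can be sketched in a few lines of Python. This assumes the series starts on the first day of a week (a trailing partial week would simply be summed as is), and the moving-average window of 3 weeks is an arbitrary illustrative choice:

```python
# Sample series posted in the question (42 days)
sales = [0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0]

def weekly_totals(daily, days_per_week=7):
    """Sum consecutive blocks of `days_per_week` days."""
    return [sum(daily[i:i + days_per_week]) for i in range(0, len(daily), days_per_week)]

def moving_average(x, window):
    """Trailing moving average: one value for each position where `window` past values exist."""
    return [sum(x[i - window:i]) / window for i in range(window, len(x) + 1)]

weeks = weekly_totals(sales)
mean_per_week = sum(weeks) / len(weeks)

print("weekly totals:", weeks)                              # [1, 2, 1, 2, 0, 3]
print("mean per week:", mean_per_week)                      # 1.5
print("per two weeks:", 2 * mean_per_week)                  # 3.0
print("3-week moving average:", moving_average(weeks, 3))
```

The last value of the moving average (here the mean of weeks 4 to 6) would serve as the forecast for the coming week.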

Keep in mind that this is a sales forecast. What is the purpose of a sales forecast? To make sure that you have enough inventory to satisfy customers' demand. Based on the method described above, you would know that you need to ship/order 3 units of inventory every 2 weeks to satisfy the demand for that product, without going into ARIMA, exponential smoothing or some other more involved time series analysis.

Skander H.

Croston's method is definitely an appropriate choice for this case. Its basic idea is to estimate the size of the non-zero demands and the interval between them separately. Note, however, that its output is a "demand rate", not actual demand units (e.g. a forecast of 0.1 means a demand of 1 unit over 10 periods). It does not tell you when exactly the demand will occur.
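A minimal Python sketch of the Croston updates described above, assuming a fixed smoothing constant alpha = 0.1 and initialisation from the first non-zero demand. Both are assumptions, and R's forecast::croston() may initialise or choose alpha differently. Sizes and intervals are smoothed only in periods with demand, and the forecast is their ratio. The optional `sba` flag applies the Syntetos-Boylan factor (1 - alpha/2) mentioned in this answer:

```python
# Sample series posted in the question (42 days)
sales = [0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0]

def croston(y, alpha=0.1, sba=False):
    """Croston's method: return the per-period demand rate z / p.

    z: smoothed size of the non-zero demands, p: smoothed interval between them.
    With sba=True the Syntetos-Boylan factor (1 - alpha / 2) is applied.
    """
    z = p = None
    q = 1                          # periods since the last demand, counting the current one
    for value in y:
        if value > 0:
            if z is None:          # first demand: initialise size and interval
                z, p = float(value), float(q)
            else:                  # exponential smoothing, only when demand occurs
                z += alpha * (value - z)
                p += alpha * (q - p)
            q = 1
        else:
            q += 1
    if z is None:
        return 0.0                 # no demand observed at all
    rate = z / p
    return rate * (1 - alpha / 2) if sba else rate

rate = croston(sales)
print(f"Croston rate: {rate:.3f} units/day, about {7 * rate:.2f} units/week")
print(f"SBA rate:     {croston(sales, sba=True):.3f} units/day")
```

Because nothing is updated on zero days, the forecast stays flat after the last sale no matter how long the following run of zeros is.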

The tsintermittent package (R) provides several alternatives for intermittent time series forecasting, including iMAPA and the Teunter-Syntetos-Babai (TSB) method. It also offers adjustments for the bias of Croston's method, such as the Syntetos-Boylan approximation.
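For comparison, a Python sketch of the TSB idea. Instead of smoothing the interval between demands, it smooths the probability that a demand occurs, and it updates that probability in every period, so the forecast decays during long runs of zeros. The smoothing constants alpha and beta and the initialisation from the whole sample (share of non-zero days, mean non-zero size) are simplifying assumptions. This is not the tsintermittent implementation:

```python
# Sample series posted in the question (42 days)
sales = [0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0]

def tsb(y, alpha=0.1, beta=0.1):
    """TSB: forecast = smoothed demand probability * smoothed non-zero demand size."""
    positives = [v for v in y if v > 0]
    if not positives:
        return 0.0
    prob = len(positives) / len(y)            # initial probability that a demand occurs
    size = sum(positives) / len(positives)    # initial non-zero demand size
    for value in y:
        occurred = 1.0 if value > 0 else 0.0
        prob += beta * (occurred - prob)      # updated every period, including zero days
        if value > 0:
            size += alpha * (value - size)    # updated only when demand occurs
    return prob * size

print(f"TSB rate: {tsb(sales):.3f} units/day")
print(f"TSB rate after 30 more zero days: {tsb(sales + [0] * 30):.3f} units/day")
```

The second line shows the design difference to Croston's method: a long run of zero days pulls the TSB forecast down, which matters when a product may be becoming obsolete.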

Fan Wang