How to forecast integer time series in R?

Question

For a while now I have been forecasting integer/count time series the same way I would any continuous time series: I fit models such as ARIMA, ETS, Theta, or TBATS, and then simply round the results. Are there models designed specifically for count time series? Are they more efficient than the models above?

Recently, Long Short-Term Memory (LSTM) networks, a variant of recurrent neural networks, have become popular. They are designed to have a "long memory" for tasks such as generating intelligent responses to questions about a book, and they carry over to time series fairly well. – ERT Aug 17 '18 at 15:23
@ERT That does not answer the OP's question, which is about integer-specific statistical models. LSTMs can be used for time series, but that does not address the issue of modelling integer versus continuous data. – Jon Aug 18 '18 at 01:39
@Taha When you say you're forecasting, what quantity are you predicting? A mean? The whole distribution of the next value? – Glen_b Aug 18 '18 at 04:04
@Glen_b When I say forecasting, I mean fitting a model to the past data; once it is fitted I can predict the next values (the mean, residuals, and ...). – Taha Aug 18 '18 at 09:02
The *mean* of a future (i.e. random) value will not be discrete; it could be any value within the limits on the variable. Consider, for example, a Poisson-distributed random variable (which takes integer values); its population mean (the parameter of the Poisson) can be any positive value. As another example, consider predicting the number of spots on a die. While the number itself is 1, 2, 3, 4, 5, or 6, the *mean* is generally not an integer (for a fair die it's 3.5). You should not discretize a mean forecast! – Glen_b Aug 18 '18 at 10:53
@Glen_b In that case, rounding the results of any time series model wouldn't be such a bad idea after all? – Taha Aug 18 '18 at 11:27
It sounds like you're operating under a serious misunderstanding. If you were predicting a distribution of *future values*, those distributions should be discrete. Or, for example, if I want a prediction interval for an observation, the discreteness of the observations is a relevant consideration (if I am predicting the next die roll I can get a 66.7% PI by predicting $\{2,3,4,5\}$). But if I am predicting a *mean*, rounding it would betray a confusion between the sample space of a variable and the parameter space. It would be ridiculous to predict that the *mean* is "4": *means* are not discrete. – Glen_b Aug 18 '18 at 12:03
If you do want to predict a value taken by the variable, perhaps you could predict a mode rather than a mean, but most packages will predict means. – Glen_b Aug 18 '18 at 12:07
@Glen_b Thanks for the clarification. By rounding the predicted means I was hoping to get the most probable value (the mode), but by the definitions of mean and mode that seems wrong; I was confusing the two concepts. Based on what you said, may I conclude (correct me if I'm wrong) that using models that ignore the discreteness of the data to forecast count time series is fundamentally wrong? – Taha Aug 18 '18 at 12:37
It depends on what you mean by "fundamentally"; all models are approximations, so it depends on how much impact doing that would have on the particular thing you were predicting. If you were seeking a mean prediction and the count values were all large (far from 0), ignoring the discreteness might not make much difference at all (as long as the other aspects of the distribution were approximately correct). But for small means it may have more of an impact, and for predicting a mode it could matter much more. – Glen_b Aug 18 '18 at 13:24
@Glen_b That's EXACTLY what I wanted to know, because the data I'm working with is, in most cases, composed of large numbers with a very wide sample space, and I haven't been able to pinpoint any abnormal results when using ARIMA and ETS models. I appreciate your help. – Taha Aug 18 '18 at 20:54
Heteroskedasticity will typically be the main issue; counts often tend to have variance nearly proportional to the mean. ARIMA and ETS models generally don't model that kind of heteroskedasticity; this was one of the things encompassed by "other aspects of the distribution" above. [I may summarize this discussion into an answer and remove the comments] – Glen_b Aug 18 '18 at 23:28
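
The mean-versus-mode distinction drawn in the comments above can be checked numerically. The following is only a small sketch in Python (standard library only), not something taken from the thread: the helper names `poisson_pmf` and `poisson_mode` and the rate 3.7 are assumptions chosen for illustration. It reproduces the fair-die figures quoted above and compares the mode of a Poisson distribution with its rounded mean.

```python
import math
from fractions import Fraction


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_mode(lam, k_max=200):
    """Most probable count, found by brute force over 0..k_max-1."""
    return max(range(k_max), key=lambda k: poisson_pmf(k, lam))


# Fair die: the mean is 7/2, which is not a value the die can show.
faces = range(1, 7)
die_mean = Fraction(sum(faces), len(faces))
# The set {2, 3, 4, 5} covers 4 of the 6 equally likely faces.
die_pi_coverage = Fraction(4, len(faces))

# Poisson with an illustrative (assumed) rate of 3.7.
lam = 3.7
mode = poisson_mode(lam)
rounded_mean = math.floor(lam + 0.5)  # round half up
# Variance from the pmf; the truncation at 200 drops a negligible tail.
variance = sum((k - lam) ** 2 * poisson_pmf(k, lam) for k in range(200))

print(die_mean, float(die_pi_coverage))
print(mode, rounded_mean, round(variance, 6))
```

For this assumed rate the most probable count is 3 while the rounded mean is 4, so rounding a mean forecast does not in general recover the mode. The computed variance equals the mean, which is the Poisson special case of the mean-proportional variance mentioned in the last comment.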

Rob Hyndman · Answer 1 · 2018-08-18T01:24:29.677

5

When you are looking for suitable packages, use the CRAN task views. In this case, the time series task view contains the following line:

Count time series models are handled in the tscount and acp packages. ZIM provides for Zero-Inflated Models for count time series. tsintermittent implements various models for analysing and forecasting intermittent demand time series.

Then see what models are implemented, and check the references. The tscount package has a nice vignette on analysing count time series using GLMs.

As to whether they are more efficient, that depends on the data and what you mean by efficiency. If a count time series model is a good fit, then it will be more efficient (in the statistical sense) to use it. It may not be more computationally efficient depending on how it is coded.

The comments suggested you mean accurate rather than efficient. The only way to answer that is to try it and see.
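
One way to "try it and see" is to hold out the end of the series and score each candidate on it. The sketch below, in Python with the standard library only, is an assumption-laden illustration and not part of the original answer: the two forecasters (`mean_forecast`, `rounded_mean_forecast`) are deliberately trivial, hypothetical placeholders, and the counts are made up. In a real comparison you would substitute a fitted count model and a fitted ARIMA or ETS model, and ideally repeat the evaluation over several forecast origins.

```python
import math


def mean_forecast(train):
    """Hypothetical stand-in forecaster: predict the training mean at every step."""
    return sum(train) / len(train)


def rounded_mean_forecast(train):
    """The same forecast, rounded half up to the nearest integer."""
    return math.floor(mean_forecast(train) + 0.5)


def holdout_errors(series, n_test, forecaster):
    """Fit on all but the last n_test points; return (MAE, MSE) on the rest."""
    train, test = series[:-n_test], series[-n_test:]
    pred = forecaster(train)
    abs_err = [abs(y - pred) for y in test]
    mae = sum(abs_err) / n_test
    mse = sum(e * e for e in abs_err) / n_test
    return mae, mse


# Made-up counts, for illustration only.
counts = [2, 0, 1, 3, 1, 2, 0, 4]
for name, f in [("mean", mean_forecast), ("rounded mean", rounded_mean_forecast)]:
    mae, mse = holdout_errors(counts, n_test=2, forecaster=f)
    print(name, mae, mse)
```

The numbers from such a tiny toy series carry no weight; the point is only the shape of the comparison, in which every candidate is scored on the same held-out observations with the same error measure.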

edited Aug 18 '18 at 01:24

answered Aug 18 '18 at 00:50

Rob Hyndman


Thanks. I wasn't aware of such a handy tool; I'll try the packages you suggested. Meanwhile, I'd like to hear your opinion on the second part of my question: do those models deliver more accurate results than rounded ARIMA, ETS, etc.? – Taha Aug 18 '18 at 01:21
So there is no way to tell in general which approach is better for modelling a count time series; it depends on the case. One last clarification: mathematically speaking, is it OK to use methods like ARIMA, which assume continuity, to forecast count time series? – Taha Aug 18 '18 at 11:37
2

If your count data are high enough (far enough from zero), this is less problematic, because (a) you don't run into issues with the lower bound at zero and (b) the data look more continuous, since they take many different integer values. In practice many businesses use ARIMA and ETS to forecast integer/count data. In my own experience we do that and round, though not always back to an integer. – Chris Umphlett Aug 20 '18 at 00:15
@ChrisUmphlett Thanks for sharing your experience. That matches what I've seen when using ARIMA and ETS to forecast series such as the population of sheep in England, which has a wide sample space and values far from zero. Furthermore, I assume (with no actual knowledge) that R forecasting packages that take the discreteness of the series into account may be too simple, and not as accurate as TBATS (for example), when it comes to modelling trend and complex seasonal components... – Taha Aug 20 '18 at 15:30
