
I tried auto.arima() on my time series, and the forecasts converge to the mean. Why does this happen? What can I do to correct the error in modelling?

Here is my time series

y <- c(1, 1, 2, 4, 1, 4, 3, 3, 3, 2, 0, 0, 2, 1, 2, 1, 5, 4, 3, 8, 10, 3, 5, 3, 3, 4, 0, 1, 4, 5, 4, 7, 7, 2, 8, 3, 3, 6, 8, 3, 2, 2, 1, 4, 6, 6, 3, 6, 9, 0, 2, 5, 1, 4, 2, 1, 0, 1, 3, 1)

The model suggested by auto.arima() is ARIMA(1,0,0) with non-zero mean.

Firebug
Sid Verma

2 Answers


You are doing nothing wrong, and there is no error in modeling. (At least the forecasts imply none.)

An ARIMA(1,0,0) model with non-zero mean is AR(1) with a mean:

$$ y_t = c+\phi y_{t-1}+\epsilon_t $$

with $|\phi|<1$ for stationarity. In your particular case, auto.arima() estimates $\hat{c}=3.27$ and $\hat{\phi}=0.30$.
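This fit is easy to reproduce in base R: stats::arima() can estimate the same AR(1)-with-mean directly (a sketch; the exact estimates may differ slightly from the forecast package's auto.arima() depending on versions):

```r
# Fit an AR(1) with non-zero mean in base R (stats::arima),
# the same specification auto.arima() selected.
y <- c(1, 1, 2, 4, 1, 4, 3, 3, 3, 2, 0, 0, 2, 1, 2, 1, 5, 4, 3, 8,
       10, 3, 5, 3, 3, 4, 0, 1, 4, 5, 4, 7, 7, 2, 8, 3, 3, 6, 8, 3,
       2, 2, 1, 4, 6, 6, 3, 6, 9, 0, 2, 5, 1, 4, 2, 1, 0, 1, 3, 1)
fit <- arima(y, order = c(1, 0, 0))
coef(fit)  # "ar1" (phi-hat) and "intercept"
```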

If $y_t$ is your last observation, the mean point forecast one step ahead can be obtained by plugging this in and setting the innovation $\epsilon_{t+1}$ to zero:

$$ \hat{y}_{t+1} = \hat{c}+\hat{\phi}y_t. $$

For the two-step ahead forecast, we simply plug this in:

$$ \hat{y}_{t+2} = \hat{c}+\hat{\phi}\hat{y}_{t+1} = \hat{c}+\hat{\phi}(\hat{c}+\hat{\phi}y_t). $$

Iterating this, you get

$$ \hat{y}_{t+h} = \hat{c}\sum_{k=0}^{h-1}\hat{\phi}^k + \hat{\phi}^hy_t. $$

Now, if you let $h\to\infty$, then the first term is a geometric series, and the second term goes to zero, since $|\hat{\phi}|<1$, so

$$ \hat{y}_{t+h}\to \frac{\hat{c}}{1-\hat{\phi}}. $$

This is exactly what should happen.
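The algebra above can be checked numerically in a few lines of R, taking the quoted estimates $\hat{c}=3.27$ and $\hat{\phi}=0.30$ at face value as plain numbers:

```r
# Iterate the AR(1) point-forecast recursion
#   y-hat_{t+h} = c-hat + phi-hat * y-hat_{t+h-1}
c_hat   <- 3.27   # quoted estimate of c
phi_hat <- 0.30   # quoted estimate of phi
y_last  <- 1      # last observation of the series

fc   <- numeric(50)
prev <- y_last
for (h in 1:50) {
  prev  <- c_hat + phi_hat * prev
  fc[h] <- prev
}

fc[50]                 # far-ahead point forecast ...
c_hat / (1 - phi_hat)  # ... matches the long-run limit c/(1 - phi)
```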

So, if an ARIMA(1,0,0) model with nonzero mean is appropriate, then the forecasts make complete sense.


Contra Alexey, I don't think non-stationarity is a problem here: you don't really have all that much data, and even if the ADF test suggests non-stationarity, I don't see it, and the estimated $\hat{\phi}$ is quite far away from one.

Frankly, if anything, I'd be more worried about the fact that your data are all integers, whereas ARIMA assumes normally distributed innovations. But then again, I don't think this is a dealbreaker here.

The fact is that you have rather little data, so I am less concerned about the long-term convergence to the mean than about the path leading up to it; I'm skeptical about the ramp-up. I would recommend you benchmark auto.arima() against some very simple approaches, like the historical mean or the historical median, which may well outperform ARIMA. You may also find this earlier thread useful.
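Such a benchmark can be sketched in base R alone (the 48/12 split point is my own arbitrary choice for illustration): hold out the last 12 observations, forecast them with the fitted AR(1) and with the training-sample mean, and compare mean absolute errors:

```r
y <- c(1, 1, 2, 4, 1, 4, 3, 3, 3, 2, 0, 0, 2, 1, 2, 1, 5, 4, 3, 8,
       10, 3, 5, 3, 3, 4, 0, 1, 4, 5, 4, 7, 7, 2, 8, 3, 3, 6, 8, 3,
       2, 2, 1, 4, 6, 6, 3, 6, 9, 0, 2, 5, 1, 4, 2, 1, 0, 1, 3, 1)
train <- y[1:48]
test  <- y[49:60]

fit   <- arima(train, order = c(1, 0, 0))
fc_ar <- as.numeric(predict(fit, n.ahead = 12)$pred)
fc_mu <- rep(mean(train), 12)   # historical-mean benchmark

mae <- function(f) mean(abs(test - f))
c(arima = mae(fc_ar), mean = mae(fc_mu))
```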

Stephan Kolassa

What can I do to correct the error in modelling?

> tseries::adf.test(y, alternative = "stationary")

    Augmented Dickey-Fuller Test

data:  y
Dickey-Fuller = -2.754, Lag order = 3, p-value = 0.27
alternative hypothesis: stationary

The test fails to reject the unit-root null (p = 0.27), so y may not be stationary. Try d = 1.

> y_diffed <- diff(y)
> aaa <- adf.test(y_diffed, alternative = "stationary")
Warning message:
In adf.test(y_diffed, alternative = "stationary") :
  p-value smaller than printed p-value
> aaa

    Augmented Dickey-Fuller Test

data:  y_diffed
Dickey-Fuller = -4.5997, Lag order = 3, p-value = 0.01
alternative hypothesis: stationary


> acf(y_diffed)
> pacf(y_diffed)

The AR order p = 1 seems valid, but it may be 2 as well. Try it.

> t.test(y)

    One Sample t-test

data:  y
t = 10.577, df = 59, p-value = 3.016e-15
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 2.675687 3.924313
sample estimates:
mean of x 
      3.3 

The mean of the data is greater than zero, while the differenced data have exactly zero mean:

> t.test(y_diffed)

    One Sample t-test

data:  y_diffed
t = 0, df = 58, p-value = 1
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.7481294  0.7481294
sample estimates:
mean of x 
        0 

After that, study the residuals of the model to see what q might be.
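A minimal residual check in base R might look like this (a sketch; the order c(1, 1, 0) is just the candidate implied by d = 1, p = 1, q not yet chosen):

```r
y <- c(1, 1, 2, 4, 1, 4, 3, 3, 3, 2, 0, 0, 2, 1, 2, 1, 5, 4, 3, 8,
       10, 3, 5, 3, 3, 4, 0, 1, 4, 5, 4, 7, 7, 2, 8, 3, 3, 6, 8, 3,
       2, 2, 1, 4, 6, 6, 3, 6, 9, 0, 2, 5, 1, 4, 2, 1, 0, 1, 3, 1)

fit <- arima(y, order = c(1, 1, 0))   # candidate model after differencing
res <- residuals(fit)

acf(res, plot = FALSE)  # leftover autocorrelation would suggest an MA term (q > 0)
Box.test(res, lag = 10, type = "Ljung-Box")  # H0: residuals are white noise
```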

Alexey Burnakov
  • I appreciate your quick response @Alexey! Do you think auto.arima() is not working correctly? – Sid Verma Oct 31 '17 at 12:10
  • @Sid Verma, **auto.arima()** usually works correctly. In your case I think taking first differences does not hurt at least, and moreover the **acf** function gives a clearer picture on the differenced data. Your model seems OK, but I did not check the residuals. – Alexey Burnakov Oct 31 '17 at 13:23
  • @Stephan Kolassa, taking differences will not hurt at all. Besides, I could see a clearer **acf** picture of the differenced data. I prefer manual checks in such small tasks. – Alexey Burnakov Oct 31 '17 at 13:25