Poor fit of an ARIMA model

Question

I'm very new to time series forecasting methods. I'm trying to fit an ARIMA model to my time series data, and the result is poor. See the figure.

The series seems to be stationary, and the Dickey-Fuller test gives me p < 0.05, so I tried to find which ARMA order to use with statsmodels' arma_order_select_ic. I don't think this is a transformation problem, because the results are far from good, and when I tried a log (or sqrt) transformation, nothing seemed to change in the fit.

Here is the data and the code I'm using.

import pandas as pd

# Load the series; the first column is parsed as the datetime index
ts = pd.read_csv('path/data.csv', index_col=0, parse_dates=True)
ts = ts['ts']

# # Testing stationarity
# import statsmodels.tsa.stattools as tsa
# dfuller_test = tsa.adfuller(ts, autolag='AIC')
# dfuller_output = pd.Series(dfuller_test[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
# print(dfuller_output)

# # Plotting ACF and PACF
# from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# plot_acf(ts,lags=50)
# plot_pacf(ts,lags=50)

# # Finding best p,q
# import statsmodels.api as sm
# res = sm.tsa.arma_order_select_ic(ts, ic=['aic', 'bic'], trend='nc')
# print(res.aic_min_order)

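import pyflux as pf

# ARIMA order used for the fit
p, d, q = 4, 0, 1
model = pf.ARIMA(data=pd.DataFrame(ts), ar=p, integ=d, ma=q)
x = model.fit()
model.plot_fit(figsize=(15, 4))

# In-sample fitted values, aligned with the last len(mu) observations
mu, Y = model._model(model.latent_variables.get_z_values())
fitted_values = pd.Series(model.link(mu), index=ts.iloc[-len(mu):].index)

# Residuals: actual minus fitted (.iloc replaces the removed .ix indexer)
ts.subtract(fitted_values).plot()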

My question is whether I'm missing something in this fitting process, or whether the data needs any transformation or normalization. Do you think another model, such as GARCH, could do better?

What do you mean by "the result is poor"? What is it you are actually trying to do? Is it forecasting? If so, you need to actually measure out-of-sample forecasting performance to decide if it's "poor"; the chart you have there isn't particularly useful. And no, there is no evidence that you need any kind of transformation, or that GARCH would be useful for your data. — Chris Haug, May 10 '17 at 01:03
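One way to measure out-of-sample performance, as suggested in the comment above, is a rolling one-step-ahead comparison against a plain mean forecast. The sketch below is only an illustration under stated assumptions: it uses synthetic integer-valued noise around 10 in place of the real data, and a least-squares AR(1) as a stand-in for the fitted ARIMA model. The helper name `rolling_rmse` is hypothetical.

```python
import math
import random


def rolling_rmse(series, n_test):
    """One-step-ahead RMSE on the last n_test points for two forecasters.

    Returns (rmse_mean, rmse_ar1). Both models are re-estimated at every
    step using only the observations before the point being forecast.
    """
    sq_err_mean, sq_err_ar1 = 0.0, 0.0
    for t in range(len(series) - n_test, len(series)):
        history = series[:t]
        mu = sum(history) / len(history)

        # AR(1) coefficient by least squares on the demeaned history.
        dev = [y - mu for y in history]
        num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
        den = sum(a * a for a in dev[:-1])
        phi = num / den if den > 0 else 0.0

        forecast_mean = mu
        forecast_ar1 = mu + phi * dev[-1]

        sq_err_mean += (series[t] - forecast_mean) ** 2
        sq_err_ar1 += (series[t] - forecast_ar1) ** 2
    return math.sqrt(sq_err_mean / n_test), math.sqrt(sq_err_ar1 / n_test)


# Assumption: synthetic stand-in for the real data (integer noise around 10).
rng = random.Random(0)
synthetic = [round(rng.gauss(10, 3)) for _ in range(300)]

rmse_mean, rmse_ar1 = rolling_rmse(synthetic, n_test=100)
print(f"mean forecast RMSE:  {rmse_mean:.3f}")
print(f"AR(1) forecast RMSE: {rmse_ar1:.3f}")
```

If the two errors come out nearly equal on the real series, the extra ARMA terms are probably not buying any forecasting accuracy.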

Michael L. · Accepted Answer · 2017-05-10T07:31:49.800

From my perspective, the time series is stationary and does not exhibit much time-varying variance, i.e. there are no pronounced volatility clusters. Hence, a GARCH model is unlikely to provide further insights, as already mentioned by Chris Haug.

Furthermore, the long-term mean is clearly around 10, and the noise does not look like it contains much autocorrelation. From this graphical analysis I would suggest that the best you can get from the data for forecasting is something like $$y_t=\mu+u_t, \quad u_t \sim N(0,\sigma^2),$$ with $\mu \approx 10$. This might look too simplistic, but at least it allows you to give a forecast interval around the long-term mean once you estimate $\sigma^2$. On top of that, the ARIMA model is also mean-reverting, i.e. as your forecast horizon increases, its forecast will very quickly converge to $\mu$.
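A minimal sketch of the mean model's forecast interval, assuming normal noise and treating the estimated $\mu$ and $\sigma$ as if they were the true values (so estimation uncertainty is ignored). The sample values and the helper name `mean_forecast_interval` are hypothetical and only stand in for the series in the question.

```python
from statistics import NormalDist, mean, stdev


def mean_forecast_interval(values, level=0.95):
    """Point forecast and normal prediction interval for y_t = mu + u_t."""
    mu_hat = mean(values)
    sigma_hat = stdev(values)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mu_hat, mu_hat - z * sigma_hat, mu_hat + z * sigma_hat


# Assumption: made-up values, not the data from the question.
sample = [7, 12, 9, 14, 10, 6, 11, 13, 8, 10]
point, lower, upper = mean_forecast_interval(sample)
print(f"forecast {point:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

The same interval applies at every horizon, which is what makes this model a useful baseline for the long-run behaviour of a mean-reverting ARIMA forecast.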

With four AR terms and one MA term you might run into trouble with overfitting, leading to an only slightly better fit (which is nevertheless unable to capture the full amplitude) but almost certainly no better forecasting model. This may sound disappointing, but at least it prevents you from reading something into the data that is not there.

Is the data already differenced? If you are aiming to predict and the plot shows the absolute changes, you could try to fit the general trend in the non-differenced data.

edited May 10 '17 at 07:31

answered May 10 '17 at 07:25


+1, very good answer. The time series may simply contain a lot of residual uncorrelated noise. Related: [Is it unusual for the MEAN to outperform ARIMA?](https://stats.stackexchange.com/q/124955/1352) – Stephan Kolassa May 10 '17 at 07:49
I don't think the presence or absence of autocorrelation can be easily identified from a graph like the one above; there are many patterns that are deceptive to the eye (inspecting residuals via ACF or PACF would be much more informative than just looking at the actual vs. fitted plot). Thus I would be careful with statements like *the noise does not look like it contains much autocorrelation*. Also, the model was selected using information criteria, which offers grounds to believe the model is neither overfit nor underfit (to the degree of approximation). – Richard Hardy May 10 '17 at 08:59
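The residual check mentioned in this comment can be sketched without any plotting: compute the sample autocorrelations and flag lags outside the approximate $\pm 1.96/\sqrt{n}$ white-noise band. This is a rough illustration, not a formal test. The alternating residuals and the helper names `sample_acf` and `significant_lags` are hypothetical; with the code from the question, the input would be `ts.subtract(fitted_values).dropna()`.

```python
import math


def sample_acf(values, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(values)
    mu = sum(values) / n
    dev = [v - mu for v in values]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def significant_lags(values, max_lag):
    """Lags whose autocorrelation falls outside the +/- 1.96 / sqrt(n) band."""
    band = 1.96 / math.sqrt(len(values))
    acf = sample_acf(values, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]


# Assumption: made-up residuals with an obvious alternating pattern.
residuals = [1.0, -1.0] * 50
print(sample_acf(residuals, 2))
print(significant_lags(residuals, 5))
```

An empty list from `significant_lags` on the real residuals would be consistent with the model having captured the linear dependence, whatever the actual vs. fitted plot looks like.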
I do agree with your statement that visual inspection is not enough to provide strong evidence for autocorrelation, and indeed inspecting the ACF or PACF is the superior approach, but only in combination with the visual inspection. And although information criteria were used for the lag selection, it seems far from certain to me that an ARIMA(4,0,1) gives better forecasts than a simple mean model. So from a technical perspective you are totally right, but disregarding the visual impression seems wrong to me. – Michael L. May 10 '17 at 09:11
@MichaelL. Thanks for the help! No, this data is not differenced, and fitting the general trend probably would not give very good results in this case. – Adelson Araújo May 10 '17 at 10:15
This answer is correct: it looks like the ARIMA model you have is OK but the innovation variance is large. Even if your data was generated by an ARIMA model and you knew its parameters exactly, if $\sigma^2$ is large you will get a chart like the one you showed. Something to consider is that your data is integer-valued. If this is an artifact of how it was recorded, ARIMA will probably work OK, but if the data is naturally discrete you may find better performance with different models (for example, replacing $u_t$ in the above with a discrete distribution, not normal). – Chris Haug May 10 '17 at 10:17
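As one possible reading of the discrete alternative, the sketch below swaps the normal noise for a Poisson count model with rate equal to the long-term mean. The Poisson choice is an assumption made here for illustration; the comment only says "a discrete distribution". The helper names `poisson_interval` and `dispersion_ratio` are hypothetical.

```python
import math


def poisson_interval(lam, level=0.95):
    """Central prediction interval (lower, upper) for a Poisson(lam) count."""
    alpha = (1 - level) / 2

    def quantile(p):
        # Smallest k whose cumulative probability reaches p.
        k = 0
        pmf = math.exp(-lam)
        cdf = pmf
        while cdf < p:
            k += 1
            pmf *= lam / k
            cdf += pmf
        return k

    return quantile(alpha), quantile(1 - alpha)


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson data."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    return var / mu


# Rate of 10 taken from the long-term mean suggested in the answer.
print(poisson_interval(10))
```

A dispersion ratio far from 1 on the real counts would suggest that a plain Poisson is a poor match and that another discrete distribution should be considered.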
