How to choose the order for ARMA(p,q) to avoid residual autocorrelation?

Question

I am trying to model the daily returns of the Shanghai Stock Index with an ARMA model in RStudio. Below are the ACF and PACF plots from R. How should I choose the order for my data based on these plots? I have tried several combinations, but every time the Ljung-Box test on the residuals shows that high-order autocorrelation remains.

Here is some of my code and results:

stock_arma1 <- arima(data1[, 2], order = c(4, 0, 2))
# Ljung-Box test on the residuals for lag orders 10 to 20
for (i in 10:20) {show(i); show(Box.test(stock_arma1$residuals, lag = i, type = "Ljung"))}

There are tons of similar questions here on Cross Validated. Please search and see if the existing answers are sufficient. — Richard Hardy, Mar 17 '17 at 06:30
Yeah, I have already learned from previous answers, but the problem is that no matter which order I choose for the ARMA model, the residuals still show high-order autocorrelation (the Box test rejects when the lag order is larger than 20). — cubaogu, Mar 18 '17 at 01:45

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

Some pointers:

1) Regarding your data set: You say that you model the returns, i.e. the changes in the index. Why do you want to do this with an ARMA model? An ARMA model might be needed if you decided to model the actual index value, and even then you would not account for the heteroscedasticity that is likely to be present in your data.
Stock and index returns typically exhibit volatility clustering (heteroscedasticity), heavy tails (high kurtosis), and a leverage effect (i.e. negative shocks have a larger influence on volatility than positive ones). To account for this, I would rather suggest using an EGARCH, GJR-GARCH or T-GARCH model (all available in the "rugarch" package in R) with Student's t-distributed errors. If you want to model the actual index data, you can include an ARMA mean equation in the same "ugarchspec" specification. If you model the returns, I don't think this will be necessary.
An alternative to these models would be the Beta-t-EGARCH model (the tegarch function in the "betategarch" package in R), which specifically accounts for heavy tails.
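
As a rough illustration of point 1, a GJR-GARCH(1,1) with Student's t errors could be specified in rugarch along the following lines. This is only a sketch: the simulated `returns` vector is a placeholder standing in for the actual index returns, and the (1,1) order, the constant-only mean, and the test lag of 7 (about $\ln(1000)$) are assumptions made for the example, not recommendations for the real data.

```r
library(rugarch)

# Placeholder data (assumption): heavy-tailed noise standing in for the
# daily index returns. Replace with the real return series.
set.seed(1)
returns <- rt(1000, df = 5) * 0.01

# GJR-GARCH(1,1) variance equation, constant-only mean, Student's t errors.
spec <- ugarchspec(
  variance.model     = list(model = "gjrGARCH", garchOrder = c(1, 1)),
  mean.model         = list(armaOrder = c(0, 0), include.mean = TRUE),
  distribution.model = "std"
)
fit <- ugarchfit(spec = spec, data = returns)

# Standardized residuals are the ones to check for remaining autocorrelation.
z <- as.numeric(residuals(fit, standardize = TRUE))
Box.test(z,   lag = 7, type = "Ljung-Box")  # autocorrelation left in the mean
Box.test(z^2, lag = 7, type = "Ljung-Box")  # remaining ARCH effects
```

Setting `model = "eGARCH"` would give an EGARCH variance equation instead, and a nonzero `armaOrder = c(p, q)` adds an ARMA mean equation to the same specification.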

2) Regarding the test: Did you choose the lag order according to your sample size? An appropriate lag order (according to Tsay (2005)) would be about $\ln(n)$.
Also, some suggest that the Ljung-Box test is inappropriate for autoregressive time series. It might be sensible to use another test, such as the Breusch-Godfrey test.
See: Testing for autocorrelation: Ljung-Box versus Breusch-Godfrey
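
A minimal sketch of the lag-order rule from point 2, using only base R. The simulated series and the ARMA(1,1) order are placeholders for the example. The `fitdf` argument of `Box.test` is not mentioned above; it is added here on the assumption that the test is applied to ARMA residuals, where the degrees of freedom are usually reduced by the number of estimated coefficients, p + q.

```r
# Placeholder data (assumption): simulated ARMA(1,1) series standing in
# for the return series.
set.seed(42)
x <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 1500)

p <- 1; q <- 1
fit <- arima(x, order = c(p, 0, q))

# Lag order of about ln(n), kept above p + q so that the test still has
# positive degrees of freedom after the fitdf correction.
m <- max(round(log(length(x))), p + q + 1)

# fitdf = p + q adjusts the chi-squared degrees of freedom for the
# estimated ARMA coefficients.
lb <- Box.test(residuals(fit), lag = m, type = "Ljung-Box", fitdf = p + q)
print(lb)
```

With a few thousand daily observations, $\ln(n)$ is around 7 or 8, well below the lags of 10 to 20 used in the question.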

Thanks, Eldioo, for your comment. 1. As the return is essentially the first-order difference of the index, I think using ARMA as the mean equation is also OK. As for GARCH, this is just what I want to do next for the volatility equation. 2. I am really excited about the "tegarch" package you mentioned; the models in it open a new door for me. 3. Your suggestion about the lag order log(n) (Tsay (2005)) indeed helped me a lot, as I find it works well for my dataset. 4. I still have a puzzle about the tests: why does one test sometimes reject the absence of autocorrelation while another doesn't? — cubaogu, Mar 20 '17 at 01:39
No problem. Regarding 4: The test's null hypothesis is a joint one, i.e. you test whether ALL autocorrelations up to the chosen lag order $k$ are zero: $\mathcal{H}_0:\rho_1=\dots=\rho_k=0$. In particular, you aren't checking whether they are insignificant INDIVIDUALLY. You can see in the test statistic that the squared sample autocorrelations are summed up. If, for example, only the first lag is significant but you choose $k=20$, you might fail to reject the joint hypothesis even though one lag is highly significant. On the other hand, a lag order that is too low might miss correlations of higher order. — Eldioo, Mar 20 '17 at 11:00
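
The joint nature of the test described in the last comment can be checked numerically. The sketch below, in base R, computes the Ljung-Box statistic $Q = n(n+2)\sum_{j=1}^{k} \hat\rho_j^2/(n-j)$ by hand, compares it with `Box.test`, and then prints the p-values for several lag orders $k$. The simulated AR(1) series with a weak first-order autocorrelation and the helper name `ljung_box_q` are assumptions made for the illustration.

```r
# Placeholder data (assumption): AR(1) series with weak first-order
# autocorrelation.
set.seed(7)
x <- arima.sim(model = list(ar = 0.1), n = 500)

# Ljung-Box statistic computed by hand for a given lag order k
# (hypothetical helper, not part of any package).
ljung_box_q <- function(x, k) {
  n <- length(x)
  r <- acf(x, lag.max = k, plot = FALSE)$acf[-1]  # drop lag 0
  n * (n + 2) * sum(r^2 / (n - seq_len(k)))
}

# The hand-computed value should agree with Box.test.
q_manual <- ljung_box_q(x, 20)
q_r <- unname(Box.test(x, lag = 20, type = "Ljung-Box")$statistic)
all.equal(q_manual, q_r)

# p-values of the joint test for different lag orders k.
sapply(c(1, 5, 10, 20),
       function(k) Box.test(x, lag = k, type = "Ljung-Box")$p.value)
```

Because every added lag contributes a term to the sum and one degree of freedom to the reference distribution, the p-value can change noticeably with $k$ even though the series is the same.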
