
I tried fitting an ARMA(1,1)/GARCH(1,1) model to a series of around 5000 data points, but the Ljung-Box tests on both the standardized residuals and the squared standardized residuals were significant. However, when I used only the last 3000 data points, the model showed much better results: neither test was significant.

My question is: why is this the case? Isn't more data supposed to give better models? If not, what is the optimal sample size?

Also please see my unanswered question: Procedure for fitting an ARMA/GARCH Model

ankc
  • "optimal" with respect to what criterion? – Glen_b Nov 29 '13 at 07:20
  • I mean to get a good fit, basically I want to get a good model for my data and might need to adjust my sample size for that. – ankc Nov 29 '13 at 07:33
  • Uh, 'good' and 'optimal' are quite different things. Okay, what, for you, constitutes 'good' in this context? – Glen_b Nov 29 '13 at 08:24
  • hmm as long as I can get the standardized squared residuals to exhibit no correlation I would consider it a good model. – ankc Nov 29 '13 at 08:54
  • @ankc: Reducing the sample size doesn't fix any deficiencies in your model, but only hides them. Why would you want to do that? – Scortchi - Reinstate Monica Nov 29 '13 at 09:33
  • My guess is that there are structural changes in the way the data behave over time and it would be better to only include data which behave in the same way. What kind of diagnostic tests are applied to a GARCH fit? Would you give priority to AIC or to uncorrelated standardized squared residuals? – ankc Nov 29 '13 at 10:11
  • I tried using the following code http://www.quintuitive.com/2013/03/24/automatic-armagarch-selection-in-parallel/ to search for the best model based on AIC but even with the best model my standardized squared residuals still exhibit some correlation – ankc Nov 29 '13 at 10:16
  • @Scortchi, can you answer the above? – ankc Nov 29 '13 at 16:20
  • (1) There certainly could be structural changes but look for them, don't guess. My point was that what you've described is what you'd expect even if there aren't structural changes. (2) AIC is comparing the fit of different models & adjusting for complexity to avoid over-fitting, whereas Ljung-Box is assessing lack of fit in one respect for a single model. So they're quite different things. If you keep adding lots of unnecessary parameters the L-B statistic will fall, but the AIC will climb. Or the better of two models by AIC can still have significant lack of fit by the L-B test. – Scortchi - Reinstate Monica Nov 29 '13 at 16:56
  • @Scortchi, I will be using a single model, so the LB should be the relevant one, right? I was using oil return data from 1990 to 2013 and my ARMA(1,1)/GARCH(1,1) had significant lack of fit according to the LB statistic; the p-values were below 5%. At what level of p-value do we reject the null hypothesis? The p-values increased when I took data from 2002 to 2013; what does this suggest? – ankc Nov 29 '13 at 17:23
  • @Scortchi, can you try answering the above question? – ankc Nov 29 '13 at 18:30
  • Is there any way to identify structural changes? – ankc Nov 29 '13 at 20:38

1 Answer


All models are imperfect representations of reality: the more data you have, the better able you are to detect their imperfections and to take them into account by building better models. So you should expect any kind of goodness-of-fit test to become significant when you increase the sample size enough. You have the choice of deciding that the model performs well enough as it is or of making it more complex to accommodate those previously indiscernible discrepancies.
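
To make the sample-size effect concrete, here is a minimal sketch in Python of how the Ljung-Box statistic scales with the number of observations when the residual autocorrelation stays the same. The autocorrelation value (0.022 at each of the first 10 lags) is a made-up illustration, not taken from the oil return data, and the 5% critical value is for a chi-square with 10 degrees of freedom, ignoring any adjustment for fitted parameters.

```python
# 5% critical value of a chi-square distribution with 10 degrees of freedom.
# Assumption: no degrees-of-freedom correction for estimated ARMA/GARCH terms.
CHI2_CRIT_10_DF = 18.307


def ljung_box_q(acf, n):
    """Ljung-Box Q for sample autocorrelations at lags 1..len(acf), sample size n."""
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


# Hypothetical residual autocorrelations: the same small value at lags 1..10.
acf = [0.022] * 10

q_3000 = ljung_box_q(acf, 3000)
q_5000 = ljung_box_q(acf, 5000)

for n, q in ((3000, q_3000), (5000, q_5000)):
    verdict = "reject" if q > CHI2_CRIT_10_DF else "do not reject"
    print(f"n={n}: Q={q:.2f} -> {verdict} at the 5% level")
```

With identical autocorrelations, Q is roughly proportional to n, so the same tiny misfit that passes at 3000 observations is flagged at 5000. This is only the arithmetic of the test; it says nothing about whether the real series also has outliers or change-points in the earlier observations.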

In this case you might want to first examine carefully the extra 2,000 observations to look for outliers, change-points, &c., then try a model with more GARCH/ARMA parameters as indicated by the auto-correlation functions.

Scortchi - Reinstate Monica
  • that's what I thought but it seems my model is worse off having 5000 data points than 3000. – ankc Nov 29 '13 at 07:34
  • I tried fitting an ARMA(1,1)/GARCH(1,1) and got the following message: `Warning message: In arima(.series$x, order = c(u, 0, v), include.mean = include.mean) : possible convergence problem: optim gave code = 1`. I can fit an ARMA(0,0)/GARCH(1,1) perfectly fine but don't know what's wrong with the former. Can someone tell me why there is this message? – ankc Nov 29 '13 at 08:15
  • @ankc: At a wild guess it's outliers. This is a different question, & one probably better suited to Stack Overflow, R-help, or a more specific software support site. You need to explain the software you're using, including packages (`rugarch`?), the call, & if possible a reproducible example. – Scortchi - Reinstate Monica Nov 29 '13 at 09:30
  • I'm using R and fGarch package – ankc Nov 29 '13 at 10:13
  • @Scortchi: stackoverflow doesn't have an 'outlier' tag. This site does. Why do you think this problem (to the extent that it is caused by outliers) will be better answered there? – user603 Nov 29 '13 at 12:30
  • I just meant that there are better places than Cross Validated for asking what software error messages mean. Once you know what it means, CV might well be the best place to ask what to do about it. – Scortchi - Reinstate Monica Nov 29 '13 at 13:14
  • But for a GARCH model to perform well, the standardised squared residuals should exhibit no correlation, right? That's the main problem I'm facing. – ankc Nov 29 '13 at 16:22
  • (1) If your fitting algorithm isn't converging that's likely your main problem. (2) For a GARCH model, or any other, to perform well it must make good predictions. Statistically significant correlation in the standardized squared residuals may be tolerable if it's tiny. If you want to improve the model look for signs of mis-specification & consider more parameters for the GARCH part. – Scortchi - Reinstate Monica Nov 29 '13 at 16:38
  • @Scortchi, what kind of data can prevent the fitting algorithm from converging? I am using only one model, so how can I know if my model is making good predictions, as I have nothing to compare against? – ankc Nov 29 '13 at 17:30
  • First I'm not familiar with `fGarch`; whichever function you're using seems to be calling `optim`, but with what algorithm I don't know (& if I did I'd only suggest changing it on the off-chance that it helped). It could be some trivial oversight about the syntax or starting values. Second, if it is the data, big outliers, jumps - some kind of mis-specification - could be to blame. If I were you, & couldn't see anything obvious like that, I'd post this as a separate question, giving enough detail for there to be a chance of someone's being able to give a definite answer. – Scortchi - Reinstate Monica Nov 29 '13 at 18:06
  • And I'd probably think of R-help as a first port of call. – Scortchi - Reinstate Monica Nov 29 '13 at 18:08
  • I don't know any function called `optim`; I use the one called `garchFit`. Btw, I'm still getting my coefficients etc.; it's just that this message appears when the calculations are done. – ankc Nov 29 '13 at 18:29