Evaluate forecasting ability of GARCH models with RMSE and MAE

Question

I am evaluating different forecasting models and their ability to forecast index volatility during periods of market turmoil, using two measures: root mean square error (RMSE) and mean absolute error (MAE). For the previously evaluated models, ARMA(1,1) for example, I was able to obtain the residuals and calculate the RMSE quite easily in Stata.

When estimating the GARCH(1,1) model in Stata, however, I am not able to correctly obtain the residuals in the post-estimation procedure, and no option to directly obtain the RMSE is available. Perhaps I have misunderstood how one should evaluate the forecasting ability of GARCH models, since the model specifies the conditional variance, unlike ARMA, which specifies the conditional mean.

Does anyone have a suggestion on how to obtain these evaluation measures after estimating a GARCH model, preferably in Stata?

The problem with such error metrics is that volatility is not observable, and it depends on the model. Hence the error is not well defined (compare what with what?). Instead, try comparing the likelihoods of each model on test samples that are not used during the estimation procedure. — Cagdas Ozgenc, Apr 05 '16 at 08:21
@CagdasOzgenc: +1, except for a minor quibble: I wouldn't say that the volatility depends on the *model* (which is something we use to describe a process), but on the data-generating process. — Stephan Kolassa, Apr 05 '16 at 08:56
@StephanKolassa I think what you are saying is a little philosophical. Because there is no single definition of volatility, it can be caused by jumps, quadratic variation, etc. But all these concepts are invented by humans hence take roots from our abstraction, which leads to a model view rather than a true DGP view. True DGP is probably not comprehensible to humans, but only to AI. — Cagdas Ozgenc, Apr 05 '16 at 10:44

score 4 · Answer 1 · edited Apr 13 '17 at 12:44

As @Cagdas Ozgenc writes, the problem is that GARCH does not forecast future realizations (which you can observe), but future volatility (which you cannot observe). Thus, classical point forecast error (or accuracy) measures don't make sense.

So, how do we evaluate a GARCH volatility forecast? In fact, one usually not only forecasts volatility using GARCH, but adds distributional assumptions (typically a normal or a t distribution) and outputs a density forecast. The question now becomes how to evaluate a density forecast.

The classical way of evaluating a density forecast is to calculate its Probability Integral Transform (PIT), i.e., the predictive CDF evaluated at each realized observation, plot a histogram and check whether the PIT values are uniformly distributed. Diebold, Gunther & Tay (1998, International Economic Review) is the classical reference; note that they give a very nice example using t-GARCH processes. Tay & Wallis (2000, Journal of Forecasting) is a somewhat newer overview.
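
To make the PIT check concrete, here is a minimal Python sketch (not Stata, and using only the standard library). Everything in it is an illustrative assumption: the returns are simulated from a zero-mean Gaussian GARCH(1,1) with made-up parameters `omega`, `alpha`, `beta`, and the forecasts use the true parameters, whereas in practice you would plug in parameters estimated on a training sample. It compares the PIT histogram of the GARCH forecast with that of a deliberately too-wide constant-variance forecast.

```python
import math
import random
from statistics import NormalDist


def simulate_garch(n, omega, alpha, beta, seed=0):
    """Simulate n zero-mean returns from a Gaussian GARCH(1,1) process."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        var = omega + alpha * r * r + beta * var
    return returns


def one_step_variances(returns, omega, alpha, beta):
    """Conditional variance forecast for each period, using only earlier returns."""
    var = omega / (1.0 - alpha - beta)
    forecasts = []
    for r in returns:
        forecasts.append(var)
        var = omega + alpha * r * r + beta * var
    return forecasts


def pit(returns, variances):
    """PIT under a zero-mean normal predictive density: u_t = Phi(r_t / sigma_t)."""
    std_normal = NormalDist()
    return [std_normal.cdf(r / math.sqrt(v)) for r, v in zip(returns, variances)]


def histogram(values, bins=10):
    """Count how many PIT values fall into each of `bins` equal-width cells."""
    counts = [0] * bins
    for u in values:
        counts[min(int(u * bins), bins - 1)] += 1
    return counts


# Hypothetical parameters chosen for illustration only.
omega, alpha, beta = 0.1, 0.1, 0.8
returns = simulate_garch(5000, omega, alpha, beta, seed=42)

# PIT of the (correctly specified) GARCH density forecast.
good_pit = pit(returns, one_step_variances(returns, omega, alpha, beta))

# PIT of a constant-variance forecast that is four times too wide.
too_wide = [4.0 * omega / (1.0 - alpha - beta)] * len(returns)
bad_pit = pit(returns, too_wide)

good_counts = histogram(good_pit)
bad_counts = histogram(bad_pit)
print("GARCH forecast PIT counts:        ", good_counts)
print("Overdispersed forecast PIT counts:", bad_counts)
```

With a well-specified forecast the ten counts should be roughly equal. An overdispersed forecast piles PIT values into the middle cells and leaves the outer cells nearly empty, giving a hump-shaped histogram.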

However, recent research has focused on the shortcomings of the PIT. It turns out that systematically wrong forecasts can still give uniform histograms. Gneiting, Balabdaoui & Raftery (2007, JRSS B) give some disconcerting examples and propose scoring rules as a remedy. These are less intuitive than the PIT, but they simultaneously evaluate calibration and sharpness of predictive distributions. Gneiting & Katzfuss (2014, Annual Review of Statistics and Its Application) give a more up-to-date overview of density forecasting and evaluation.
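
As an illustration of scoring rules, the following Python sketch computes two commonly used ones for a normal predictive density: the logarithmic score (negative log predictive density) and the CRPS, using its closed form for the normal distribution. The setup is an assumption made for the example, not something from the question: the data are simulated from a zero-mean Gaussian GARCH(1,1) with made-up parameters, the "GARCH forecast" uses the true conditional variance, and the competitor is a constant-variance normal forecast. With real data you would substitute your own out-of-sample variance forecasts.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def log_score(y, mu, sigma):
    """Negative log density of N(mu, sigma^2) at y (lower is better)."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi) + math.log(sigma) + 0.5 * z * z


def crps_normal(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast at y (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (
        z * (2.0 * STD_NORMAL.cdf(z) - 1.0)
        + 2.0 * STD_NORMAL.pdf(z)
        - 1.0 / math.sqrt(math.pi)
    )


def simulate_garch_with_variances(n, omega, alpha, beta, seed=0):
    """Simulate Gaussian GARCH(1,1) returns and their conditional variances."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(var)
        var = omega + alpha * r * r + beta * var
    return returns, variances


# Hypothetical parameters chosen for illustration only.
omega, alpha, beta = 0.05, 0.15, 0.80
returns, cond_var = simulate_garch_with_variances(5000, omega, alpha, beta, seed=1)
uncond_sd = math.sqrt(omega / (1.0 - alpha - beta))

# Average scores of the time-varying (GARCH) density forecast.
garch_log = fmean(log_score(r, 0.0, math.sqrt(v)) for r, v in zip(returns, cond_var))
garch_crps = fmean(crps_normal(r, 0.0, math.sqrt(v)) for r, v in zip(returns, cond_var))

# Average scores of a constant-variance density forecast.
const_log = fmean(log_score(r, 0.0, uncond_sd) for r in returns)
const_crps = fmean(crps_normal(r, 0.0, uncond_sd) for r in returns)

print(f"mean log score: GARCH {garch_log:.4f}  constant {const_log:.4f}")
print(f"mean CRPS:      GARCH {garch_crps:.4f}  constant {const_crps:.4f}")
```

Both scores are negatively oriented here, so the forecast with the lower average score is preferred. Because the scores are computed from observed returns and the predictive density, they do not require observing volatility itself.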

Some references that are well known in the GARCH literature: Patton & Sheppard ["Evaluating Volatility and Correlation Forecasts"](https://link.springer.com/chapter/10.1007/978-3-540-71297-8_36) (2009) and Patton ["Volatility forecast comparison using imperfect volatility proxies"](https://econpapers.repec.org/paper/utsrpaper/175.htm) (2011). — Richard Hardy, May 16 '21 at 17:12
+1: @Stephan Kolassa: Does this question/answer refer only to pure GARCH models? I am guessing things are different in the case of an ARMA-GARCH model, because we would have observable forecasts, and in those cases we could use RMSE and MAE. Am I understanding this correctly? — ColorStatistics, Sep 22 '21 at 22:26
@ColorStatistics: yes, you could. But of course the *point* forecasts that the RMSE and MAE evaluate are mainly driven by the ARMA component, and the GARCH influence on the point forecasts is much less. (I have little experience with ARMA-GARCH - does the GARCH even have *any* impact on the point forecasts, or is it only a secondary model for the volatility? In that case, I wouldn't say that we are evaluating a GARCH forecast when we apply RMSE/MAE to ARMA-GARCH.) — Stephan Kolassa, Sep 23 '21 at 07:00
