How to test superior predictive ability over multiple time series?

Question

Suppose you have two models, model A and model B, and let these models forecast 10 time series over a horizon of 12 periods. That is, suppose the time series contain monthly data and your forecasting horizon is 1 year.

Can I then statistically test whether model A is 'better' than model B over all the time series? I know that the Diebold-Mariano test can be used to see if the forecasts of model A and model B are statistically different for one specific time series. However, I don't think that this method can be directly generalized to multiple time series, correct?

Richard Hardy · Accepted Answer · 2020-06-09T19:48:07.087

3

A lot depends on the precise formulation of the null hypothesis you would like to test. You could formulate a hypothesis such as

$H_0\colon$ model $A$ and model $B$ have equal expected forecast loss for each of the 10 time series

against an alternative

$H_1\colon$ $H_0$ is not true

and test $H_0$ using a series of vanilla Diebold-Mariano tests, one per time series, with each test's significance level corrected for multiple testing (e.g. a Bonferroni or Holm adjustment); $H_0$ is then rejected if at least one of the corrected tests rejects. I suppose this is not the only formulation of a reasonably interesting/relevant null hypothesis that has a feasible solution.
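A minimal sketch of this procedure, under assumptions that are not in the question: squared-error loss, a sequence of one-step-ahead forecast errors per series (so the loss differential is treated as serially uncorrelated and no HAC variance is used; multi-step forecasts would need one), the asymptotic normal approximation, and the Holm step-down correction. The function names `dm_pvalue` and `holm_reject` and the simulated errors are made up for illustration.

```python
import math
import random
from statistics import mean


def dm_pvalue(errors_a, errors_b):
    """Simple Diebold-Mariano test under squared-error loss.

    Assumes one-step-ahead forecasts: only the lag-0 variance of the
    loss differential is used (no autocorrelation correction).
    Returns (DM statistic, two-sided p-value from the standard normal).
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = mean(d)
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    if var_d == 0:
        # Degenerate case: identical losses everywhere, no evidence against H0.
        return 0.0, 1.0
    dm_stat = d_bar / math.sqrt(var_d / n)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(dm_stat) / math.sqrt(2))
    return dm_stat, p_value


def holm_reject(p_values, alpha=0.05):
    """Holm step-down correction: one reject/keep decision per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are kept
    return reject


# Simulated forecast errors for 10 series (illustrative data only).
rng = random.Random(0)
p_values = []
for _ in range(10):
    errors_a = [rng.gauss(0, 1.0) for _ in range(48)]
    errors_b = [rng.gauss(0, 1.2) for _ in range(48)]
    p_values.append(dm_pvalue(errors_a, errors_b)[1])

decisions = holm_reject(p_values, alpha=0.05)
# The joint H0 (equal expected loss for every series) is rejected
# if at least one corrected test rejects.
print("reject joint H0:", any(decisions))
```

For the joint decision alone, Holm and Bonferroni coincide: both reject exactly when the smallest p-value is at most $\alpha/10$. Holm additionally tells you for which individual series the difference is significant.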

Also, note that the Diebold-Mariano test is intended for comparing the accuracy of forecasts, not of the underlying models. There are some subtle differences between these concepts, and there is an entire strand of literature focusing on them and on the different variations of the test. See e.g. Diebold, "Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold–Mariano tests" (2015).

edited Jun 09 '20 at 19:48

answered Jun 09 '20 at 19:43


I read the article but I don't fully understand how your suggestion works. How could you combine a series of vanilla DM tests? And how would you include a correction? – Whizkid95 Jun 12 '20 at 08:09

@Whizkid95, Diebold's paper does not consider multiple testing, as far as I can remember. But this is a much more general topic with material available in introductory statistics textbooks and such. See e.g. [a lecture note from Berkeley](https://www.stat.berkeley.edu/~mgoldman/Section0402.pdf) (this is just the first hit, you will find many more by searching online) and [questions](https://stats.stackexchange.com/questions/tagged/multiple-comparisons?tab=Votes) tagged with the [tag:multiple-comparisons] tag. – Richard Hardy Jun 12 '20 at 08:34

score 1 · Answer 2 · answered Jun 10 '20 at 23:10

Well, you can compute an accuracy measure such as MAPE on a hold-out data set and see which model does better; this is often done in practice. But I know of no rule for the case where one model predicts some series better and the other predicts other series better. Nor is this a formal statistical test. You might look at how the various M competitions have addressed this (as far as I know they rank methods by out-of-sample accuracy measures such as sMAPE and MASE, but that is not a formal test either).
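A rough sketch of such a hold-out comparison, assuming the actual values in the hold-out period are non-zero (MAPE is undefined otherwise). The helper names `mape` and `compare_models` and the numbers are invented for illustration, and the output is descriptive only, not a test.

```python
from statistics import mean


def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100 * mean(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


def compare_models(actuals, forecasts_a, forecasts_b):
    """Per-series hold-out MAPE for two models, plus averages and win counts."""
    mape_a = [mape(y, f) for y, f in zip(actuals, forecasts_a)]
    mape_b = [mape(y, f) for y, f in zip(actuals, forecasts_b)]
    return {
        "mape_a": mape_a,
        "mape_b": mape_b,
        "mean_mape_a": mean(mape_a),
        "mean_mape_b": mean(mape_b),
        "wins_a": sum(a < b for a, b in zip(mape_a, mape_b)),
        "wins_b": sum(b < a for a, b in zip(mape_a, mape_b)),
    }


# Two short hold-out series with made-up values.
actuals = [[100, 200], [50, 50]]
forecasts_a = [[110, 180], [50, 50]]
forecasts_b = [[100, 200], [55, 45]]
summary = compare_models(actuals, forecasts_a, forecasts_b)
print(summary["wins_a"], summary["wins_b"])
```

In this toy example each model wins on one series and the average MAPEs are equal, which is exactly the situation where the summary gives no clear ranking.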
