Suppose I have two forecasting models, A and B, and multiple datasets; on each dataset I perform a Diebold-Mariano test comparing the two models. My aim is to find out which of the two models is better overall. Performing these tests gives me one p-value per dataset.
What is the right method of combining the conclusions of each of my tests into an overall conclusion? More specifically, what is wrong (if anything) with the following two simple methods:
1. Sum the z-scores across all tests and divide by the number of tests.
2. Sum the p-values across all tests and divide by the number of tests.
Intuitively, method 1 seems better to me since it does not operate directly on probabilities; the averaged z-score would then be converted back into an overall p-value. But it also seems to be missing something, namely that a greater frequency of extreme observations should earn some kind of "bonus".
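For concreteness, here is a minimal Python sketch of the two proposed methods, placed alongside the standard Stouffer and Fisher combinations from `scipy.stats.combine_pvalues` for comparison. The z-scores are hypothetical and the one-sided convention is an assumption for illustration:

```python
import numpy as np
from scipy.stats import norm, combine_pvalues

# Hypothetical Diebold-Mariano z-scores, one per dataset (illustration only)
z = np.array([1.8, 2.3, 0.4, 1.1, 2.6])
p = norm.sf(z)  # corresponding one-sided p-values

# Method 1: average the z-scores, then convert back to a p-value
p_avg_z = norm.sf(z.mean())

# Method 2: average the p-values directly
p_avg_p = p.mean()

# Standard combinations: Stouffer (sum of z divided by sqrt(k)) and Fisher
stat_s, p_stouffer = combine_pvalues(p, method='stouffer')
stat_f, p_fisher = combine_pvalues(p, method='fisher')

print(f"avg-z: {p_avg_z:.4f}  avg-p: {p_avg_p:.4f}  "
      f"Stouffer: {p_stouffer:.4f}  Fisher: {p_fisher:.4f}")
```

Note that Stouffer's method divides the summed z-scores by the square root of the number of tests rather than the number of tests itself, which is what makes the combined statistic standard normal under the null.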