Say I have two models for a regression task, and from each model I get an RMSE. One RMSE is smaller than the other, but I want to test whether the difference is statistically significant, so that I can say one model is better than the other. How can I do that?
- Are the models based on the same data with exactly the same response variable? Please search our site (or anywhere else) for "overfitting" for some important considerations that will show you cannot compare models just by comparing RMSEs. – whuber Aug 30 '18 at 19:18
- Both models are trained with the same data, available to both. I’ve taken certain measures to avoid overfitting (like splitting the data into training and testing subsets, and doing k-fold cross-validation, since the dataset is small). – Lay González Aug 30 '18 at 19:27
- AIC or BIC or even R^2 are used for comparing models based on the same data. However, you should forget about testing for statistical significance of the difference between the RMSEs of two models based on the same data; that just makes no sense. – Rodolphe Sep 02 '18 at 15:12
1 Answer
To test whether two (root) mean squared prediction errors are significantly different, the standard test is the Diebold-Mariano test (Diebold & Mariano, 1995, Journal of Business and Economic Statistics). We have a diebold-mariano tag, which may be useful. I also recommend Diebold's (2015, Journal of Business and Economic Statistics) personal perspective on the test's uses and abuses twenty years later.
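
For concreteness, here is a minimal sketch of the test in Python under squared-error loss. This is not from the original answer: the function name `diebold_mariano`, and the choice of the Harvey-Leybourne-Newbold small-sample correction with a t(n-1) reference distribution, are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, h=1):
    """Diebold-Mariano test for equal predictive accuracy under
    squared-error loss (illustrative sketch, not a library API).

    e1, e2 : prediction errors (actual - prediction) of the two
             models on the same test observations.
    h      : forecast horizon; use h=1 for one-step-ahead or plain
             cross-sectional predictions.

    Returns the DM statistic with the Harvey-Leybourne-Newbold
    small-sample correction and a two-sided p-value from a t(n-1)
    reference distribution. A positive statistic means model 1 has
    the larger squared errors.
    """
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    d = e1**2 - e2**2          # per-observation loss differential
    n = d.size
    d_bar = d.mean()

    # Long-run variance of d_bar: autocovariances up to lag h-1,
    # with uniform weights as in the original 1995 paper. For h=1
    # this reduces to the plain sample variance of d.
    lrv = np.sum((d - d_bar) ** 2) / n
    for k in range(1, h):
        gamma_k = np.sum((d[k:] - d_bar) * (d[:-k] - d_bar)) / n
        lrv += 2.0 * gamma_k

    dm = d_bar / np.sqrt(lrv / n)
    # Harvey, Leybourne & Newbold (1997) small-sample correction.
    hln = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    dm_adj = hln * dm
    p_value = 2.0 * stats.t.sf(abs(dm_adj), df=n - 1)
    return dm_adj, p_value
```

Applied to the question's setting, `e1` and `e2` would be the two models' residuals on the same held-out test points, e.g. `diebold_mariano(y_test - pred_a, y_test - pred_b)` with hypothetical prediction arrays `pred_a` and `pred_b`; a small p-value indicates that the difference in squared-error loss, and hence in RMSE, is statistically significant.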

Stephan Kolassa