How can a distribution of cross-validated $R^2$ scores be used to determine whether one model is significantly better than another?

Question

I have two models, $A$ and $B$. I have performed 10-fold cross-validation on both of them, so I now have 10 $R^2$ scores for each.

How can I determine whether one is significantly better than the other? I suspect that declaring $A$ the winner iff $$\text{mean}(A)-\text{SEM}(A) > \text{mean}(B)+\text{SEM}(B)$$ is not the correct way to do it.

score 0 · Answer 1 · edited Apr 13 '17 at 12:44

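Well, assuming all of your modelling assumptions are fulfilled, that the same modelling technique was used for both models, and that $R^2$ is the only model characteristic that differs between them, I would consider a t-test (assumptions permitting) or a Wilcoxon test on the $R^2$s. If both models were scored on the same 10 folds, the scores come in pairs (fold $i$ of $A$ with fold $i$ of $B$), so the paired versions are the appropriate ones: a paired t-test or the Wilcoxon signed-rank test, rather than the two-sample Student’s t-test or the Mann-Whitney test. In any case, I would try to get a larger sample (costs permitting).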
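The paired comparison can be sketched in plain Python, assuming both models were scored on the same 10 folds so that the scores can be paired fold by fold. The function names and the fold scores below are hypothetical and purely illustrative; the critical value 2.262 is the standard two-sided 5% value of Student's t with 9 degrees of freedom, so it only applies to exactly 10 folds. In practice `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` provide these tests directly; only the standard library is used here so the mechanics are visible. The SEM rule from the question is included for contrast.

```python
import itertools
import math
import statistics

# Illustrative helpers (hypothetical names, not a library API).
# Two-sided 5% critical value of Student's t with 9 degrees of freedom;
# valid only for exactly 10 folds (n - 1 = 9).
T_CRIT_DF9 = 2.262


def sem(xs):
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    return statistics.stdev(xs) / math.sqrt(len(xs))


def sem_rule(a, b):
    """The rule from the question: A wins iff mean(A) - SEM(A) > mean(B) + SEM(B)."""
    return statistics.mean(a) - sem(a) > statistics.mean(b) + sem(b)


def paired_differences(a, b):
    """Per-fold differences A - B, rounded so decimal-equal values tie exactly."""
    return [round(x - y, 12) for x, y in zip(a, b)]


def paired_t_statistic(a, b):
    """Paired t statistic: mean difference over its standard error (df = n - 1)."""
    d = paired_differences(a, b)
    return statistics.mean(d) / sem(d)


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_p(a, b):
    """Exact two-sided Wilcoxon signed-rank p-value via all 2**n sign flips.

    Zero differences are dropped; enumeration is practical for about 20 folds or fewer.
    """
    d = [x for x in paired_differences(a, b) if x != 0]
    ranks = average_ranks([abs(x) for x in d])
    w_obs = sum(r for r, x in zip(ranks, d) if x > 0)  # W+ = sum of positive ranks
    center = sum(ranks) / 2  # mean of W+ under the null hypothesis
    hits = 0
    for signs in itertools.product((False, True), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= abs(w_obs - center) - 1e-9:
            hits += 1
    return hits / 2 ** len(ranks)


# Hypothetical per-fold R^2 scores (illustrative only, not real results):
# B is slightly worse than A on every fold, but the fold-to-fold spread is large.
r2_a = [0.70, 0.85, 0.78, 0.90, 0.74, 0.82, 0.88, 0.76, 0.80, 0.86]
r2_b = [0.68, 0.84, 0.75, 0.89, 0.73, 0.80, 0.85, 0.75, 0.78, 0.85]

t = paired_t_statistic(r2_a, r2_b)
print("SEM rule says A wins:", sem_rule(r2_a, r2_b))
print(f"paired t = {t:.2f}, significant at 5%: {abs(t) > T_CRIT_DF9}")
print(f"Wilcoxon signed-rank p = {wilcoxon_signed_rank_p(r2_a, r2_b):.4f}")
```

With these hypothetical scores, $B$ is worse than $A$ on every fold, yet the SEM rule does not declare a winner, because the spread across folds swamps the small gap; the paired tests remove that shared fold-to-fold variation and both come out significant. One caveat: the folds share training data, so the fold scores are not independent and a plain paired t-test tends to be optimistic; the corrected resampled t-test of Nadeau and Bengio inflates the variance to compensate.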

You can read more about different model selection measures, particularly AIC, here.
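For reference, AIC for a model with $k$ estimated parameters and maximised likelihood $\hat{L}$ is given below; for least-squares regression with Gaussian errors on $n$ observations, it reduces (up to an additive constant that cancels when comparing models on the same data) to a function of the residual sum of squares $\text{RSS}$. Lower values are preferred.

$$\text{AIC} = 2k - 2\ln\hat{L}, \qquad \text{AIC}_{\text{Gaussian}} = n\ln\!\left(\frac{\text{RSS}}{n}\right) + 2k + \text{const.}$$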

If different modelling techniques were used, you may find this interesting.
