Is there justification for using cross validation scores as model averaging weights?

Question

Bayesian model averaging uses approximate Bayes factors. Some researchers use AIC to weight models. Is there justification for using scores computed from out-of-sample predictions, such as the Brier score or the median absolute deviation, to construct model averaging weights instead of the various information criteria? I experimented with this a while back when averaging the results of popular election forecast models.
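To make the comparison concrete, here is a minimal sketch of the two kinds of weights being contrasted. The Akaike weight formula is the standard one (weights proportional to exp(-Δ/2), as in Burnham and Anderson); the inverse-score weighting is only one hypothetical way to turn out-of-sample Brier scores into weights, not an established method, and the forecasts, outcomes, and model names are made up for illustration.

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: w_i proportional to exp(-0.5 * (AIC_i - AIC_min))."""
    best = min(aic_values)
    raw = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(raw)
    return [r / total for r in raw]

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def inverse_score_weights(scores):
    """Hypothetical heuristic: weight each model by 1/score, then normalize.

    Lower Brier score is better, so a lower score gets a larger weight.
    This breaks down if any score is exactly 0 (division by zero).
    """
    inv = [1.0 / s for s in scores]
    total = sum(inv)
    return [v / total for v in inv]

# Made-up held-out outcomes and forecast probabilities (illustration only).
outcomes = [1, 0, 1, 1, 0]
forecasts = {
    "model_a": [0.9, 0.2, 0.8, 0.7, 0.1],
    "model_b": [0.6, 0.4, 0.6, 0.5, 0.5],
}

scores = [brier_score(p, outcomes) for p in forecasts.values()]
weights = inverse_score_weights(scores)

# Weighted average of the models' forecasts for each event.
averaged = [
    sum(w * p[i] for w, p in zip(weights, forecasts.values()))
    for i in range(len(outcomes))
]

print(dict(zip(forecasts, weights)))
print(averaged)
```

Whether weights built this way have any justification beyond working well empirically is exactly what I am asking about.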

Correct me if I'm wrong, but I think model weighting is mostly empirical -- there isn't a really solid justification for using AIC in the first place. I would play around with it and see if you get better performance with cross validation scores rather than AIC. — ahwillia, Feb 01 '14 at 22:39
What? You mean I have to have fun experimenting with data?!!! Say it ain't so. Yeah, I will try it out. — Brash Equilibrium, Feb 01 '14 at 23:14
There is, by the way, some theoretical justification for Bayesian model averaging. — Brash Equilibrium, Feb 01 '14 at 23:16
Interesting idea. Theoretical support for the gazillions of existing information criteria is of varying quality (see Burnham and Anderson 2002 for a start). I think @AlexWilliams is on target in that the literature often settles for the weighting methods that perform best in simulations. — zkurtz, Feb 02 '14 at 02:02
@BrashEquilibrium -- The theoretical support is for BIC, not AIC, as far as I know. The problem is that BIC assumes that one of the candidate models is the **real** model, which is probably not the case in most practical situations. AIC doesn't make this assumption, so people substitute it for BIC. I don't think this substitution is theoretically grounded; it has just been found to work well in practice. — ahwillia, Feb 02 '14 at 05:19
Anyway, I already see one potential issue, which involves the Brier score in particular: the Brier score does not penalize models that predict an event was 100% certain to either happen or not happen so long as that prediction holds. But the "truth" is more likely to be that those events were somewhat less than 100% likely to either happen or not happen. — Brash Equilibrium, Feb 02 '14 at 05:52
Not sure the assertion of the first sentence holds. Most of my experiences with model averaging have been model selection/variable selection using MCMC, where (assuming convergence and sufficient simulations that we regard the output as essentially exact), the model averaging should correspond to "exact" small sample Bayes factors (though explicit Bayes factors might not actually be computed). The statement is true in the case of using AIC and BIC for BMA (since they're asymptotic) but that's a subset of the cases in which model averaging occurs -- and then there's frequentist model averaging. — Glen_b, Feb 02 '14 at 07:36
From Adrian Raftery: BIC approximates the Bayes factor assuming a unit information prior. https://www.google.com/search?q=Raftery+BIC+approximate+Bayes+factor&rlz=1C1PRFB___US572US572&oq=Raftery+BIC+approximate+Bayes+factor&aqs=chrome..69i57j69i59j69i60l4.3934j0j7&sourceid=chrome&espv=210&es_sm=93&ie=UTF-8 — Brash Equilibrium, Feb 03 '14 at 17:48
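The Brier score concern raised in the comments can be checked with a small sketch. On a short run of events that all happened, a forecaster who always says 100% gets a perfect realized score, even though the expected score under a true probability of 0.9 (an assumed value, chosen only for illustration) favors the honest 0.9 forecast. The function names here are hypothetical helpers.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def expected_brier(forecast, true_prob):
    """Expected Brier score of a constant forecast when the event
    actually occurs with probability true_prob."""
    return true_prob * (forecast - 1.0) ** 2 + (1.0 - true_prob) * forecast ** 2

# Five events that all happened to occur (made-up sample).
outcomes = [1, 1, 1, 1, 1]

overconfident = brier_score([1.0] * 5, outcomes)  # always forecasts 100%
hedged = brier_score([0.9] * 5, outcomes)         # always forecasts 90%

# Realized scores on this sample: the overconfident forecaster looks better.
print(overconfident, hedged)

# Expected scores if each event truly has probability 0.9:
# the honest forecast has the lower (better) expected score.
print(expected_brier(1.0, 0.9), expected_brier(0.9, 0.9))
```

So the issue is less that the Brier score rewards overconfidence in expectation and more that weights built from a small out-of-sample set can favor a model that was overconfident and happened to be right.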

0 Answers