
I have a basic question regarding approaches to model averaging using IT criteria to weight models within a candidate set.

Most sources that I have read on model averaging advocate averaging the parameter coefficient estimates based on model weights (either using a 'natural-average' or else a 'zero average' method). However, I was under the impression that averaging and weighting each model's predictions, rather than the parameter coefficient estimates, based on model weights is a more straightforward and justified approach, particularly if comparing models with non-nested predictor variables.

Is there clear guidance on which approach to model averaging is best justified (averaging weighted parameter estimates vs. weighted predictions)? Also, are there further complications with model averaging of the coefficient estimates in the case of mixed models?
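To make concrete what I mean by the two coefficient-averaging conventions, here is a rough sketch with made-up AIC values and coefficients for two non-nested candidate models (none of these numbers come from a real analysis):

```python
import numpy as np

# Hypothetical AIC values for two candidate models A and B
aic = np.array([100.0, 101.5])
delta = aic - aic.min()          # AIC differences
w = np.exp(-0.5 * delta)
w /= w.sum()                     # Akaike weights, non-negative and summing to 1

# Coefficients for (intercept, x1, x2); NaN marks a predictor absent from a model.
beta = np.array([[1.0, 0.5, np.nan],    # model A: omits x2
                 [0.8, np.nan, 0.3]])   # model B: omits x1

# Weighted sum across models, treating a missing coefficient as 0
num = (np.nan_to_num(beta) * w[:, None]).sum(axis=0)

beta_zero = num                  # 'zero-average': missing coefficients count as 0
present = ~np.isnan(beta)
beta_natural = num / (present * w[:, None]).sum(axis=0)
                                 # 'natural average': renormalise weights over the
                                 # subset of models that actually contain the term

print(w, beta_zero, beta_natural)
```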

  • Both approaches are possible. The major advantage of averaging predictions is that you can average over *any* kinds of models. – Tim Jul 12 '17 at 09:22
  • Maybe of interest: "Model averaging in ecology: a review of Bayesian, information-theoretic and tactical approaches for predictive inference" https://esajournals.onlinelibrary.wiley.com/doi/10.1002/ecm.1309 – Florian Hartig May 05 '18 at 09:23

1 Answer


In linear models, averaging across coefficients gives you the same predicted values as averaging across the individual models' predictions, but it conveys more information. Many expositions deal with linear models and therefore average across coefficients.

You can check the equivalence with a bit of linear algebra. Say you have $T$ observations and $N$ predictors. You gather the latter in the $T\times N$ matrix $\mathbf{X}$. You also have $M$ models, each of which assigns a vector of coefficient estimates $\beta_m$ to the $N$ predictors (a predictor that a given model omits simply gets a coefficient of zero). Stack these coefficient estimates in the $N \times M$ matrix $\mathbf{\beta}$. Averaging means that you assign a weight $w_m$ to each model $m$ (weights are typically non-negative and sum to one). Put these weights in the vector $\mathbf{w}$ of length $M$.

Predicted values for each model are given by $\mathbf{\hat{y}}_m = \mathbf{X}\beta_m$, or, in the stacked notation, $$ \mathbf{\hat{y}} = \mathbf{X}\mathbf{\beta}. $$ Predicted values from averaging across predictions are given by $$ \mathbf{\hat{y}} \mathbf{w} = (\mathbf{X}\mathbf{\beta})\mathbf{w}. $$ When you average across coefficient estimates instead, you compute $$ \mathbf{\beta}_w = \mathbf{\beta}\mathbf{w}, $$ and the predicted values from the averaged coefficients are given by $$ \mathbf{X\beta}_w = \mathbf{X}(\mathbf{\beta}\mathbf{w}). $$ Equivalence between the predicted values of the two approaches follows from the associativity of the matrix product. Since the predicted values are the same, you may as well compute the average of the coefficients: it gives you more information, in case you e.g. want to look at the averaged coefficient of an individual predictor.
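If you want to check this numerically, here is a minimal sketch with arbitrary random data (the dimensions and numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, M = 50, 3, 4                    # observations, predictors, models
X = rng.normal(size=(T, N))           # T x N predictor matrix
beta = rng.normal(size=(N, M))        # N x M matrix of per-model coefficient estimates
w = rng.random(M)
w /= w.sum()                          # model weights, non-negative and summing to one

yhat_from_predictions = (X @ beta) @ w   # average the M prediction vectors
yhat_from_coefficients = X @ (beta @ w)  # predict from the averaged coefficients

print(np.allclose(yhat_from_predictions, yhat_from_coefficients))  # True
```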

In non-linear models the equivalence typically no longer holds, and there it indeed makes sense to average across predictions instead. There is a vast literature on averaging across predictions, usually under the heading of forecast combination.
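As a small illustration of why the equivalence breaks down in the non-linear case, here is a sketch with two hypothetical exponential-mean models, $\hat{y}_m = \exp(a_m + b_m x)$, and made-up weights; predicting from the averaged parameters does not reproduce the average of the predictions:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)
params = np.array([[0.2, 1.0],        # model 1: (a, b), hypothetical fitted values
                   [0.5, 0.4]])       # model 2: (a, b)
w = np.array([0.6, 0.4])              # hypothetical model weights

pred_each = np.exp(params[:, [0]] + params[:, [1]] * x)  # per-model predictions, shape (2, 5)
avg_of_predictions = w @ pred_each                        # weight and average the predictions
a_avg, b_avg = w @ params                                 # weight and average the parameters
pred_from_avg_params = np.exp(a_avg + b_avg * x)

print(np.allclose(avg_of_predictions, pred_from_avg_params))  # False: the two differ
```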