In linear models, averaging across coefficients gives you the same predicted values as averaging across predictions, but it conveys more information. Many expositions deal with linear models and therefore average across coefficients.
You can check the equivalence with a bit of linear algebra. Say you have $T$ observations and $N$ predictors. You gather the latter in the $T\times N$ matrix $\mathbf{X}$. You also have $M$ models, each of which assigns a vector of coefficient estimates $\beta_m$ to the $N$ predictors. Stack these coefficient estimates in the $N \times M$ matrix $\mathbf{\beta}$. Averaging means that you assign a weight $w_m$ to each model $m$ (weights are typically non-negative and sum to one). Put these weights in the vector $\mathbf{w}$ of length $M$.
Predicted values for each model are given by $\mathbf{\hat{y}}_m = \mathbf{X}\beta_m$, or, in stacked notation,
$$
\mathbf{\hat{y}} = \mathbf{X}\mathbf{\beta}
$$
Predicted values from averaging across predictions are given by
$$
\mathbf{\hat{y}} \mathbf{w} = (\mathbf{X}\mathbf{\beta})\mathbf{w}
$$
When you average across coefficient estimates instead, you compute
$$
\mathbf{\beta}_w = \mathbf{\beta}\mathbf{w}
$$
And the predicted values from the averaged coefficients are given by
$$
\mathbf{X\beta}_w = \mathbf{X}(\mathbf{\beta}\mathbf{w})
$$
Equivalence between the predicted values of the two approaches follows from the associativity of the matrix product: $\mathbf{X}(\mathbf{\beta}\mathbf{w}) = (\mathbf{X}\mathbf{\beta})\mathbf{w}$. Since the predicted values are the same, you may as well just average the coefficients: this gives you more information, e.g. in case you want to look at the coefficients on individual predictors.
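A quick numerical check, a minimal sketch with made-up dimensions and random data in NumPy, confirms the equivalence up to floating-point error:

```python
import numpy as np

# Hypothetical dimensions: T observations, N predictors, M models.
rng = np.random.default_rng(0)
T, N, M = 100, 5, 3

X = rng.normal(size=(T, N))      # T x N predictor matrix
B = rng.normal(size=(N, M))      # N x M stacked coefficient estimates
w = rng.random(M)
w /= w.sum()                     # non-negative weights summing to one

avg_of_preds = (X @ B) @ w       # predict per model, then average
pred_of_avg = X @ (B @ w)        # average coefficients, then predict

print(np.allclose(avg_of_preds, pred_of_avg))  # True
```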
In non-linear models, the equivalence typically no longer holds, and there it indeed makes sense to average across predictions instead. The vast literature on averaging across predictions (forecast combinations) is, for instance, summarized here.
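To see why the equivalence breaks down, consider a hypothetical logistic-type model: applying the non-linear link to averaged coefficients is generally not the same as averaging the per-model predictions, because the link does not commute with the weighted sum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
T, N, M = 100, 5, 3
X = rng.normal(size=(T, N))
B = rng.normal(size=(N, M))
w = np.full(M, 1.0 / M)          # equal weights for simplicity

# Average across predictions: apply the link per model, then average.
avg_of_preds = sigmoid(X @ B) @ w
# Average across coefficients: average first, then apply the link once.
pred_of_avg = sigmoid(X @ (B @ w))

print(np.allclose(avg_of_preds, pred_of_avg))  # typically False
```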