Can I use bootstrapping to improve the prediction error of a simple linear model?

Question

I'm learning about ensemble methods (aka bootstrapping?). Specifically, I'm using the statisticalModeling package, but I'm not sure that matters. Here is some code:

library(statisticalModeling)

lm_mtcars <- lm(
  mpg ~ cyl + hp,
  data = mtcars
)

If I look at summary(lm_mtcars) I see a residual standard error of 3.17.

I learned about the ensemble() function of the statisticalModeling package, which generates nreps new models based on nreps bootstrap samples:

ensemble_lm_mtcars <- statisticalModeling::ensemble(lm_mtcars, nreps = 100, data = mtcars)

This ensemble_lm_mtcars object appears to be made up of 5 parts, as shown in a screenshot of my console (image not available).

I understand what these are; I tested them by typing each one into the console and hitting enter. Presumably the "core" of this object is the replications.

I'm confused because I don't know what to do with this object now that I have created it. Presumably I can use it to try to improve my model accuracy, but how?

I Googled "Why use bootstrapping?" and the results page gave me this Wikipedia excerpt:

Bootstrapping allows assigning measures of accuracy (defined in terms of bias, variance, confidence intervals, prediction error or some other such measure) to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.

OK... how? For example, how can I use ensemble_lm_mtcars to improve prediction error?

score 3 · Accepted Answer · answered Nov 04 '17 at 20:30

Bootstrapping, when used for prediction (bagging), means averaging multiple models trained on slightly different resamples of the data. But the average of many linear models is itself a linear model, so your linear model will generally not improve if you bootstrap it.

For linear models, bootstrapping is usually applied only to measure uncertainty in the coefficients, not to improve prediction error. With some other types of models, such as trees, bootstrapping generally results in better predictions.
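To make the "average of linear models is a linear model" point concrete, here is a minimal sketch in base R. It does not use statisticalModeling::ensemble(); the bootstrap loop is written by hand, and the names n_reps, boot_coefs and rmse, as well as the seed, are illustrative choices rather than anything from the question.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# The original single model from the question
fit <- lm(mpg ~ cyl + hp, data = mtcars)

n_reps <- 200  # illustrative number of bootstrap replications

# Refit the model on each bootstrap sample.
# Result: a 3 x n_reps matrix, one column of coefficients per replicate.
boot_coefs <- replicate(n_reps, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ cyl + hp, data = mtcars[idx, ]))
})

# Design matrix of the original data (intercept, cyl, hp)
X <- model.matrix(~ cyl + hp, data = mtcars)

# "Bagged" prediction: average the predictions of all replicates
bagged_pred <- rowMeans(X %*% boot_coefs)

# Prediction from ONE linear model whose coefficients are the averages
avg_coef_pred <- as.vector(X %*% rowMeans(boot_coefs))

# The two are the same thing, up to floating point error
same_model <- isTRUE(all.equal(unname(bagged_pred), avg_coef_pred))
print(same_model)

# In-sample root mean squared error of the single fit vs. the bagged fit
rmse <- function(pred) sqrt(mean((mtcars$mpg - pred)^2))
errors <- c(single = rmse(fitted(fit)), bagged = rmse(bagged_pred))
print(errors)
```

On the training data the bagged model cannot beat the single fit, because ordinary least squares already gives the linear model with the smallest residual sum of squares on that data. Out-of-sample differences would need a held-out set or cross-validation to assess, which this sketch does not do.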

David Dale

Thanks for the info. "For linear models, bootstrapping is usually applied only to measure uncertainty in coefficients". What does that mean exactly? – Doug Fir Nov 04 '17 at 20:34

@DougFir You simply re-run your model on many bootstrapped samples and record its coefficients each time. As a result you get a distribution of coefficients, and you can use statistics of this distribution to test hypotheses or just describe your results. See these questions for examples: https://stats.stackexchange.com/questions/64813/two-ways-of-using-bootstrap-to-estimate-the-confidence-interval-of-coefficients and https://stackoverflow.com/questions/23563836/bootstrap-a-linear-regression – David Dale Nov 04 '17 at 21:37
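The procedure described in this comment could look roughly like the following base R sketch. It is one possible way to do it, not the method from the linked questions: the hand-written resampling loop, the 1000 replications, the seed, and the use of percentile intervals are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n_reps <- 1000  # illustrative number of bootstrap samples

# Refit the model on each bootstrap sample and record its coefficients.
# After t(), each row is one replicate and each column one coefficient.
boot_coefs <- t(replicate(n_reps, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  coef(lm(mpg ~ cyl + hp, data = mtcars[idx, ]))
}))

# Bootstrap standard error of each coefficient
boot_se <- apply(boot_coefs, 2, sd)
print(boot_se)

# 95% percentile intervals: rows are the 2.5% and 97.5% quantiles
boot_ci <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
print(boot_ci)

# For comparison: the usual model-based standard errors from lm
print(summary(lm(mpg ~ cyl + hp, data = mtcars))$coefficients[, "Std. Error"])
```

An interval in boot_ci that excludes zero plays the same role as a significant coefficient in summary(), but it relies on resampling rather than on the normal-theory assumptions behind the lm standard errors.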
