What is the best way to extrapolate when working with a linear regression model?

Question

There's not much more to ask than what I've written in the title.

Some of the values I want to predict are outside of the range used to build the regression model.

Can you please define "best"? It would be "best" to collect further data and thereby extend the range of the training data so that you don't need to extrapolate. – Roland Aug 08 '16 at 09:18
Sorry about that. I mean the best computational method for extrapolating values that lie outside the given range. – madsthaks Aug 08 '16 at 17:59

score 3 · Answer 1 · answered Aug 08 '16 at 08:19

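You can use the `predict` function, which evaluates the fitted model at whatever covariate values you pass in `newdata`, including values outside the range used for fitting. Try: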

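    set.seed(123)  # make the simulated noise reproducible

    # Simulate data from y = -2 + 3x + noise, with x observed on 1..10
    x <- 1:10
    y <- -2 + 3 * x + rnorm(10)
    our_data <- data.frame(y = y, x = x)

    # Fit the linear model
    our_model <- lm(y ~ x, data = our_data)

    # Predict at x = 20, which lies outside the observed range of x
    predict(our_model, newdata = data.frame(x = 20))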

Qaswed

I believe the OP's concern is not with *evaluation* of the values but with the *extrapolation* involved in some cases. – whuber Aug 08 '16 at 14:22
I was under the impression that the predict function should not be used outside of the range used to build the regression model. – madsthaks Aug 08 '16 at 18:11
@user3552144 There is no such limitation in the `predict.lm` method. The method even provides the option of returning the prediction interval mentioned by whuber. Study `help("predict.lm")`. – Roland Aug 09 '16 at 13:26
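
As a minimal sketch of what this comment describes, the following reuses the simulated data from Qaswed's answer and asks `predict` for 95% prediction intervals at one point inside the observed range and one outside it. The values 5 and 20 and the names `new_points`, `pred`, and `interval_width` are illustrative choices, not part of the original thread.

```r
set.seed(123)

# Same simulated data as in the answer above
x <- 1:10
y <- -2 + 3 * x + rnorm(10)
our_data <- data.frame(y = y, x = x)
our_model <- lm(y ~ x, data = our_data)

# One point inside the observed range (x = 5) and one far outside it (x = 20)
new_points <- data.frame(x = c(5, 20))

# interval = "prediction" returns a matrix with columns fit, lwr, upr
# (95% level by default)
pred <- predict(our_model, newdata = new_points, interval = "prediction")

# Width of each interval; the extrapolated point gets the wider one
interval_width <- pred[, "upr"] - pred[, "lwr"]
print(cbind(new_points, pred, width = interval_width))
```

The interval at x = 20 comes out wider than the one at x = 5, which reflects the extra uncertainty of extrapolating. The interval is still computed under the assumption that the fitted linear relationship continues to hold outside the observed range.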

score 1 · Answer 2 · answered Aug 08 '16 at 05:17

Once your model and its parameters are fixed, there's only one way to do it: plug in the covariate values of the point you want to extrapolate at.
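
A small sketch of that plug-in calculation, assuming the same simulated data as in the other answer (the names `b`, `x_new`, `by_hand`, and `by_predict` are hypothetical and chosen only for this illustration). It computes the extrapolated value by hand from the fitted coefficients and compares it with what `predict` returns.

```r
set.seed(123)

# Simulated data and fitted model, as in the other answer
x <- 1:10
y <- -2 + 3 * x + rnorm(10)
our_model <- lm(y ~ x, data = data.frame(y = y, x = x))

# Fitted parameters: intercept and slope
b <- coef(our_model)

# Plug the covariate value in by hand
x_new <- 20
by_hand <- unname(b["(Intercept)"] + b["x"] * x_new)

# Let predict() do the same calculation
by_predict <- unname(predict(our_model, newdata = data.frame(x = x_new)))

# Compare the two results up to floating-point tolerance
all.equal(by_hand, by_predict)
```

With several regressors the by-hand version becomes a longer sum of coefficient times covariate, which is why calling `predict` is usually the more convenient way to do the same thing.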

Kodiologist

I would like to suggest that a good *statistical* answer would also provide information about how to assess the uncertainty in the extrapolation. That would address the implicit concern associated with extrapolation. – whuber Aug 08 '16 at 14:21
@whuber That's a fair request, but I'm only familiar with model-validation methods for a population that the training data is representative of, not with methods for extrapolation. – Kodiologist Aug 08 '16 at 15:04
One aspect you could readily point to is the [formula for a prediction interval](http://stats.stackexchange.com/questions/33433) (or the better-known formula for a confidence interval of the fit) and the fact that, as the regressors move away from their centroid, the variance behind either interval grows quadratically in that distance, so both intervals widen. That provides a quantitative way to assess *how much* extrapolation is occurring and what its effects are on the uncertainty in the prediction or fit. – whuber Aug 08 '16 at 15:10
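
For a simple linear regression with a single regressor, the prediction interval mentioned here takes the following standard textbook form. The notation is introduced only for this illustration and does not appear in the original thread: $x_0$ is the new point, $\bar{x}$ the mean of the observed regressor values, $s$ the residual standard error, and $n$ the sample size.

$$
\hat{y}_0 \;\pm\; t_{n-2,\,1-\alpha/2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
$$

Dropping the leading 1 under the square root gives the confidence interval for the fitted mean. The last term under the square root is the one that grows with the squared distance of $x_0$ from $\bar{x}$.
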
It is *a* method, but it is not correct to say it is the *only* method. Using `predict` as described by Qaswed would be simpler, especially when the model involves multiple coefficients. – dra_red Sep 20 '19 at 13:24
@dra_red But this is exactly what `predict` does. – Kodiologist Sep 20 '19 at 17:58
Yes, Kodiologist, that is why it makes sense to use `predict`... or are you saying something else? – dra_red Sep 25 '19 at 06:45
@dra_red That is what I'm saying. – Kodiologist Sep 25 '19 at 11:21
