
I recall reading that making predictions outside the range of the data is a no-no. That makes sense, because the pattern or relationship may change dramatically outside of the interval. My question is: is there a rule governing this? Interpolation vs. extrapolation?

For example, if you have a correlation between auto price and auto mileage, and the mileage in your data ranges from 0 to 100k miles, you should not try to estimate the price of a car with 120k miles. You should only estimate prices for cars with 0 to 100k miles.
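A minimal sketch of the distinction in the car example: fit a plain least-squares line, then flag whether the query mileage falls inside the observed range. The mileage/price numbers and the helper names `fit_line` and `predict_with_flag` are hypothetical, made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def predict_with_flag(model, xs_train, x_new):
    """Return (predicted y, True if x_new lies outside the training range)."""
    b0, b1 = model
    outside = not (min(xs_train) <= x_new <= max(xs_train))
    return b0 + b1 * x_new, outside


# Hypothetical data: mileage in thousands of miles, price in thousands of dollars.
miles = [5, 20, 35, 50, 65, 80, 95]
price = [29.0, 26.5, 23.8, 21.0, 18.7, 15.9, 13.2]

model = fit_line(miles, price)
for m in (60, 120):
    estimate, extrapolated = predict_with_flag(model, miles, m)
    kind = "extrapolation" if extrapolated else "interpolation"
    print(f"{m}k miles: ${estimate:.1f}k ({kind})")
```

The flag does not make the 120k estimate wrong by itself; it only marks that the line is being used where no data can check it.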

Is "extrapolation" totally invalid, or simply done with a disclaimer?

JackOfAll
    In practice sometimes there is no alternative but to extrapolate outside the data, but it requires very strong assumptions (that a model will continue to be equally adequate over the wider range), and even so is subject to a rapid expansion of confidence intervals. The model may be badly wrong and there's no data to check the assumption. However, in some cases you can be *sure* that extrapolation must eventually be wrong -- and your example is a good one. If you fit a linear model where price is decreasing with mileage, eventually there's a mileage at which the predicted price is negative! – Glen_b Feb 11 '14 at 22:07
    In several answers I have quoted my favorite commentary about the limits of extrapolation (by Mark Twain). See http://stats.stackexchange.com/a/24649, for instance (around the middle of the post). – whuber Feb 11 '14 at 22:20
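Glen_b's negative-price point can be checked directly: for a fitted line with negative slope, the predicted price reaches zero at x = -b0 / b1. A sketch with made-up mileage/price data (hypothetical numbers and a hypothetical `fit_line` helper):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: mileage in thousands of miles, price in thousands of dollars.
miles = [5, 20, 35, 50, 65, 80, 95]
price = [29.0, 26.5, 23.8, 21.0, 18.7, 15.9, 13.2]

b0, b1 = fit_line(miles, price)
# With a negative slope, b0 + b1 * x = 0 at x = -b0 / b1.
zero_miles = -b0 / b1
print(f"slope: {b1:.3f} ($k per 1k miles)")
print(f"predicted price hits $0 at about {zero_miles:.0f}k miles")
print(f"observed mileage only spans {min(miles)}k to {max(miles)}k")
```

Beyond that mileage the line predicts negative prices, which is impossible, even though the fit looks fine over the 5k to 95k range and gives no warning.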

1 Answer


"totally invalid"? Well, it depends on the situation.

In some cases, substantive theory indicates that there is a linear relationship between two variables (or a quadratic one, or whatever) and regression is used to confirm that relationship and make its estimates precise. Here, some extrapolation may be OK.

How far out of the range of the data are you extrapolating? If you have data on (say) the heights and weights of adult men, and your tallest man is 6'5" and you extrapolate to 6'7" you are probably OK. But you may be off if you extrapolate to the tallest people on Earth.
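One way to make "how far out" concrete, assuming an ordinary least-squares straight line with the usual assumptions (the line is the right model, errors independent with constant variance): the standard error for predicting one new man at height $x_0$ is given below, where $s$ is the residual standard error and $\bar x$, $S_{xx}$ come from the observed heights. The last term under the root grows with the squared distance of $x_0$ from $\bar x$, so the prediction interval $\hat y_0 \pm t_{n-2}\,\operatorname{se}(\hat y_0)$ widens the further out you go. It only covers sampling uncertainty; it cannot warn you if the straight line itself stops holding for extremely tall people.

```latex
\[
\operatorname{se}(\hat y_0) = s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2,
\qquad
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat y_i)^2.
\]
```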

How strong is the relationship? A relationship where all the data are close to a line is going to be easier to extrapolate from than one where the data are only approximately on the line.
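To see why a tight relationship extrapolates better, here is a sketch that fits two hypothetical height/weight datasets with the same underlying line but different scatter, and computes the standard error for predicting one new man at 79 inches (6'7") with the usual least-squares formula s * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx). The data, the scatter pattern, and the helper names are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def prediction_se(xs, ys, x0):
    """Standard error for predicting one new y at x0 from a straight-line fit."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual SE
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)


# Hypothetical heights in inches (61 to 77, i.e. 5'1" to 6'5") and a made-up trend in pounds.
heights = list(range(61, 78, 2))
trend = [5.0 * h - 200.0 for h in heights]
pattern = [1, -1, 1, -1, 1, -1, 1, -1, 1]  # fixed, made-up scatter pattern

tight = [t + 2.0 * p for t, p in zip(trend, pattern)]  # data hug the line
loose = [t + 8.0 * p for t, p in zip(trend, pattern)]  # same line, 4x the scatter

x0 = 79  # just beyond the tallest man in the data
print(f"tight data: prediction SE at {x0} in = {prediction_se(heights, tight, x0):.2f} lb")
print(f"loose data: prediction SE at {x0} in = {prediction_se(heights, loose, x0):.2f} lb")
```

Because the heights are identical in both datasets, the difference comes entirely from s: four times the scatter gives four times the standard error at the extrapolated height.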

How much of a range of data have you got? Continuing the height/weight example, if you have data on men from 5'1" to 6'5", I'd be more likely to trust extrapolation than if you only had it on men from 6'3" to 6'5".
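The range effect can be isolated by holding the residual standard error s fixed and looking only at the factor sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx) by which a straight-line prediction SE at x0 exceeds s. A sketch comparing nine hypothetical men spread over 5'1" to 6'5" with nine packed into 6'3" to 6'5", extrapolating to 6'7" (79 in); the samples and the helper name `extrapolation_factor` are made up for illustration.

```python
import math


def extrapolation_factor(xs, x0):
    """How many times s the prediction SE at x0 is, for a straight-line fit on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # spread of the observed predictor
    return math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)


wide = [61 + 2 * i for i in range(9)]       # 61 to 77 in (5'1" to 6'5")
narrow = [75 + 0.25 * i for i in range(9)]  # 75 to 77 in (6'3" to 6'5")
target = 79                                 # 6'7"

print(f"wide range:   SE = {extrapolation_factor(wide, target):.2f} * s")
print(f"narrow range: SE = {extrapolation_factor(narrow, target):.2f} * s")
```

The narrow sample has a much smaller Sxx and its mean sits closer to the edge, yet the target is still further from it relative to that spread, so its factor comes out larger.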

What is the nature of the relationship? Polynomial relationships can go way off very quickly once you leave the range of the data.
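A quick illustration of how fast a polynomial can wander off, using made-up points that lie close to a straight line. Rather than least-squares polynomial regression, it uses the extreme case (a swapped-in technique): the degree-5 polynomial that passes exactly through all six points (Lagrange interpolation), compared with the straight-line fit inside and outside the data. The data and helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def lagrange_eval(xs, ys, x):
    """Value at x of the unique degree-(n-1) polynomial through all (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


# Hypothetical points that lie close to y = x.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.0, 2.1, 2.9, 4.1, 5.0]

b0, b1 = fit_line(xs, ys)
for x in (2.5, 5, 6, 8, 10):
    line = b0 + b1 * x
    poly = lagrange_eval(xs, ys, x)
    print(f"x = {x:>4}: line {line:8.2f}   degree-5 polynomial {poly:8.2f}")
```

At the data points and at x = 2.5 the two agree closely; one unit past the data they already disagree by several units, and by x = 10 the polynomial's prediction is hundreds of units away from the line's.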

etc.

Peter Flom