
I'm reading Linear Models with R, 2nd Ed., by Julian Faraway, and he says something puzzling in Chapter 4 on page 51:

> There are two kinds of predictions made from regression models. One is a predicted mean response and the other is a prediction of a future observation. To make the distinction clear, suppose we have built a regression model that predicts the rental price of houses in a given area based on predictors such as the number of bedrooms and closeness to a major highway. There are two kinds of predictions that can be made for a given $x_0$:
>
> 1. Suppose a specific house comes on the market with characteristics $x_0.$ Its rental price will be $x_0^T\beta+\varepsilon.$ Since $E\varepsilon=0,$ the predicted price is $x_0^T\hat\beta,$ but in assessing the variance of this prediction, we must include the variance of $\varepsilon.$
> 2. Suppose we ask the question - "What would a house with characteristics $x_0$ rent for on average?" This selling price is $x_0^T\beta$ and is again predicted by $x_0^T\hat\beta$ but now only the variance in $\hat\beta$ needs to be taken into account.
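
The difference between the two cases can be written out explicitly. This is a short derivation, not taken from the book, assuming the usual setup $y = X\beta + \varepsilon$ with $\operatorname{Var}\varepsilon = \sigma^2 I$ (so that $\operatorname{Var}\hat\beta = \sigma^2 (X^TX)^{-1}$), and assuming the new house's error $\varepsilon_0$ is independent of the data used to compute $\hat\beta$:

$$
\begin{aligned}
\text{mean response:}\quad & \operatorname{Var}\bigl(x_0^T\hat\beta - x_0^T\beta\bigr) = x_0^T \operatorname{Var}(\hat\beta)\, x_0 = \sigma^2\, x_0^T (X^TX)^{-1} x_0, \\
\text{future observation:}\quad & \operatorname{Var}\bigl((x_0^T\beta + \varepsilon_0) - x_0^T\hat\beta\bigr) = \operatorname{Var}\bigl(x_0^T(\beta - \hat\beta)\bigr) + \operatorname{Var}(\varepsilon_0) = \sigma^2\bigl(1 + x_0^T (X^TX)^{-1} x_0\bigr).
\end{aligned}
$$

In case 2 the target $x_0^T\beta$ is a fixed number, so the only randomness in the prediction error comes from $\hat\beta$. In case 1 the target itself contains the random term $\varepsilon_0$, which no amount of data can predict, so $\sigma^2$ is added on top; there is no cross term because $\varepsilon_0$ is assumed independent of $\hat\beta$.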

Note: I think it only fair to ignore the distinction between selling and renting; that distinction is clearly not the author's main point here.

Another note: As this is not a homework problem, I am going to leave off the self-study tag.

Final note: this question says nothing about "forecasting", so the questions on Stats.SE concerning forecasting are not relevant.

My Question: Why can we ignore the variance of $\varepsilon$ in the prediction-of-mean problem (2.), but not in the prediction-of-future-observation problem (1.)? I'm also not entirely sure I understand the difference between these two predictions. If you could please clarify that difference, I would be most grateful.

[EDIT] Apparently, predicting a future value corresponds to a prediction interval, while predicting a mean value corresponds to a confidence interval.
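
To make the CI/PI correspondence concrete, here is a minimal pure-Python sketch for the special case of one predictor plus an intercept, where $x_0^T(X^TX)^{-1}x_0$ reduces to $h_0 = 1/n + (x_0-\bar x)^2/S_{xx}$. The function names `fit_simple_ols` and `prediction_standard_errors` and the toy numbers are hypothetical, made up for illustration only; in R one would normally just call `predict()` on an `lm` fit with `interval = "confidence"` or `interval = "prediction"`.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, sigma_hat, x_bar, sxx)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))
    return b0, b1, sigma_hat, x_bar, sxx


def prediction_standard_errors(x, y, x0):
    """Return (y0_hat, se_mean, se_new) at the new predictor value x0."""
    b0, b1, sigma_hat, x_bar, sxx = fit_simple_ols(x, y)
    n = len(x)
    y0_hat = b0 + b1 * x0
    # h0 = x0^T (X^T X)^{-1} x0 for a design matrix with an intercept column.
    h0 = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = sigma_hat * math.sqrt(h0)     # uncertainty in beta-hat only (CI)
    se_new = sigma_hat * math.sqrt(1 + h0)  # beta-hat uncertainty plus Var(epsilon) (PI)
    return y0_hat, se_mean, se_new


# Hypothetical toy data, made up for illustration: bedrooms vs. rent in $100s.
bedrooms = [1, 2, 2, 3, 3, 4, 4, 5]
rent = [9.5, 12.0, 11.2, 14.8, 13.9, 17.1, 16.5, 19.8]

y0_hat, se_mean, se_new = prediction_standard_errors(bedrooms, rent, 3.5)
print(f"point prediction:            {y0_hat:.2f}")
print(f"SE for the mean response:    {se_mean:.3f}")
print(f"SE for a future observation: {se_new:.3f}")
```

Both intervals have the form $\hat y_0 \pm t_{n-2}\cdot\mathrm{SE}$ with the same $t$ quantile, so the only difference is the extra $\hat\sigma^2$ under the square root. As $n$ grows, $h_0$ shrinks toward zero and the confidence interval narrows toward the point prediction, while the prediction interval never gets narrower than roughly $\pm t\,\hat\sigma$.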

Adrian Keister
  • The duplicate is one of the top hits on a [site search about confidence and prediction intervals](https://stats.stackexchange.com/search?q=prediction+confidence+interval*+score%3A5). Some of the other hits look informative, too. – whuber Jan 28 '22 at 21:33
  • I don't understand why my question was closed: the post said nothing about confidence intervals. Is that really what the question is about, and I don't know it? – Adrian Keister Jan 28 '22 at 21:34
  • "What would a house with characteristics $x_0$ rent for on average?" This selling price is $x_0^T\beta$ and is again predicted by $x_0^T\hat\beta$ but now only the variance in $\hat\beta$ needs to be taken into account" implicitly describes a confidence interval for the selling price. Regardless, your quotations are clearly trying to distinguish confidence intervals from prediction intervals and everything written about that subject here helps answer your question. – whuber Jan 28 '22 at 21:36
  • Huh. This is starting to get annoying. I already knew the difference between CIs and PIs! This author is British - I wonder if they just use different terminology across the pond. Wouldn't be the first time. – Adrian Keister Jan 28 '22 at 21:37
  • Their terminology looks standard. "Predicted mean response" refers to a *property* of the model and "prediction of a future observation" refers to a *random variable.* Therein is the crux of the distinction between confidence intervals and prediction intervals, respectively. – whuber Jan 28 '22 at 21:41
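
The distinction between a fixed property of the model and a random variable can be checked by simulation. The sketch below is hypothetical and rests on stated assumptions: normal errors, a fixed design of `n` equally spaced points on $[0, 1]$, arbitrarily chosen true parameters, and the normal quantile (about 1.96) used in place of the exact $t_{n-2}$ quantile, which is close at `n = 100`. It counts how often the 95% confidence interval covers the fixed mean $x_0^T\beta$, how often that same interval covers a fresh draw $y_0 = x_0^T\beta + \varepsilon_0$, and how often the prediction interval covers $y_0$.

```python
import math
import random
from statistics import NormalDist


def coverage_simulation(n=100, reps=2000, beta0=1.0, beta1=2.0, sigma=1.0, x0=0.7, seed=1):
    """Estimate how often each nominal 95% interval covers its target.

    Hypothetical helper. Returns coverage proportions keyed by:
      ci_mean: confidence interval vs. the fixed mean beta0 + beta1*x0
      ci_new:  the same confidence interval vs. a fresh observation y0
      pi_new:  prediction interval vs. a fresh observation y0
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)        # normal approximation to the t quantile
    x = [i / (n - 1) for i in range(n)]    # fixed design on [0, 1]
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    h0 = 1 / n + (x0 - x_bar) ** 2 / sxx   # x0^T (X^T X)^{-1} x0 with an intercept
    mean_x0 = beta0 + beta1 * x0           # a fixed property of the model
    hits = {"ci_mean": 0, "ci_new": 0, "pi_new": 0}
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        y_bar = sum(y) / n
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b0 = y_bar - b1 * x_bar
        rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        s = math.sqrt(rss / (n - 2))
        y0_hat = b0 + b1 * x0
        y0 = mean_x0 + rng.gauss(0, sigma)  # a random variable: a new draw every time
        ci_half = z * s * math.sqrt(h0)
        pi_half = z * s * math.sqrt(1 + h0)
        hits["ci_mean"] += abs(mean_x0 - y0_hat) <= ci_half
        hits["ci_new"] += abs(y0 - y0_hat) <= ci_half
        hits["pi_new"] += abs(y0 - y0_hat) <= pi_half
    return {k: v / reps for k, v in hits.items()}


print(coverage_simulation())
```

With these settings the `ci_mean` and `pi_new` proportions should come out close to 0.95, while `ci_new` should be far lower (well under one half): the confidence interval was built to capture a fixed number, so it is much too narrow to capture a new random observation.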
