
Suppose I have a reasonably sized dataset and fit a linear regression to it, so I now have a model, say $\hat{y}=Ax+b$, where $y$ is real (or perhaps a vector, but let's say real for now), my data points $x$ are vectors, and $A$ and $b$ are the parameters of the linear fit, found through whatever method.
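For concreteness, one common choice of "whatever method" (assumed here, not stated in the question) is ordinary least squares. With $d$ the dimension of each $x_i$, $X$ the $n \times (d+1)$ matrix whose $i$-th row is $(1, x_i^\top)$, $y = (y_1, \dots, y_n)^\top$, and $X^\top X$ assumed invertible, the fit is

$$
(\hat{A}, \hat{b}) = \arg\min_{A,\,b} \sum_{i=1}^{n} \bigl(y_i - A x_i - b\bigr)^2,
\qquad
\begin{pmatrix} \hat{b} \\ \hat{A}^\top \end{pmatrix} = (X^\top X)^{-1} X^\top y .
$$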

Now I have a new data point $x_0$ and run it through my model. Its true output value is $y_0$, but its predicted value is $\hat{y}_0$. The question is: how precise should I expect this prediction to be? Better yet, can I construct a $95\%$ confidence interval (or one at whatever level) for $y_0$?

I'd be happy to have links to any sources on this topic; I assume it's complicated enough not to have a one-line answer. It hasn't come up in the courses I've taken so far, so I wonder whether it's even been done (it seems like it would require a lot of assumptions, though one might hope the central limit theorem could reduce the number of hypotheses needed).

EDIT: Through the magic of tags, I have learned the term "prediction interval" and suspect it may be what I want. But I don't know anything about prediction intervals: how to compute them, or what we have to assume about the data for them to be meaningful.
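Under the classical textbook assumptions (an assumption here, not something established in the question: the linear model is correctly specified, errors are i.i.d. normal with constant variance, $X^\top X$ is invertible, and $n > p$), the standard $100(1-\alpha)\%$ prediction interval for a new observation $y_0$ is

$$
\hat{y}_0 \pm t_{1-\alpha/2,\,n-p}\; s\,\sqrt{1 + z_0^\top (X^\top X)^{-1} z_0},
$$

where $z_0 = (1, x_0^\top)^\top$, $X$ has rows $(1, x_i^\top)$, $p$ counts the fitted coefficients including the intercept, and $s^2 = \text{RSS}/(n-p)$. The sketch below computes it with the standard library only; the helper names (`solve`, `t_pdf`, `t_cdf`, `t_quantile`, `prediction_interval`) are hypothetical, and the Student-t quantile is found numerically (Simpson's rule plus bisection) purely to avoid external dependencies.

```python
import math


def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting (M must be nonsingular)."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]  # copy, so M and v are not modified
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_cdf(t, nu, steps=2000):
    """CDF for t >= 0: 0.5 plus the integral of the density over [0, t] (Simpson's rule)."""
    h = t / steps
    s = t_pdf(0.0, nu) + t_pdf(t, nu)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 + s * h / 3


def t_quantile(p, nu):
    """Quantile for p in (0.5, 1), found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_interval(xs, ys, x0, level=0.95):
    """OLS fit of y ~ A x + b, then a prediction interval for a new y at x0 (Gaussian errors assumed)."""
    n, p = len(xs), len(xs[0]) + 1               # p counts the intercept
    X = [[1.0] + list(x) for x in xs]            # design matrix with an intercept column
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    beta = solve(XtX, Xty)                       # beta[0] = b, beta[1:] = A
    resid = [y - sum(bi * ri for bi, ri in zip(beta, r)) for r, y in zip(X, ys)]
    s2 = sum(e * e for e in resid) / (n - p)     # unbiased estimate of the noise variance
    z0 = [1.0] + list(x0)
    y0_hat = sum(bi * zi for bi, zi in zip(beta, z0))
    w = solve(XtX, z0)                           # (X^T X)^{-1} z0
    lev = sum(zi * wi for zi, wi in zip(z0, w))  # z0^T (X^T X)^{-1} z0
    half = t_quantile(0.5 + level / 2, n - p) * math.sqrt(s2 * (1 + lev))
    return y0_hat, y0_hat - half, y0_hat + half
```

For example, `prediction_interval([[0], [1], [2], [3], [4]], [1, 2, 2, 4, 5], [5])` fits $\hat{y} = x + 0.8$ and returns $\hat{y}_0 = 5.8$ together with the lower and upper 95% limits. The "$1 +$" under the square root is what separates a prediction interval (for a new $y_0$, which carries its own noise) from a confidence interval for the mean response $A x_0 + b$. If SciPy is available, `scipy.stats.t.ppf(0.5 + level / 2, n - p)` could replace the hand-rolled `t_quantile`.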

Richard Rast
  • See also [What is the difference between estimation and prediction?](http://stats.stackexchange.com/q/17773/17230) for some background info. & [Linear regression prediction interval](http://stats.stackexchange.com/q/33433/17230) for another, perhaps more intuitive, explanation of prediction intervals in linear regression. – Scortchi - Reinstate Monica Mar 15 '16 at 11:58
  • Thanks for this comment. Since I don't want to have another question closed, for what other kinds of predictors (more advanced than linear regression, say) can we form confidence intervals? And how much do we *really* need the normally distributed noise term for this prediction interval to make sense? (or should I edit this into the question?) – Richard Rast Mar 15 '16 at 12:01
  • You could've edited this q. & it would've entered the queue for re-opening; but I see you've asked a new one relating to the 2nd q. in your comment. The first q. in your comment touches on some rather neglected (IMO) issues - see [What non-Bayesian methods are there for predictive inference?](http://stats.stackexchange.com/q/169623/17230) & also some refs [here](http://meta.stats.stackexchange.com/questions/2679/why-was-a-question-about-predictive-distribution-closed). – Scortchi - Reinstate Monica Mar 15 '16 at 12:23
