
I have a model for which I've used MSE as the accuracy metric. After reading this article on calculating prediction intervals using MSE, I looked around for other resources.

I then found this PDF from Wharton that suggests y_hat +/- 2 * RMSE forms a prediction interval for y. However, it doesn't say what coverage level this interval has. Is this the 95% prediction interval for y?

As a result, I'm a little confused about how to derive the right formula for an N% prediction interval given the RMSE or MSE of the model. Could anyone point me to the right resources, or show me how to derive the formula, so I can understand how they arrived at what seem to be two different formulas?

  • That reference isn't quite right. It is making some (unstated) approximations, presumably to address an audience that might be confused by a full and accurate account of a prediction interval. Correct formulas are provided in several threads here, such as http://stats.stackexchange.com/questions/9131. They are needed in any situation where the fitted curve has appreciable uncertainty. – whuber Nov 25 '16 at 20:37
  • @whuber would you be willing to outline those unstated approximations in a comment or answer, for completeness' sake? –  Nov 25 '16 at 20:46
  • I already have: the approximation completely ignores uncertainty in the fitted curve. – whuber Nov 25 '16 at 20:48
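For reference, here is what the full interval looks like in the simplest case. This is a sketch that assumes simple linear regression with independent, normally distributed errors; the symbols $n$ (sample size), $x_0$ (the new predictor value), $\bar{x}$ (the sample mean of the predictor) and $s$ (the residual standard error, often reported as RMSE) are introduced here for illustration and do not appear in the question.

```latex
% 100(1 - \alpha)% prediction interval for a new observation at x_0,
% assuming simple linear regression with i.i.d. normal errors
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\; n-2}\; s\,
\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

The terms $1/n$ and $(x_0 - \bar{x})^2 / \sum_i (x_i - \bar{x})^2$ carry the uncertainty in the fitted line. Dropping them and replacing the $t$ quantile with 2 gives $\hat{y}_0 \pm 2s$, which appears to be the approximation in the Wharton note.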

1 Answer


To your first question: the Wharton formula does give an approximately 95% interval. You can tell because the formula multiplies a standard error by a quantile. Here the quantile is 2, which is roughly 1.96, the standard normal quantile for a two-sided 95% interval.

To calculate an N% prediction interval using this formula, adjust the quantile accordingly. For example, to calculate a 90% interval, change the 2 in the Wharton equation to 1.645.
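The quantile lookup can be sketched in a few lines of Python. This is a minimal illustration under the same normal approximation as the Wharton formula, so it ignores uncertainty in the fitted curve; the function names `normal_quantile` and `approx_prediction_interval` are made up for this example.

```python
from statistics import NormalDist


def normal_quantile(level):
    """Two-sided standard normal quantile for a coverage level such as 0.95."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    # A two-sided interval with coverage `level` leaves (1 - level) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def approx_prediction_interval(y_hat, rmse, level=0.95):
    """Approximate interval y_hat +/- z * rmse (hypothetical helper).

    Assumes normal errors and a well-estimated fitted curve.
    """
    z = normal_quantile(level)
    return y_hat - z * rmse, y_hat + z * rmse


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} interval multiplier: {normal_quantile(level):.3f}")
```

For a prediction of 10.0 with an RMSE of 2.0, `approx_prediction_interval(10.0, 2.0, 0.90)` returns the approximate 90% bounds.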

grldsndrs
  • Ah, I see. This assumes a normal distribution, correct? If I wanted a t distribution, I assume I would have to adjust this. –  Nov 25 '16 at 20:36
  • @grldsndrs You mean "if the d.f. is *infinite* then it is the same as the normal distribution", not 1. – Chris Haug Nov 25 '16 at 21:04
  • @ChrisHaug Yes you are quite right. I should have written infinity. – grldsndrs Nov 25 '16 at 21:36
  • Yes according to the number of degrees of freedom, but if the d.f. is infinite then it is the same as the normal distribution. – grldsndrs Nov 25 '16 at 21:38
  • What is your sample size? The degrees of freedom is not infinite but rather is determined by your sample size. If the sample size is less than 20 the normal may not provide a good approximation. But if it is greater than 100 it is probably good enough. – Michael R. Chernick Nov 25 '16 at 23:52
  • Shouldn't the default general interval be a t-interval, since we are using an estimator for the standard deviation? – MSIS Apr 11 '20 at 05:23
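To see how much the choice between the t and normal quantiles matters at different sample sizes, the comparison from the comments above can be sketched numerically. This is a standard-library-only illustration; the helpers `t_pdf`, `t_cdf` and `t_quantile` are hypothetical names written for this example, using Simpson's rule and bisection. In practice a library routine such as `scipy.stats.t.ppf` would normally be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Upper quantile (0.5 <= p < 1) found by bisection on the CDF."""
    if not 0.5 <= p < 1.0:
        raise ValueError("p must be in [0.5, 1)")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand until the quantile is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for df in (5, 10, 20, 30, 100):
    print(f"df={df:>3}: t = {t_quantile(0.975, df):.3f}, normal = {z:.3f}")
```

The t multiplier is always larger than the normal one and approaches it as the degrees of freedom grow, which is why the normal approximation becomes harmless for large samples.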