R crossvalidation cv.glm: prediction error and confidence interval

Question

I am using the R package boot and its cv.glm function. The output 'delta' gives me the unadjusted and adjusted prediction errors. There is an example at the top of page 10 here: http://www.unt.edu/rss/class/Jon/Benchmarks/CrossValidation1_JDS_May2011.pdf

My question is: are 'prediction errors' similar to standard errors? To make a 95% confidence interval, do I simply multiply this error by 1.96 (then +/- that result)?

Thanks

Prediction error refers to the discrepancy or difference between a predicted value (based on a model) and the actual value. — Ellis Valentiner, Jan 03 '14 at 20:10

score 4 · Answer 1 · answered Jan 03 '14 at 20:22

Prediction errors are different from standard errors in two critical ways.

1. Prediction errors provide intervals for predicted values, i.e. values which could be observed in the outcome, controlling for some or all of the variation (through conditioning) in the predictors. Standard errors provide intervals for estimated statistics, e.g. parameters, which are never truly observed. Continuously valued parameters such as log odds ratios in a logistic regression model can create "prediction intervals" for binary outcomes in the form of a confusion matrix (this is natural for Bayesians).
2. Prediction intervals do not vanish in large $n$, whereas confidence intervals do. This is because no amount of sampling will reduce the variability inherent in a single observation drawn from the data generating mechanism. Prediction errors do decrease with $n$, however, since the precision of the estimated predictive model improves; they level off at that irreducible variability rather than going to zero. Confidence intervals do vanish in large $n$, usually as a result of the central limit theorem. This is because sampling the whole universe repeatedly would yield exactly the same estimate every time, with zero variation.
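
To make the second point concrete, here is a small simulation sketch. Everything in it is an assumption for illustration (the linear model, the noise SD `sigma`, the sample sizes, the helper name `interval_widths`); it only uses `lm` and `predict` from base R.

```r
# Illustrative simulation: y = 1 + 2x + noise with SD sigma (made-up values).
set.seed(1)
sigma <- 2

# Hypothetical helper: fit a linear model on n points and return the widths
# of the 95% confidence and prediction intervals at x = 0.5.
interval_widths <- function(n) {
  x <- runif(n)
  y <- 1 + 2 * x + rnorm(n, sd = sigma)
  fit <- lm(y ~ x)
  new <- data.frame(x = 0.5)
  conf_int <- predict(fit, new, interval = "confidence")
  pred_int <- predict(fit, new, interval = "prediction")
  c(n = n,
    ci_width = unname(conf_int[, "upr"] - conf_int[, "lwr"]),
    pi_width = unname(pred_int[, "upr"] - pred_int[, "lwr"]))
}

# One row per sample size.
widths <- t(sapply(c(50, 500, 50000), interval_widths))
print(round(widths, 3))
```

The confidence interval width should shrink roughly like $1/\sqrt{n}$, while the prediction interval width should settle near $2 \times 1.96 \times \sigma$ instead of going to zero.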

Since most predictive models are generated from parametric models, the calculation of both confidence intervals and prediction intervals usually requires some application of the $\delta$-method and the variance-covariance matrix from the parameter estimates. So prediction intervals and confidence intervals from a GLM are not independent.
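
As a rough sketch of what that looks like in practice, the following reproduces by hand the response-scale standard error that `predict.glm` reports for a logistic model, using the variance-covariance matrix and the $\delta$-method. The simulated data, the coefficients and the new point `x = 1` are assumptions chosen only for illustration.

```r
# Simulated logistic regression data (illustrative values only).
set.seed(2)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

new <- data.frame(x = 1)
X0 <- cbind(1, new$x)                      # design row for the new point
V <- vcov(fit)                             # variance-covariance of the estimates

eta <- drop(X0 %*% coef(fit))              # linear predictor (log odds)
se_eta <- sqrt(drop(X0 %*% V %*% t(X0)))   # SE on the link scale

# Delta method: d plogis(eta) / d eta = p * (1 - p)
p_hat <- plogis(eta)
se_p <- se_eta * p_hat * (1 - p_hat)

# Compare with what predict.glm reports on the response scale.
builtin <- predict(fit, new, type = "response", se.fit = TRUE)
print(c(manual = se_p, predict_glm = unname(builtin$se.fit)))

# Wald interval on the link scale, back-transformed to a probability.
ci_p <- plogis(eta + c(-1, 1) * qnorm(0.975) * se_eta)
print(ci_p)
```

Any interval for a fitted or predicted value built this way goes through the same matrix `V`, which is why the two kinds of interval are tied together.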

It sounds like you are saying predictions can have huge confidence intervals. So... I get some number as my prediction error. What can I say about the quality of the prediction? What should I say to the manager asking 'how good is this prediction?' Can I say anything about the probability it will be in some interval? — John, Jan 04 '14 at 01:37
Huge CIs are relative, but even on a SE scale, there can be large CIs associated with particularly unstable parameter estimates. Likewise, prediction errors are relative. You need to impose some additional assumptions if you want to use prediction errors to determine the probability that a particular new observation will fall in some interval. In particular, you need to actually assume that the data are normal, or else use Chebyshev's rule or some other weaker, non-parametric relationship between interval probabilities and approximations of the SDs of the sampling dist'n of a pred. — AdamO, Jan 04 '14 at 05:10
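
For what it is worth, here is a hedged sketch of the two options mentioned in the last comment, applied to `cv.glm` output. It assumes a Gaussian GLM with the default cost (so `delta` is an average squared error), uses simulated data, and treats the square root of `delta[1]` as a stand-in for the SD of the prediction error, which ignores any bias in the predictions.

```r
library(boot)  # provides cv.glm

# Simulated data (illustrative values only).
set.seed(3)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 2)

fit <- glm(y ~ x, data = dat)      # gaussian family by default
cv <- cv.glm(dat, fit, K = 10)     # 10-fold cross-validation
rmse <- sqrt(cv$delta[1])          # root of the unadjusted CV squared error

k_normal <- qnorm(0.975)           # multiplier if errors are assumed normal
k_cheby <- sqrt(1 / 0.05)          # Chebyshev: P(|e| >= k * sd) <= 1 / k^2

pred <- unname(predict(fit, data.frame(x = 0.5)))
intervals <- rbind(normal = pred + c(-1, 1) * k_normal * rmse,
                   chebyshev = pred + c(-1, 1) * k_cheby * rmse)
print(round(intervals, 2))
```

The Chebyshev interval is much wider (a multiplier of about 4.47 instead of 1.96), which is the price of dropping the normality assumption.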
