This question is somehow related to Is the residual, e, an estimator of the error, $\epsilon$?
I also found some information here: Confidence interval of RMSE
Let's say, I got a model that explains sampled X by mean(X).
Using suggestions from Mr. @whuber I reproduced calculations of the PI from predict.lm
in R.
## lm PI
x_dat <- data.frame(x = rnorm(100, 0, 1))
lm_model <- lm(data = x_dat, x ~ 1)
summary(lm_model)
lm_preds <- predict.lm(lm_model
, x_dat
, interval = "prediction"
, level = 0.95
)
beta.hat <- lm_model$coefficients
# var-covariance matrix
V <- vcov(lm_model)
# mean prediction
Xp <- model.matrix(~ 1, x_dat)
pred <- as.numeric(Xp %*% beta.hat)[1]
# mean prediction variance:
pred_var <- unname(rowSums((Xp %*% V) * Xp))[1]
# confidence
t_stat <- qt((1 - 0.95)/2, df = lm_model$df.residual)
# residual MSE
res_mse <- sum(lm_model$residuals ^ 2) / lm_model$df.residual
# PI
PI <- pred + c(t_stat, -t_stat) * sqrt(pred_var + res_mse)
print(PI)
print(lm_preds[1, ])
> print(PI)
[1] -2.043175 2.572508
>
> print(lm_preds[1, ])
fit lwr upr
0.2646666 -2.0431747 2.5725079
I only have 2 questions.
Is it right that we assume that the true model error variance is that of sampled residual variance in order to make an unbiased estimate?
Given that we don't know what the true error variance is, does it mean we make a biased estimate of PI, in particular, by not adjusting for an additional dispersion of the residual variance? If so, can we assume that variance of the residual variance follows chi-square distribution in order to get an upper quantile of the value that can be supplied inside for an exact calculation?
predict.lm(...,
pred.var =
)