I'm trying to match the prediction interval of the predict.lm() function in R using the formula found in this discussion :
Obtaining a formula for prediction limits in a linear model
I'm using a student's quantile in my interval but in the end it's far larger from the one given by predict().
Is there any specific calculation in the predict function, I tried to look at the code but couldn't find any answer. The formula looks ok as I found exactly the same from others source.
My R code :
airquality_clean <- na.omit(airquality)
attach(airquality_clean)
#Model estimation
model_1 <- lm(Ozone ~., data = airquality_clean)
#Unbias variance of the residuals
sigma_2 <- sum(model_1$residuals**2)/(dim(airquality_clean)[1]-dim(airquality_clean)[2])
#New observation
new <- data.frame(Solar.R=200,Wind=10,Temp=70,Day=1,Month=3)
#Calculated prediction interval
sigma <- sqrt(sigma_2*(1 + as.matrix(new)%*%solve(as.matrix(t(airquality_clean[,-1]))%*%as.matrix(airquality_clean[,-1]))%*%as.matrix(t(new))))
qt <- qt(0.995, df = dim(airquality_clean)[1]-dim(airquality_clean)[2])
int_pred_t <- cbind(predict(model_1, new)-(qt*sigma),predict(model_1, new)+(qt*sigma))
int_pred_t
[,1] [,2]
[1,] -22.59931 95.82563
#R prediction interval
predict(model_1, new, interval="predict", level=0.99)}
fit lwr upr
1 36.61316 -21.12916 94.35548
I'm not too far but it's not the same results. If I use a p value from a normal distribution and not a student I'm even closer.
Thank you.