R predict with "prediction" option

Question

I'm trying to reproduce the prediction interval from R's predict.lm() function using the formula given in this discussion:

Obtaining a formula for prediction limits in a linear model

I'm using a Student's t quantile in my interval, but the result is wider than the one given by predict().

Does the predict function do some specific calculation? I looked at its code but couldn't find an answer. The formula looks right, since I found exactly the same one in other sources.

My R code:

airquality_clean <- na.omit(airquality)
attach(airquality_clean)

#Model estimation
model_1 <- lm(Ozone ~., data = airquality_clean)

#Unbiased estimate of the residual variance
sigma_2 <- sum(model_1$residuals**2)/(dim(airquality_clean)[1]-dim(airquality_clean)[2])

#New observation
new <- data.frame(Solar.R=200,Wind=10,Temp=70,Day=1,Month=3)

#Calculated prediction interval
sigma <- sqrt(sigma_2*(1 + as.matrix(new)%*%solve(as.matrix(t(airquality_clean[,-1]))%*%as.matrix(airquality_clean[,-1]))%*%as.matrix(t(new))))
qt <- qt(0.995, df = dim(airquality_clean)[1]-dim(airquality_clean)[2])
int_pred_t <- cbind(predict(model_1, new)-(qt*sigma),predict(model_1, new)+(qt*sigma))
int_pred_t
          [,1]     [,2]
[1,] -22.59931 95.82563

#R prediction interval
predict(model_1, new, interval="predict", level=0.99)
       fit       lwr      upr
1 36.61316 -21.12916 94.35548

I'm not far off, but the results are not the same. If I use a quantile from a normal distribution instead of a Student's t, I get even closer.

Thank you.

Your question is not clear. Are you experiencing a confusion between a confidence interval and a prediction interval? — Glen_b, Nov 11 '13 at 01:06
Sorry if I'm not clear. I've tried to numerically reproduce the prediction interval provided by R, but I can't match its results. I used the formula from the discussion linked in my post, with the unbiased estimate of σ² and a threshold from a Student's t distribution. I don't know whether the problem is with the formula or with some numerical approximation. — Alex, Nov 12 '13 at 10:26
It will be a lot easier to figure out what might be wrong if you explain exactly what you did for both parts of the comparison. — Glen_b, Nov 12 '13 at 10:40

Stat · Answer 1 · 2014-04-03T03:46:00.380

You didn't construct your new object correctly. If you compute the interval manually and the model is fitted with an intercept, you need to account for the intercept term by adding a leading "1" to the vector for your new observation, and the design matrix must include the intercept column as well. See how I created the vector a in the following code. If you use the predict function to get the interval directly, the newdata argument does not need a "1" for the intercept; R takes care of that. See how I used a.dat below and found identical results:

> #Model estimation
> lm.1<- lm(Ozone ~., data = airquality_clean)

> #Design Matrix
> x=model.matrix(lm.1)
> 
> #Defining linear function
> a=c(1,200,10,70,1,3)
> 
> #Defining new data.frame
> a.dat=data.frame(Solar.R=a[2],Wind=a[3],Temp=a[4],Month=a[5],Day=a[6])
> 
> #Finding the upper prediction limit
> predict(lm.1,newdata=a.dat)+qt(.995,summary(lm.1)$df[2])*summary(lm.1)$sigma*sqrt(1+t(a)%*%solve(t(x)%*%x)%*%a)
         [,1]
[1,] 103.2434
> 
> #Finding the lower prediction limit
> predict(lm.1,newdata=a.dat)-qt(.995,summary(lm.1)$df[2])*summary(lm.1)$sigma*sqrt(1+t(a)%*%solve(t(x)%*%x)%*%a)
          [,1]
[1,] -16.76174
> 
> 
> #Using predict function
> predict(lm.1, a.dat, interval="predict", level=0.99)
       fit       lwr      upr
1 43.24083 -16.76174 103.2434
>
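
As a cross-check, the same manual calculation can be sketched with the question's own values (Month = 3, Day = 1; the answer's code uses Month = 1, Day = 3, which is why its fitted value differs from the one in the question). The interval has the form fit ± qt(0.995, n - p) * s * sqrt(1 + x0' (X'X)^-1 x0), where x0 carries a leading 1 for the intercept and p counts the intercept. The names fit, new_obs, x0, manual and builtin below are illustrative and not from the original post. Building x0 with model.matrix() is an assumed convenience, not the poster's method: it adds the intercept and puts the columns in the same order as the fitted design matrix. Column order matters here because the question's data frame new lists Day before Month, the reverse of the data's column order, so using it positionally pairs the values with the wrong columns.

```r
# Sketch: reproduce predict.lm()'s 99% prediction interval by hand.
airquality_clean <- na.omit(airquality)
fit <- lm(Ozone ~ ., data = airquality_clean)

# New observation with the values from the question.
new_obs <- data.frame(Solar.R = 200, Wind = 10, Temp = 70, Month = 3, Day = 1)

# Design matrix of the fit and the matching row for the new observation.
# model.matrix() adds the intercept column and orders columns like the fit.
X  <- model.matrix(fit)
x0 <- model.matrix(delete.response(terms(fit)), data = new_obs)

s     <- summary(fit)$sigma   # residual standard error
df    <- df.residual(fit)     # n - p, with p including the intercept
tcrit <- qt(0.995, df)        # two-sided 99% critical value

# Standard error of a new observation: s * sqrt(1 + x0 (X'X)^-1 x0')
se_new <- s * sqrt(1 + drop(x0 %*% solve(crossprod(X)) %*% t(x0)))

y_hat  <- drop(x0 %*% coef(fit))
manual <- c(fit = y_hat,
            lwr = y_hat - tcrit * se_new,
            upr = y_hat + tcrit * se_new)

# Compare with the built-in interval.
builtin <- predict(fit, new_obs, interval = "prediction", level = 0.99)
all.equal(unname(manual), unname(builtin[1, ]))
```

If the hand calculation is right, all.equal() should return TRUE, and builtin should agree with the predict() output shown in the question.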
