I want to calculate confidence intervals from the estimated coefficients and their respective standard errors.

I have a linear regression model which can be summarized (in R):

summary(fit1)

Call:
lm(formula = bwt ~ height + weight + parity, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-66.913 -10.624   0.991  10.979  55.621 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 31.19217   13.56879   2.299   0.0217 *  
height       1.24964    0.23083   5.414 7.48e-08 ***
weight       0.06781    0.02823   2.402   0.0164 *  
parity1     -1.83309    1.19838  -1.530   0.1264    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.9 on 1170 degrees of freedom
Multiple R-squared:  0.04898,   Adjusted R-squared:  0.04654 
F-statistic: 20.08 on 3 and 1170 DF,  p-value: 1.071e-12

With this model I can calculate the respective confidence intervals:

> confint(fit1)
                  2.5 %     97.5 %
(Intercept)  4.57029503 57.8140351
height       0.79676227  1.7025207
weight       0.01243198  0.1231932
parity1     -4.18429933  0.5181151

I would expect the 95% interval for the coefficient of the predictor height to be given by

$$ 1.24964 \pm (1.96 \times 0.23083) = [0.7972132,\ 1.702067] $$

where 1.24964 is the estimated value for the coefficient and 0.23083 is the standard error for this coefficient. The numbers are close but not quite the same.

What am I doing wrong?

JC1
  • You are using 1.96, which is the 97.5% quantile of the standard normal distribution. But recall that since we don't know the true variance of our error terms and have to estimate it, we must use a Student's t distribution. The two are almost (but not quite) the same; the difference vanishes as the sample size grows. – Andreas Feb 06 '16 at 01:33
  • Right. Check this out: `fit = lm(mpg ~ wt, mtcars)`; `coef = summary(fit)$coefficients[2,1]`; `err = summary(fit)$coefficients[2,2]`; `coef + c(-1,1) * err * qt(0.975, 30)`; `confint(fit, 'wt', level=0.95)` – Antoni Parellada Feb 06 '16 at 01:43
  • Thanks Andreas. You are right. Using `qt(p=0.975, df=1170)` instead of 1.96 works. – JC1 Feb 06 '16 at 01:45
  • Pretty much a duplicate, for example of [this question](http://stats.stackexchange.com/questions/29981/should-confidence-intervals-for-linear-regression-coefficients-be-based-on-the-n). Numerous other relevant answers can be found. You might find it helpful to read about [pivotal quantities](https://en.wikipedia.org/wiki/Pivotal_quantity). A number of posts here discuss obtaining confidence intervals from pivotal quantities. – Glen_b Feb 06 '16 at 02:28
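
To make the point from the comments concrete, here is a minimal R sketch that rebuilds the interval for `height` from the numbers printed in `summary(fit1)`, once with the normal quantile and once with the Student's t quantile on 1170 residual degrees of freedom. The variable names (`est`, `se`, `df`, `z_crit`, `t_crit`, `ci_normal`, `ci_t`) are made up for illustration, and the inputs are the rounded values from the printed output, so the result should agree with `confint(fit1)` only to about four or five decimal places.

```r
# Values copied from the summary(fit1) output in the question (rounded as printed).
est <- 1.24964   # estimated coefficient for height
se  <- 0.23083   # standard error of that coefficient
df  <- 1170      # residual degrees of freedom

# Critical values for a two-sided 95% interval.
z_crit <- qnorm(0.975)        # standard normal quantile, about 1.96
t_crit <- qt(0.975, df = df)  # Student's t quantile, slightly larger than z_crit

# Interval = estimate -/+ critical value * standard error.
ci_normal <- est + c(-1, 1) * z_crit * se
ci_t      <- est + c(-1, 1) * t_crit * se

# Print both intervals side by side for comparison with confint(fit1).
print(rbind(normal = ci_normal, t = ci_t))
```

The `t` row is the one to compare against the `height` row of `confint(fit1)`; the `normal` row reproduces the hand calculation in the question.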
