After fitting a logistic regression model in R using model <- glm(y~x,family='binomial'), I can obtain confidence intervals for the fitted coefficients using confint(model), but I want to know how to compute these values manually. In the case of a linear model, lin_mod <- lm(y~x), I can just do the following to obtain an (approximate) 95% confidence interval for the slope coefficient:
CI_lower <- coefficients(lin_mod)[2] - 1.96*summary(lin_mod)$coefficients[2,2]
CI_upper <- coefficients(lin_mod)[2] + 1.96*summary(lin_mod)$coefficients[2,2]
where coefficients(lin_mod)[2] is the estimated value of the coefficient and summary(lin_mod)$coefficients[2,2] is the corresponding standard error.
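To make the linear-model baseline concrete, here is a minimal sketch on simulated data (the seed, sample size, and true coefficients are my own illustrative assumptions, not part of the problem). As far as I can tell, confint() on an lm object uses a t quantile with the residual degrees of freedom rather than 1.96, so the 1.96 rule is only a close approximation at n = 100:

```r
# Hypothetical simulated data; seed and parameters are illustrative only.
set.seed(1)
x <- rnorm(n = 100, mean = 5, sd = 2)
y <- 1 + 0.5 * x + rnorm(100)
lin_mod <- lm(y ~ x)

est <- coefficients(lin_mod)[2]
se  <- summary(lin_mod)$coefficients[2, 2]

# Normal-quantile version (the "1.96" rule).
z_crit <- qnorm(0.975)
ci_z <- c(est - z_crit * se, est + z_crit * se)

# t-quantile version, using the residual degrees of freedom.
t_crit <- qt(0.975, df = df.residual(lin_mod))
ci_t <- c(est - t_crit * se, est + t_crit * se)

# Compare both manual intervals with what confint() reports for the slope.
ci_ref <- confint(lin_mod)["x", ]
print(rbind(z = unname(ci_z), t = unname(ci_t), confint = unname(ci_ref)))
```

In this sketch the t-based interval should match confint(lin_mod) to numerical precision, while the 1.96-based interval is slightly narrower.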
However, when I use this same process to compute the confidence interval for the fitted coefficients of a logistic regression, the values don't agree with the output from confint. Below is an example using some randomly generated data:
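# Simulate a predictor and a binary response whose success probability
# follows a logistic curve centered at x = 5
x <- rnorm(n=100, mean=5, sd=2)
y_prob <- plogis(x, location=5, scale=1)
y <- sapply(y_prob, function(p) rbinom(1, 1, p))
# Fit the logistic regression
model <- glm(y~x, family='binomial')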
summary(model)$coefficients
#               Estimate Std. Error   z value     Pr(>|z|)
# (Intercept) -3.8998231  0.8838826 -4.412150 1.023490e-05
# x            0.7963213  0.1746632  4.559183 5.135303e-06
CI_lower <- coefficients(model)[2] - 1.96*summary(model)$coefficients[2,2] # = 0.4539815
CI_upper <- coefficients(model)[2] + 1.96*summary(model)$coefficients[2,2] # = 1.138661
confint(model)
#                  2.5 %    97.5 %
# (Intercept) -5.8044657 -2.313925
# x            0.4843258  1.173998
As you can see, manually computing the 95% CI around the x-coefficient yielded (0.4539815, 1.138661), whereas computing it using confint yielded (0.4843258, 1.173998). So my question is: how is confint computing this confidence interval, and why does my estimate differ? From some additional tests on larger samples I can see that the two estimates converge in the large-N limit, but I'm interested in what's going on for small N, and in particular why the CI produced by confint is not symmetric about the coefficient estimate.
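To quantify the asymmetry and the large-N behaviour, here is a rough sketch of the kind of test I ran. The helper compare_ci, the seed, and the two sample sizes are hypothetical choices for illustration. It uses confint.default(), which to my understanding builds the interval as estimate ± qnorm(0.975) × SE, as a stand-in for the manual calculation, and reports how far each interval endpoint lies from the estimate:

```r
# Hypothetical helper: compares the manual-style interval with confint()
# for the slope at a given sample size. Seed and sizes are assumptions.
compare_ci <- function(n, seed = 42) {
  set.seed(seed)
  x <- rnorm(n = n, mean = 5, sd = 2)
  y <- rbinom(n, size = 1, prob = plogis(x, location = 5, scale = 1))
  model <- glm(y ~ x, family = "binomial")

  est <- unname(coefficients(model)[2])
  # Interval from estimate +/- qnorm(0.975) * SE
  wald <- unname(confint.default(model)["x", ])
  # Interval as reported by confint() (progress message suppressed)
  ref <- unname(suppressMessages(confint(model))["x", ])

  # Distance from the estimate to each endpoint
  c(n = n,
    wald_lower_dist    = est - wald[1],
    wald_upper_dist    = wald[2] - est,
    confint_lower_dist = est - ref[1],
    confint_upper_dist = ref[2] - est)
}

res <- rbind(compare_ci(100), compare_ci(10000))
print(res)
```

The two wald_* columns are equal by construction, so any difference between confint_lower_dist and confint_upper_dist isolates the asymmetry I am asking about; with the larger sample that difference should shrink.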