
When you predict a fitted value from a logistic regression model, how are the standard errors computed? I mean the standard errors of the fitted values, not of the coefficients (which involve Fisher's information matrix).

I have only found out how to get the numbers with R (e.g., here on r-help, or here on Stack Overflow), but I cannot find the formula.

pred <- predict(y.glm, newdata = something, se.fit = TRUE)

If you could provide an online source (preferably on a university website), that would be fantastic.

user2457873

1 Answer


The prediction is just a linear combination of the estimated coefficients. The coefficients are asymptotically normal, so a linear combination of those coefficients is asymptotically normal as well. So if we can obtain the covariance matrix for the parameter estimates, we can easily obtain the standard error for a linear combination of those estimates. If I denote the covariance matrix by $\Sigma$ and write the coefficients of my linear combination in a vector $C$, then the standard error is just $\sqrt{C' \Sigma C}$.

# Making fake data and fitting the model and getting a prediction
set.seed(500)
dat <- data.frame(x = runif(20), y = rbinom(20, 1, .5))
o <- glm(y ~ x, data = dat)
pred <- predict(o, newdata = data.frame(x=1.5), se.fit = TRUE)

# To obtain a prediction for x=1.5 I'm really
# asking for yhat = b0 + 1.5*b1 so my
# C = c(1, 1.5)
# and vcov applied to the glm object gives me
# the covariance matrix for the estimates
C <- c(1, 1.5)
std.er <- sqrt(t(C) %*% vcov(o) %*% C)

> pred$se.fit
[1] 0.4246289
> std.er
          [,1]
[1,] 0.4246289

We see that the 'by hand' method shown here gives the same standard error as the one reported by predict.
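
For the logistic case the question asks about (a fit with family = binomial), the same calculation applies on the scale of the linear predictor, and an approximate interval for the fitted probability comes from transforming the endpoints with the inverse logit. A minimal sketch along those lines (the object names below are illustrative and not part of the original code):

# Same fake data, but now actually fit as a logistic regression
o.logit <- glm(y ~ x, data = dat, family = binomial)

# se.fit is reported on the link (log-odds) scale
pred.logit <- predict(o.logit, newdata = data.frame(x = 1.5),
                      type = "link", se.fit = TRUE)

# By hand, exactly as above: sqrt(C' Sigma C) with Sigma = vcov(o.logit)
C <- c(1, 1.5)
sqrt(t(C) %*% vcov(o.logit) %*% C)   # matches pred.logit$se.fit

# Approximate 95% confidence interval for the fitted probability:
# inverse logit (plogis) of fit +/- 1.96 * se.fit
plogis(pred.logit$fit + c(-1.96, 1.96) * pred.logit$se.fit)

# For several new rows at once, use the model matrix:
# the standard errors are sqrt(diag(X %*% vcov %*% t(X)))
newdat <- data.frame(x = c(0.2, 1.5))
X <- model.matrix(~ x, data = newdat)
sqrt(diag(X %*% vcov(o.logit) %*% t(X)))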

Dason
  • I have one related question. When we predict a value and a confidence interval from a linear regression (not logistic), we incorporate the error variance/standard error. But the logistic regression doesn't. Does this difference come from the fact that the logistic regression's observed values are either 0 or 1, so that there's no point in estimating an error variance? I feel like we should at least do something, but I may be missing something. – user2457873 Aug 10 '13 at 18:33
  • Old question, but this thread helped me just now, so here goes: The logit model observes 0 or 1, but it predicts a probability. When you get a standard error of a fitted value, it is on the scale of the linear predictor. You get a confidence interval on the probability by taking the inverse logit of fit ± 1.96*se.fit – generic_user Mar 07 '14 at 00:58
  • Just be aware that this uses the asymptotic normal approximation, which can be quite bad for the logistic model (search this site for the Hauck-Donner phenomenon). For the coefficients, that can be remedied by, for instance, likelihood profiling (used by the confint function in MASS). That is not possible for the linear predictors ... – kjetil b halvorsen Nov 04 '16 at 18:50
  • This is incorrect for what the OP asked for; the GLM you fit uses the identity link function, not the logit link function. You should have fit `o – Zhe Zhang Oct 01 '17 at 17:54