You are mostly fitting machine imprecision with this model. Here are the data, based on the provided code:
cbind(y,units)
# successes failures units
# [1,] 0 818 1
# [2,] 7 771 2
# [3,] 0 452 3
# [4,] 0 407 4
# [5,] 0 425 5
# [6,] 0 278 6
# [7,] 0 581 7
# [8,] 0 186 8
# [9,] 0 989 9
#[10,] 0 250 10
#[11,] 0 818 1
#[12,] 0 634 2
#[13,] 0 914 3
#[14,] 9 883 4
#[15,] 9 985 5
#[16,] 0 906 6
#[17,] 0 890 7
#[18,] 0 277 8
#[19,] 0 401 9
#[20,] 0 799 10
There are no successes associated with the reference level units = 1, whose log odds provide the intercept. That intercept, from summary(m), is:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -26.4016 8109.4765 -0.003 0.997
for odds (and, effectively, success probability) of
exp(-26.4016)
# [1] 3.41925e-12
although you know that the actual success probability for that factor level is 0. The other coefficients represent differences from the log odds of the intercept, so it's not surprising that they have large standard errors, even for the 3 levels of units that actually have successes.
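To see just how degenerate that estimate is, compare the probability implied by the intercept against the raw data. A quick sketch, assuming the y matrix and units vector printed above:

plogis(-26.4016)                  # success probability implied by the intercept
# [1] 3.41925e-12                 # at this magnitude, probability and odds coincide
sum(y[units == 1, "successes"])   # observed successes at the reference level
# [1] 0

The model is trying to push the fitted probability for units = 1 to exactly 0, which corresponds to a log odds of $-\infty$; the -26.4016 is simply where the iterations stopped.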
Things make a bit more sense if you choose a factor level with some successes as reference:
m2 <- glm(y ~ relevel(factor(units), ref = "2"), data = X,
          family = binomial())
and look at the first few rows of coefficients from summary(m2):
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.3019 0.3789 -13.993 <2e-16 ***
relevel(factor(units), ref = "2")1 -21.0997 8109.4765 -0.003 0.998
relevel(factor(units), ref = "2")3 -20.8594 7870.1298 -0.003 0.998
relevel(factor(units), ref = "2")4 0.3367 0.5054 0.666 0.505
relevel(factor(units), ref = "2")5 0.2478 0.5054 0.490 0.624
relevel(factor(units), ref = "2")6 -20.6136 7475.7834 -0.003 0.998
Here you have reasonable values and acceptable standard errors for the intercept and for the coefficients of the levels of units that actually have successes.
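As a sanity check (again assuming y and units as printed above), the m2 intercept reproduces the raw log odds for units = 2, which has 7 successes in 7 + 771 + 634 = 1412 trials:

s2 <- sum(y[units == 2, "successes"])   # 7 successes
n2 <- sum(y[units == 2, ])              # 1412 trials
log(s2 / (n2 - s2))                     # ~ -5.3019, the intercept of m2
plogis(coef(m2)[1])                     # back-transformed: ~ 7/1412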
The large standard errors of the coefficient estimates come, in effect, from dividing by something very close to 0, a consequence of the near-perfect separation here. Following Agresti's description in Categorical Data Analysis, 2nd Edition, Chapter 5: with $N$ sets of observations having distinct combinations of $p$ predictors, let $X$ be the $N\times p$ matrix of predictor values, $n_i$ the number of cases for the $i^{th}$ set of distinct predictor values, $\hat\pi_i$ the estimated success probability for predictor-value set $i$, and $\hat\beta$ the vector of coefficient estimates.
Quoting from Agresti, page 194:
The estimated covariance matrix ... has form
$$\widehat {\text{cov}}(\hat \beta) = \left[X'\text{diag}[n_i\hat \pi_i(1-\hat\pi_i)]X\right]^{-1}$$
where $\text{diag}[n_i\hat\pi_i(1-\hat\pi_i)]$ denotes the $N\times N$ diagonal matrix having $[n_i \hat \pi_i(1-\hat \pi_i)]$ on
the main diagonal.
The individual coefficient standard errors are the square roots of the diagonal elements of that covariance matrix. If there are no successes (or no failures) for a particular predictor-value set $i$, then $\hat\pi_i(1-\hat\pi_i)$, and thus the corresponding element of $\text{diag}[n_i\hat\pi_i(1-\hat\pi_i)]$, will be 0 (or very close to it, as in the OP). The information matrix $X'\text{diag}[n_i\hat \pi_i(1-\hat\pi_i)]X$ is then ill-conditioned, and its inverse, the estimated coefficient covariance matrix, has correspondingly enormous diagonal entries: the huge standard errors seen above.
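You can reproduce this directly in R. A minimal sketch, assuming the fitted m2 from above (with y the two-column successes/failures matrix): build Agresti's information matrix by hand, invert it, and compare with what glm reports.

Xm     <- model.matrix(m2)                   # design matrix of predictor values
pi_hat <- fitted(m2)                         # estimated success probabilities
n_i    <- weights(m2)                        # cases per row; equals rowSums(y) here
W      <- diag(n_i * pi_hat * (1 - pi_hat))  # near-0 entries for zero-success rows
info   <- t(Xm) %*% W %*% Xm                 # X' diag[n_i pi_i (1 - pi_i)] X
kappa(info)                                  # enormous condition number
se_hand <- sqrt(diag(solve(info)))           # standard errors from the formula
cbind(se_hand, se_glm = sqrt(diag(vcov(m2))))  # should agree, huge SEs included

The near-zero diagonal weights for the zero-success levels are what make info nearly singular, and inverting it is the "division by something close to 0" that inflates those standard errors.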