You are mostly fitting machine imprecision with this model. Here are the data, based on the provided code:
cbind(y,units)
# successes failures units
# [1,] 0 818 1
# [2,] 7 771 2
# [3,] 0 452 3
# [4,] 0 407 4
# [5,] 0 425 5
# [6,] 0 278 6
# [7,] 0 581 7
# [8,] 0 186 8
# [9,] 0 989 9
#[10,] 0 250 10
#[11,] 0 818 1
#[12,] 0 634 2
#[13,] 0 914 3
#[14,] 9 883 4
#[15,] 9 985 5
#[16,] 0 906 6
#[17,] 0 890 7
#[18,] 0 277 8
#[19,] 0 401 9
#[20,] 0 799 10
There are no successes associated with the reference level units = 1, whose log odds provide the intercept. That intercept, from summary(m), is:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -26.4016 8109.4765 -0.003 0.997
for odds (and, effectively, success probability) of
exp(-26.4016)
# [1] 3.41925e-12
although you know that the actual success probability for that factor level is 0. The other coefficients represent differences from the log odds of the intercept, so it's not surprising that they have large standard errors, even for the 3 levels of units that actually have successes.
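To see just how degenerate that estimate is, compare the probability implied by the intercept against the raw data. A quick sketch, assuming the y matrix and units vector printed above:

plogis(-26.4016)                  # success probability implied by the intercept
# [1] 3.41925e-12                 # at this magnitude, probability and odds coincide
sum(y[units == 1, "successes"])   # observed successes at the reference level
# [1] 0

The model is trying to push the fitted probability for units = 1 to exactly 0, which corresponds to a log odds of $-\infty$; the -26.4016 is simply where the iterations stopped.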
Things make a bit more sense if you choose a factor level with some successes as reference:
m2 <- glm(y ~ relevel(factor(units), ref = "2"), data = X,
          family = binomial())
and look at the first few rows of coefficients from summary(m2):
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.3019 0.3789 -13.993 <2e-16 ***
relevel(factor(units), ref = "2")1 -21.0997 8109.4765 -0.003 0.998
relevel(factor(units), ref = "2")3 -20.8594 7870.1298 -0.003 0.998
relevel(factor(units), ref = "2")4 0.3367 0.5054 0.666 0.505
relevel(factor(units), ref = "2")5 0.2478 0.5054 0.490 0.624
relevel(factor(units), ref = "2")6 -20.6136 7475.7834 -0.003 0.998
Here you have reasonable values and acceptable standard errors for the intercept and for the coefficients of the levels of units that actually have successes.
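As a sanity check (again assuming y and units as printed above), the m2 intercept reproduces the raw log odds for units = 2, which has 7 successes in 7 + 771 + 634 = 1412 trials:

s2 <- sum(y[units == 2, "successes"])   # 7 successes
n2 <- sum(y[units == 2, ])              # 1412 trials
log(s2 / (n2 - s2))                     # ~ -5.3019, the intercept of m2
plogis(coef(m2)[1])                     # back-transformed: ~ 7/1412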
The large standard errors of the coefficient estimates come, in effect, from dividing by something very close to 0, a consequence of the near-perfect separation here. Following Agresti's description in Categorical Data Analysis, 2nd Edition, Chapter 5: with $N$ sets of observations having distinct combinations of $p$ predictors, let $X$ be the $N\times p$ matrix of predictor values, $n_i$ the number of cases for the $i^{th}$ set of distinct predictor values, $\hat\pi_i$ the estimated success probability for predictor-value set $i$, and $\hat\beta$ the vector of coefficient estimates.
Quoting from Agresti, page 194:
The estimated covariance matrix ... has form
$$\widehat {\text{cov}}(\hat \beta) = \left[X'\text{diag}[n_i\hat \pi_i(1-\hat\pi_i)]X\right]^{-1}$$
where $\text{diag}[n_i\hat\pi_i(1-\hat\pi_i)]$ denotes the $N\times N$ diagonal matrix having $[n_i \hat \pi_i(1-\hat \pi_i)]$ on
the main diagonal.
The individual coefficient standard errors are the square roots of the diagonal elements of that covariance matrix. If there are no successes (or no failures) for a particular predictor-value set $i$, then $\hat\pi_i(1-\hat\pi_i)$, and thus the corresponding element of $\text{diag}[n_i\hat\pi_i(1-\hat\pi_i)]$, will be 0 (or very close to it, as in the OP). The information matrix $X'\text{diag}[n_i\hat \pi_i(1-\hat\pi_i)]X$ is then ill-conditioned, and its inverse, the estimated coefficient covariance matrix, has correspondingly enormous diagonal entries: the huge standard errors seen above.
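You can reproduce this directly in R. A minimal sketch, assuming the fitted m2 from above (with y the two-column successes/failures matrix): build Agresti's information matrix by hand, invert it, and compare with what glm reports.

Xm     <- model.matrix(m2)                   # design matrix of predictor values
pi_hat <- fitted(m2)                         # estimated success probabilities
n_i    <- weights(m2)                        # cases per row; equals rowSums(y) here
W      <- diag(n_i * pi_hat * (1 - pi_hat))  # near-0 entries for zero-success rows
info   <- t(Xm) %*% W %*% Xm                 # X' diag[n_i pi_i (1 - pi_i)] X
kappa(info)                                  # enormous condition number
se_hand <- sqrt(diag(solve(info)))           # standard errors from the formula
cbind(se_hand, se_glm = sqrt(diag(vcov(m2))))  # should agree, huge SEs included

The near-zero diagonal weights for the zero-success levels are what make info nearly singular, and inverting it is the "division by something close to 0" that inflates those standard errors.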