A categorical variable in glm shows significance from analysis of deviance, but each level is not significant in z-test

Question

I am fitting a generalized linear model (glm). The explanatory variable is categorical with three levels (control, treat1, treat2). The response variable is 0 or 1. The response rate for each treatment level is plotted in the figure below (from left to right: control, treat1, treat2):

[Figure: response rate by treatment level]

There seems to be a big treatment effect between treat1 vs. control and treat2 vs. control. I applied glm:

fit <- glm(response ~ treatment, family = binomial, data = dat)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)        
(Intercept)   -21.57    6536.57  -0.003    0.997
treat1        23.76    6536.57   0.004    0.997
treat2        43.13    9364.95   0.005    0.996

The z-test shows that neither treat1 nor treat2 is significant compared to the reference level control.

However, the analysis of deviance confirmed that the treatment factor as a whole is highly significant:

drop1(fit, test = "Chisq")

response ~ treatment
            Df   Deviance    AIC    LRT  Pr(>Chi)    
 <none>          13.003    19.003                     
 treatment   2   77.936    79.936 64.932 7.946e-15 ***

How shall I interpret such a strange result? Why does the individual z-test not give me any significant result, while according to the plot there is obviously an effect between treat1 and control, and between treat2 and control?

Maximum-likelihood estimates cannot be calculated in case of quasi-complete separation (only 0 in control, almost only 1 in test groups). This is confirmed by the huge standard errors. — Michael M, Oct 15 '13 at 10:35
Thanks @Michael Mayer. Indeed, my data have a quasi-complete separation problem. The estimated coefficients and their standard errors are far too large. Do you have any suggestions for how to deal with it? Should I present the result descriptively? — tiantianchen, Oct 15 '13 at 11:05
See [here](http://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression/68917) for a discussion of ways to deal with separation. — Scortchi - Reinstate Monica, Oct 23 '13 at 09:44

score 3 · Accepted Answer · answered Oct 15 '13 at 14:38

You cannot use Wald's z-test when the maximum likelihood estimates are infinite (they only look finite in the model fit because the fitting algorithm stops after a limited number of iterations). However, you can still use a likelihood-ratio test, as you saw with the analysis of deviance. You just have to set up model comparisons that test the hypotheses you are interested in.
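To see the two tests disagree, here is a minimal sketch on made-up data (these are not the poster's data; the group sizes and counts are assumptions chosen only so that control is all 0 and treat2 is all 1). The Wald standard errors blow up while the likelihood-ratio test from `drop1` stays well behaved.

```r
# Hypothetical data: control is all 0, treat1 is mostly 1, treat2 is all 1.
dat <- data.frame(
  treatment = factor(rep(c("control", "treat1", "treat2"), each = 20),
                     levels = c("control", "treat1", "treat2")),
  response  = c(rep(0, 20), rep(c(0, 1), times = c(2, 18)), rep(1, 20))
)

# The cross-tabulation exposes the separation: empty cells for control and treat2.
print(table(dat$treatment, dat$response))

# glm() warns about fitted probabilities of 0 or 1; the warning is suppressed
# here only to keep the output short.
fit <- suppressWarnings(glm(response ~ treatment, family = binomial, data = dat))

# Wald table: very large standard errors, so every z value is close to 0.
wald <- summary(fit)$coefficients
print(wald)

# Likelihood-ratio test for the treatment factor as a whole.
lrt <- drop1(fit, test = "Chisq")
print(lrt)
```

The likelihood-ratio test only needs the maximized likelihoods of the two models, and those remain finite even when the coefficient estimates diverge.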

For example, to test $\mu_c = \mu_{treat1}$, you fit a model under this assumption, and compare it to the full model.

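# Reduced model: control and treat1 share one mean; only treat2 differs
fit2 <- glm(response ~ I(treatment == "treat2"), family = binomial, data = dat)
# Likelihood-ratio test of the reduced model against the full model `fit`
anova(fit2, fit, test = "Chisq")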
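The same idea extends to the other pairwise hypotheses. The sketch below, again on made-up data (the counts are assumptions, not the poster's data), fits one reduced model per pair by keeping one level separate and forcing the other two to share a mean, then compares each with the full model. The names `full`, `keep`, and `lrt_p` are illustrative only.

```r
# Hypothetical data with the same separation pattern as in the question.
dat <- data.frame(
  treatment = factor(rep(c("control", "treat1", "treat2"), each = 20),
                     levels = c("control", "treat1", "treat2")),
  response  = c(rep(0, 20), rep(c(0, 1), times = c(2, 18)), rep(1, 20))
)

# Full model with a separate mean for each treatment level.
full <- suppressWarnings(glm(response ~ treatment, family = binomial, data = dat))

# For each level kept separate, the remaining two levels are pooled,
# which encodes the null hypothesis that those two levels are equal.
lrt_p <- sapply(levels(dat$treatment), function(keep) {
  reduced <- suppressWarnings(
    glm(response ~ I(treatment == keep), family = binomial, data = dat)
  )
  # Row 2 of the anova table holds the test of reduced vs. full (1 df).
  anova(reduced, full, test = "Chisq")[["Pr(>Chi)"]][2]
})

# Label each p-value by the null hypothesis it tests.
names(lrt_p) <- c("treat1 = treat2", "control = treat2", "control = treat1")
print(lrt_p)
```

Each comparison has one degree of freedom, because the reduced model has one fewer parameter than the full model.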
