
This is my first post, so please bear with me. I do not have a statistics background and I am still learning my way around, so your answers will be very helpful. I am using GEE in R to compare the two levels of a categorical predictor (Gp) on a binary response (Resp). The data come from repeated measurements on a set of people (ID). I am interested in obtaining the odds ratio comparing the two levels of the group. This is for my work, so I have created a dataset that reproduces the error below:

library(gee)
set.seed(50)

# Creating a dataset
df <- data.frame("Resp" = rep(0,40), "Gp" = rep(c("a","b"),20))
df <- df[order(df$Gp),]

df[1:10,"ID"] <- "P1"
df[11:20,"ID"] <- "P2"
df[21:30,"ID"] <- "P3"
df[31:40,"ID"] <- "P4"

df[c("Gp","ID")] <- lapply(df[c("Gp","ID")],as.factor)

# Creating first dataframe with all responses from group a as 0

df1 <- df  # plain assignment copies in base R; copy() would need data.table
df1[25:35,1] <- 1

table(df1$Gp,df1$Resp)
     0  1
  a 20  0
  b  9 11

All responses from Group a are 0. Now if I try to run the GEE, it ends with an error:

a <- gee(Resp ~ Gp, id= ID , data= df1, family=binomial(link=logit),na.action=na.omit,corstr = "exchangeable")

Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept)         Gpb 
  -19.56607    19.76674 
Error in gee(Resp ~ Gp, id = ID, data = df1, family = binomial(link = 
logit),  : Cgee: error: logistic model for probability has fitted value very 
close to 1. estimates diverging; iteration terminated.

However, if I edit the same dataset so that some of the responses of level a are 1, the model runs without any issues:

df2 <- df  # plain assignment copies in base R; copy() would need data.table
df2[17:32,1] <- 1

table(df2$Gp,df2$Resp)

     0  1
  a 16  4
  b  8 12

b <- gee(Resp ~ Gp, id= ID , data= df2, 
family=binomial(link=logit),na.action=na.omit,corstr = "exchangeable")

Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept)      Gpb 
-1.386294   1.791759

Am I right in assuming that this error occurs because all responses are 0 for group a? Is it impossible to compare the levels of a group in this scenario, given that we cannot use a chi-squared or Fisher's exact test either?

Aovial

1 Answer

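Yes, this is indeed the separation problem in logistic regression. Because every response in group a is 0, the maximum likelihood estimate of the log-odds for that group is minus infinity, so the estimates diverge, which is what the error message reports. You may find more information in this link.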
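
To make the separation concrete, here is a minimal base R sketch. It assumes the same layout as df1 in the question (20 observations per group, no 1s in group a) and deliberately ignores the clustering by ID, so it only illustrates the symptom and is not a substitute for the GEE.

```r
# Rebuild the layout of df1 from the question:
# group a is all 0; group b has 9 zeros and 11 ones
df1 <- data.frame(
  Resp = c(rep(0, 20), rep(0, 4), rep(1, 11), rep(0, 5)),
  Gp   = factor(rep(c("a", "b"), each = 20))
)

print(table(df1$Gp, df1$Resp))

# Empirical log-odds of Resp = 1 within each group
p_hat    <- tapply(df1$Resp, df1$Gp, mean)
log_odds <- log(p_hat / (1 - p_hat))
print(log_odds)  # group a has odds 0/20, so its log-odds is -Inf

# An ordinary logistic regression cannot reach -Inf; it stops at a large
# negative intercept, like the initial glm estimate printed by gee()
fit <- suppressWarnings(glm(Resp ~ Gp, family = binomial, data = df1))
print(coef(fit))
```

The intercept is the log-odds for group a, so any finite value reported for it is just the point where the iterations stopped, not a meaningful estimate.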

If your motivation for using a GEE is to get coefficients with a marginal interpretation, and you also want to account for the separation problem, then you could use the GLMMadaptive package. For the marginal coefficients, have a look here. For the separation problem, GLMMadaptive implements a penalized likelihood approach; an example can be found here.
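
As a rough sketch of what this could look like on the question's df1 (not run against the real data, and assuming GLMMadaptive is installed), the code below fits a random-intercept logistic model with `penalized = TRUE` and then asks for marginal coefficients via `marginal_coefs()`. The random-intercept structure `~ 1 | ID` is my assumption about a sensible model here; with only four subjects, any estimate will be very imprecise.

```r
# Assumes the GLMMadaptive package is available:
# install.packages("GLMMadaptive")
if (requireNamespace("GLMMadaptive", quietly = TRUE)) {
  # Same layout as df1 in the question, including the subject IDs
  df1 <- data.frame(
    Resp = c(rep(0, 20), rep(0, 4), rep(1, 11), rep(0, 5)),
    Gp   = factor(rep(c("a", "b"), each = 20)),
    ID   = factor(rep(c("P1", "P2", "P3", "P4"), each = 10))
  )

  # Mixed-effects logistic regression with a random intercept per subject.
  # penalized = TRUE adds the package's default penalty on the fixed
  # effects, which keeps the Gp coefficient finite under separation.
  fm <- GLMMadaptive::mixed_model(
    fixed = Resp ~ Gp, random = ~ 1 | ID,
    data = df1, family = binomial(), penalized = TRUE
  )

  # Coefficients with a marginal (population-averaged) interpretation
  mc <- GLMMadaptive::marginal_coefs(fm)
  print(mc)

  # Marginal odds ratio for group b versus group a
  print(exp(mc$betas[["Gpb"]]))
}
```

The resulting odds ratio depends on the penalty, so it is worth reporting the penalty settings alongside it.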

Dimitris Rizopoulos
    (+1) I can confirm that GLMMadaptive can solve this and several other problems commonly encountered with clustered/hierarchical/longitudinal data. – Robert Long Dec 12 '19 at 18:46