1

I have a data set that has 8 treatments, along with success data. I used a binomial glm to analyze the data, but generated some unexpected coefficient values for a few of the treatments and not sure what to do about it.

Here is the data and summary:

treatment = as.factor(c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", 
"D", "D", "D", "D", "E", "E", "E", "E", "F", "F", "F", "F", "G", 
"G", "G", "G", "H", "H", "H", "H"))
rep = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
success = c(1, 1, 1, 2, 14, 17, 15, 18, 0, 0, 0, 0, 18, 18, 17, 18, 4, 
4, 2, 4, 2, 4, 1, 1, 1, 0, 0, 1, 8, 6, 6, 2)
total = c(20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 
20)

data = data.frame(treatment,rep,success,total)
data$perc = data$success/data$total

library(tidyverse)
data %>% group_by(treatment) %>% summarize(mean = mean(perc))

We can see that the mean for treatments B and D are .8 and .88 respectively.

Now perform a glm:

model.glm = glm(cbind(success,total) ~ treatment-1,data = data,family="binomial")

logit2prob <- function(logit){
  odds <- exp(logit)
  prob <- odds / (1 + odds)
  return(prob)
}

SuccessProb = logit2prob(coef(model.glm))
SuccessProb = round(logit2prob(coef(model.glm)),2)
SuccessProb

We can see that the estimates, using a glm, for B and D are .44 and .47, respectively. These are not close to the summary estimates.

If we use an anova, results are better.

model.aov = aov(perc ~ treatment-1,data=data)
SuccessProb.aov = coef(model.aov)
SuccessProb.aov

Here, the estimates for B and D are .8 and .89. Much better than the glm.

Does anyone know what I am doing wrong here?

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
user44796
  • 429
  • 1
  • 3
  • 10

1 Answers1

1

You may be confusing how R's glm() works with how SAS's PROC LOGISTIC works. In R, when you have binomial data with $>1$ Bernoulli trial, you need to use the odds (events to non-events), not the probability (events to total). It is also possible to use the events/total with the total as a weights argument. Consider:

agg.mean  = aggregate(perc                   ~treatment,   data, mean)
model.glm = glm(cbind(success,total)         ~treatment-1, data, family="binomial")
mod.glm.o = glm(cbind(success,total-success) ~treatment-1, data, family="binomial")
mod.glm.w = glm(      success/total          ~treatment-1, data, family="binomial",
                                                           weights=data$total)
output     = data.frame(agg.mean, 
                        round(logit2prob(coef(mod.glm.o)),4),
                        round(logit2prob(coef(mod.glm.w)),4),
                        round(logit2prob(coef(model.glm)),4) )
names(output)[3:5] = c("odds type", "weighted % type", "prob type")
output
#            treatment   perc odds type weighted % type prob type
# treatmentA         A 0.0625    0.0625          0.0625    0.0588
# treatmentB         B 0.8000    0.8000          0.8000    0.4444
# treatmentC         C 0.0000    0.0000          0.0000    0.0000
# treatmentD         D 0.8875    0.8875          0.8875    0.4702
# treatmentE         E 0.1750    0.1750          0.1750    0.1489
# treatmentF         F 0.1000    0.1000          0.1000    0.0909
# treatmentG         G 0.0250    0.0250          0.0250    0.0244
# treatmentH         H 0.2750    0.2750          0.2750    0.2157

It may help you to read my answer here: Difference in output between SAS's proc genmod and R's glm.

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650