I have a data set with 8 treatments, along with success counts. I used a binomial glm to analyze the data, but it generated some unexpected coefficient values for a few of the treatments and I'm not sure what to do about it.
Here is the data and summary:
treatment = as.factor(c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C",
"D", "D", "D", "D", "E", "E", "E", "E", "F", "F", "F", "F", "G",
"G", "G", "G", "H", "H", "H", "H"))
rep = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4,
1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
success = c(1, 1, 1, 2, 14, 17, 15, 18, 0, 0, 0, 0, 18, 18, 17, 18, 4,
4, 2, 4, 2, 4, 1, 1, 1, 0, 0, 1, 8, 6, 6, 2)
total = c(20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
20)
data = data.frame(treatment,rep,success,total)
data$perc = data$success/data$total
library(tidyverse)
data %>% group_by(treatment) %>% summarize(mean = mean(perc))
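For reference, the grouped means work out to roughly the following (hand-computed from the counts above, so treat this as approximate rather than literal console output):
A 0.0625, B 0.80, C 0.00, D 0.8875, E 0.175, F 0.10, G 0.025, H 0.275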
We can see that the means for treatments B and D are .8 and .88, respectively.
Now fit a glm:
model.glm = glm(cbind(success, total) ~ treatment - 1, data = data, family = "binomial")
logit2prob <- function(logit){
  odds <- exp(logit)
  prob <- odds / (1 + odds)
  return(prob)
}
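As a side note, I believe logit2prob() is just the inverse logit (the logistic CDF), so the built-in plogis() should give the same numbers:
# equivalent to logit2prob(), assuming the inverse-logit equivalence holds
round(plogis(coef(model.glm)), 2)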
SuccessProb = round(logit2prob(coef(model.glm)), 2)
SuccessProb
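For reference, the converted probabilities come out to roughly the following (again hand-computed, not literal console output):
A 0.06, B 0.44, C 0.00, D 0.47, E 0.15, F 0.09, G 0.02, H 0.22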
We can see that the glm estimates for B and D are .44 and .47, respectively. These are nowhere near the summary estimates.
If we use an anova on the raw proportions instead, the results are better:
model.aov = aov(perc ~ treatment - 1, data = data)
SuccessProb.aov = coef(model.aov)
SuccessProb.aov
Here, the estimates for B and D are .8 and .89, which is much closer to the summary means than the glm.
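To convince myself that the aov coefficients really are just the per-treatment means of perc, here is a quick base-R comparison (a sketch; I'm assuming coef() returns the treatments in the same A-H order as tapply()):
# side by side: aov coefficients vs raw per-treatment means of perc
cbind(aov = coef(model.aov), raw = tapply(data$perc, data$treatment, mean))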
Does anyone know what I am doing wrong here?