My understanding is that generalized linear modeling (GLM) is recommended for proportion data.
However, this seems to run into problems when one group's data are all zeros (or all ones). For example, consider a dataset with two classes (A and B) and six observations each: every observation in A has 0 alive and 10 dead, and every observation in B has 5 alive and 5 dead.
Using R to analyze this:
alive <- c(0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5)
dead <- c(10, 10, 10, 10, 10, 10, 5, 5, 5, 5, 5, 5)
type <- rep(LETTERS[1:2], each = 6)
model <- glm(cbind(alive, dead) ~ type, family = binomial)
summary(model)
Here's the summary:
Call:
glm(formula = cbind(alive, dead) ~ type, family = binomial)

Deviance Residuals:
       Min         1Q     Median         3Q        Max
-9.528e-06 -9.528e-06 -4.764e-06  0.000e+00  0.000e+00

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -26.12   36754.57  -0.001    0.999
typeB          26.12   36754.57   0.001    0.999

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5.1783e+01  on 11  degrees of freedom
Residual deviance: 5.4466e-10  on 10  degrees of freedom
AIC: 20.825

Number of Fisher Scoring iterations: 23
The output says that A and B are not statistically different (p ≈ 1), which is clearly wrong given the data.
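For what it's worth, the fit itself converges to the "right" proportions; it is only the Wald inference that breaks, which is why I suspect the zeros. A quick check:

```r
# Refit and inspect: the fitted probabilities are essentially correct
# (~0 for A, 0.5 for B), but the standard errors have blown up,
# so the Wald z-test is useless.
alive <- c(0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5)
dead  <- c(10, 10, 10, 10, 10, 10, 5, 5, 5, 5, 5, 5)
type  <- rep(LETTERS[1:2], each = 6)
model <- glm(cbind(alive, dead) ~ type, family = binomial)

unique(round(fitted(model), 6))               # ~0 for A, 0.5 for B
summary(model)$coefficients[, "Std. Error"]   # both enormous
```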
I am pretty sure that the problem is the all-zero (or all-one) group: the maximum-likelihood estimate for A's proportion sits on the boundary at 0, so the coefficient diverges toward -∞ and its standard error blows up (complete separation). What is a better way of analyzing this?
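One option I have considered (though I am not sure it is the right call, since pooling the six replicates per group into a single 2×2 table assumes they are exchangeable) is Fisher's exact test on the aggregated counts:

```r
# Pool the counts: A has 0 alive / 60 dead, B has 30 alive / 30 dead.
counts <- matrix(c( 0, 60,   # group A: alive, dead
                   30, 30),  # group B: alive, dead
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("A", "B"), c("alive", "dead")))
fisher.test(counts)  # exact test; unaffected by the zero cell
```

Unlike the Wald test from glm(), this gives a tiny p-value, as expected.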
I have seen this, but I am unsure whether it's appropriate (and I couldn't figure out how to use stan_glm with data in this form). This question is similar but was never answered.
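For completeness, here is roughly what I was attempting with stan_glm (this assumes the rstanarm package; normal(0, 2.5) is its documented default weakly informative prior for coefficients, and the prior is what keeps the estimate for A finite where maximum likelihood does not):

```r
library(rstanarm)

alive <- c(0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5)
dead  <- c(10, 10, 10, 10, 10, 10, 5, 5, 5, 5, 5, 5)
dat   <- data.frame(alive, dead, type = rep(LETTERS[1:2], each = 6))

# Same two-column-response formula as glm(); the prior regularizes the
# otherwise-divergent coefficient for the all-dead group.
fit <- stan_glm(cbind(alive, dead) ~ type, family = binomial, data = dat,
                prior = normal(0, 2.5), prior_intercept = normal(0, 2.5),
                seed = 1, refresh = 0)
posterior_interval(fit, prob = 0.95)  # credible intervals stay finite
```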