My understanding is that generalized linear modeling (GLM) is recommended for proportion data.

However, this seems to run into problems when one group's data consist entirely of zeros (or ones). For example, consider data with two classes (A and B) and six data points each: every A data point has 0 alive and 10 dead, and every B data point has 5 alive and 5 dead.

Using R to analyze this:

alive <- c(0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5)
dead <- c(10, 10, 10, 10, 10, 10, 5, 5, 5, 5, 5, 5)
type <- rep(LETTERS[1:2], each = 6)
model <- glm(cbind(alive, dead) ~ type, family = binomial)
summary(model)

Here's the summary:

Call:
glm(formula = cbind(alive, dead) ~ type, family = binomial)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-9.528e-06  -9.528e-06  -4.764e-06   0.000e+00   0.000e+00  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -26.12   36754.57  -0.001    0.999
typeB          26.12   36754.57   0.001    0.999

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5.1783e+01  on 11  degrees of freedom
Residual deviance: 5.4466e-10  on 10  degrees of freedom
AIC: 20.825

Number of Fisher Scoring iterations: 23

The output suggests that A and B are not statistically different (p = 0.999, effectively 1), even though the two groups obviously differ.

I am pretty sure that the problem is too many zeros (or ones). What is a better way of analyzing this?

I have seen this, but I am unsure whether it's appropriate (and I couldn't figure out how to use stan_glm with the same data). This is similar but was never answered.

Ho-Yon
  • See: https://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression – Björn Feb 14 '19 at 05:14
  • Got it. I didn't know the terms "perfect separation in logistic regression" or "complete or quasi-complete separation in logistic regression." It looks like https://stats.stackexchange.com/questions/307205/logistic-regression-p-values-all-1-yet-model-fits-perfectly?noredirect=1&lq=1 is similar, and https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/ is another useful discussion. Reporting p-values from glmnet is discouraged: https://stats.stackexchange.com/questions/77546/how-to-interpret-glmnet – Ho-Yon Feb 14 '19 at 23:39
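
As a rough illustration of the separation issue named in the comments, the sketch below refits the model from the question and compares the groups in two ways that do not rely on the Wald z-statistics: a likelihood-ratio test computed from the null and residual deviances, and Fisher's exact test on the pooled counts. This is a minimal sketch, not a recommended analysis. Pooling the six data points per type assumes there is no extra variation between them, and the names lrt_p, pooled and fisher_p are only illustrative.

```r
# Data from the question: six data points per type, 10 individuals each
alive <- c(0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5)
dead  <- c(10, 10, 10, 10, 10, 10, 5, 5, 5, 5, 5, 5)
type  <- rep(LETTERS[1:2], each = 6)

# Same binomial GLM as in the question; any warnings about fitted
# probabilities of 0 or 1 are suppressed because separation is expected here
model <- suppressWarnings(
  glm(cbind(alive, dead) ~ type, family = binomial)
)

# Likelihood-ratio test: drop in deviance from the null model to the
# model with type, referred to a chi-squared distribution
lr_stat <- model$null.deviance - model$deviance
lr_df   <- model$df.null - model$df.residual
lrt_p   <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Fisher's exact test on the pooled 2 x 2 table
# (assumption: data points within a type can be pooled)
pooled <- rbind(
  A = c(alive = sum(alive[type == "A"]), dead = sum(dead[type == "A"])),
  B = c(alive = sum(alive[type == "B"]), dead = sum(dead[type == "B"]))
)
fisher_p <- fisher.test(pooled)$p.value

print(lrt_p)
print(fisher_p)
```

Both p-values should come out very small, in contrast to the Wald p-value of 0.999 in the summary, which is inflated by the enormous standard errors that separation produces.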
