What GLM family should I use?

Question

I have data on the reproductive success of Drosophila over their whole lifetime. Most of my data are proportions: each observation is a pair (#successes = WT in my case, #failures = SPA in my case) within a mating patch (a vial in my case). I am trying to fit a model with two categorical fixed effects, treatment and exposure (to this treatment), and their interaction.

I have tried running the following model:


glm(cbind(WT,SPA) ~ treatment*exposure, family= "binomial", data= lrs2)

but the fit came out extremely overdispersed. I used the glm.binomial.disp() function (from the dispmod R package) to correct for the overdispersion, which worked, but the output indicates no effect (H0 not rejected), although the plot seems to show completely the opposite. This suggests that something must be wrong with the model fit.
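One common way to quantify and handle the overdispersion is to compute the Pearson dispersion statistic and refit with R's quasibinomial family. This is sketched here as an alternative to dispmod, not as a description of what glm.binomial.disp() does internally. The data frame `sim` is simulated and hypothetical; it only mimics the columns of lrs2.

```r
# Minimal sketch on simulated data; `sim` and its values are hypothetical,
# chosen only to mimic the structure of lrs2 (WT, SPA, treatment, exposure).
set.seed(1)
n_vial <- 200
sim <- data.frame(
  treatment = factor(sample(c("A", "B"), n_vial, replace = TRUE)),
  exposure  = factor(sample(c("low", "high"), n_vial, replace = TRUE))
)
N <- rpois(n_vial, 60) + 1          # total offspring per vial
p <- rbeta(n_vial, 0.5, 0.5)        # vial-level probabilities: extra-binomial variation
sim$WT  <- rbinom(n_vial, N, p)     # offspring sired by the focal male
sim$SPA <- N - sim$WT               # offspring sired by the competitor

fit_bin <- glm(cbind(WT, SPA) ~ treatment * exposure,
               family = binomial, data = sim)

# Pearson dispersion statistic: close to 1 if the binomial variance is adequate
disp <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)
disp

# Quasi-binomial fit: identical coefficients, standard errors scaled by sqrt(disp)
fit_qb <- glm(cbind(WT, SPA) ~ treatment * exposure,
              family = quasibinomial, data = sim)
summary(fit_qb)$dispersion

# With an estimated dispersion, use F tests rather than chi-square tests
anova(fit_qb, test = "F")
```

A very large dispersion can also signal a misspecified model (for example, excess zeros) rather than only extra noise, so it is worth looking at before trusting any test.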

Here is how my data are distributed: RATIO corresponds to #successes/N (the proportion represented by cbind(WT, SPA) above), whereas WT is the number of offspring sired by the focal individual (#successes).

[Figure: data distribution (histograms of RATIO and WT)]

Given how the data are distributed, I thought I could use a zero-inflated Poisson model instead:


library(pscl)  # provides zeroinfl()
zeroinfl(WT ~ treatment * exposure, data = lrs2)

Would that be appropriate, given how my data are distributed?
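Before moving to a zero-inflated model, it can help to check whether there really are more zeros than a plain count model predicts. The sketch below uses hypothetical simulated data. One assumption to flag: the model as written above, zeroinfl(WT ~ treatment*exposure), uses only WT and ignores how many offspring each vial produced in total, so the sketch adds a hypothetical column `N` and includes it as offset(log(N)).

```r
# Minimal sketch on simulated, hypothetical data: about 30% of vials are
# structural zeros (the focal male sires nothing), the rest are ordinary counts.
set.seed(2)
n_vial <- 200
sim <- data.frame(
  treatment = factor(sample(c("A", "B"), n_vial, replace = TRUE)),
  exposure  = factor(sample(c("low", "high"), n_vial, replace = TRUE)),
  N         = rpois(n_vial, 60) + 1   # hypothetical total offspring per vial
)
structural_zero <- rbinom(n_vial, 1, 0.3) == 1
sim$WT <- ifelse(structural_zero, 0L, rbinom(n_vial, sim$N, 0.5))

# Poisson GLM for WT with log(N) as an offset, so the model describes
# offspring sired per offspring produced rather than the raw count
fit_pois <- glm(WT ~ treatment * exposure + offset(log(N)),
                family = poisson, data = sim)

# Compare the observed number of zeros with the number the Poisson fit expects
obs_zeros <- sum(sim$WT == 0)
exp_zeros <- sum(dpois(0, fitted(fit_pois)))
c(observed = obs_zeros, expected = exp_zeros)

# A large gap motivates a zero-inflated or hurdle model,
# e.g. pscl::zeroinfl() or pscl::hurdle() (not run here)
```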

Thank you for your help, I'm struggling quite a lot. Quentin

What makes you say that it came out extremely overdispersed? I am not sure I understand correctly: does the first histogram, RATIO, represent the ratio of offspring sired by WT to total offspring? Have you checked that there is no complete separation by one of your variables, e.g. one level having only 0s? Are they really 0s, or very low counts? You might want to use a hurdle model, which is roughly equivalent to analyzing the proportion of successes and then the number of offspring sired within successful events. Maybe see [this thread](https://stats.stackexchange.com/questions/81457/). — CaroZ, Aug 21 '19 at 12:47
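The two checks suggested in this comment can be sketched in base R: the share of zero-WT vials in each treatment × exposure cell, and a two-part analysis in the spirit of a hurdle model. This is an approximation, not a true hurdle model: a proper one (e.g. pscl::hurdle()) uses a zero-truncated count distribution for the positive part, whereas here the second part is simply a quasi-binomial fit on the vials with WT > 0. The data are simulated and hypothetical.

```r
# Minimal sketch on simulated, hypothetical data with the same columns as lrs2
set.seed(3)
n_vial <- 200
sim <- data.frame(
  treatment = factor(sample(c("A", "B"), n_vial, replace = TRUE)),
  exposure  = factor(sample(c("low", "high"), n_vial, replace = TRUE)),
  N         = rpois(n_vial, 60) + 1
)
sim$WT  <- ifelse(rbinom(n_vial, 1, 0.3) == 1, 0L, rbinom(n_vial, sim$N, 0.5))
sim$SPA <- sim$N - sim$WT

# 1. Share of zero-WT vials in each treatment x exposure cell.
#    A share of exactly 0 or 1 in some cell points to (quasi-)complete separation.
zero_share <- with(sim, tapply(WT == 0, list(treatment, exposure), mean))
zero_share

# 2. Two-part sketch in the spirit of a hurdle model:
#    (a) did the focal male sire any offspring at all?
part_any <- glm(as.numeric(WT > 0) ~ treatment * exposure,
                family = binomial, data = sim)
#    (b) among vials where he did, what share of the offspring did he sire?
part_share <- glm(cbind(WT, SPA) ~ treatment * exposure,
                  family = quasibinomial, data = sim, subset = WT > 0)
summary(part_any)
summary(part_share)
```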
The summary of the binomial GLM reports "Residual deviance: 19967 on 414 degrees of freedom"; that is why I concluded the model was highly overdispersed. Yes, RATIO is the proportion of offspring sired by the focal individual out of the total number of offspring produced by the female. I added a dotplot to my original post above, and indeed some levels of exposure have more 0s, but, contrary to what you suspected, the 0s are not restricted to a single level. I will check out the thread, thank you so much! — Quentin Corbel, Aug 21 '19 at 13:10
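As a rough check, the ratio of residual deviance to residual degrees of freedom should be close to 1 for a well-specified binomial model. With the numbers quoted in this comment:

$$
\hat{\phi} \approx \frac{D}{\mathrm{df}_{\mathrm{res}}} = \frac{19967}{414} \approx 48.2,
\qquad \sqrt{\hat{\phi}} \approx 6.9
$$

If a Pearson-based estimate is of similar size, a quasi-binomial correction would widen the binomial standard errors by a factor of roughly 6.9, which on its own could make effects that look clear in a plot non-significant.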
[tag:beta-regression] might also be an option for proportion data (R package betareg). A transformation is needed if 0s or 1s are observed; see the betareg JSS paper. — hplieninger, Aug 21 '19 at 13:15
Yes, I have tried beta binomial regression models, until I realised they were excluding the 0s and 1s, which are very important in my study. I have actually skimmed through the betareg paper, but I did not spot how they deal with 0s and 1s; I will take a closer look! Thanks :) — Quentin Corbel, Aug 21 '19 at 13:22
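The transformation usually recommended for beta regression when 0s or 1s occur (commonly attributed to Smithson and Verkuilen, 2006) is y' = (y(n - 1) + 0.5)/n, where n is the sample size. It is a sketch of the idea only; the helper name `squeeze01` is hypothetical. Note also that beta regression (betareg) works on proportions strictly inside (0, 1), whereas a beta-binomial model is defined on the counts cbind(WT, SPA) and accepts 0 and N directly, so it is worth checking which of the two was actually fitted.

```r
# Hypothetical helper implementing the (y * (n - 1) + 0.5) / n squeeze:
# maps proportions in [0, 1] into the open interval (0, 1) required by beta regression
squeeze01 <- function(y, n = length(y)) {
  stopifnot(all(y >= 0 & y <= 1))
  (y * (n - 1) + 0.5) / n
}

ratio <- c(0, 0.25, 0.5, 1)
squeeze01(ratio)   # 0.1250 0.3125 0.5000 0.8750

# Not run; assumes the betareg package and a RATIO column in lrs2:
# betareg::betareg(squeeze01(RATIO) ~ treatment * exposure, data = lrs2)
```

The squeeze keeps the 0s and 1s in the data but treats them as values close to 0 and 1 rather than as a distinct outcome, and it discards how many offspring each vial contributed.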
