I have some over-dispersed data and am trying to decide which model best suits them. The data are usually counts of symptoms, or of correct items on cognitive tasks. As an example:
set.seed(69)
g1<-rnorm(700,30,9); g2<-rnorm(100,25,7); g3<-rnorm(100,20,5)
gt<-data.frame(score=c(g1, g2, g3), fac1=factor(rep(c("a", "b", "c"), c(700, 100, 100))), fac2=ordered(rep(c(0,1,2), c(3,13,4))))
gt$score<-with(gt, ifelse(fac2 == 0, score, score-rnorm(1, 0.5, 2)))  # note: rnorm(1, ...) is a single draw, so the same shift is applied to every row
gt$score<-with(gt, ifelse(fac2 == 2, score-rnorm(1, 0.5, 2), score))
gt$score<-round(with(gt, ifelse(score>=30, 30, score)))
gt$cov1<-with(gt, score + rnorm(900, sd=40))/40
gt$score.30<-with(gt, 30-score)  # number of "failures" out of a maximum of 30
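As a quick sanity check that the simulated counts really are over-dispersed (this continues from the gt data frame built above):

```r
# variance of the counts versus their mean; a variance well above the
# mean indicates over-dispersion for a Poisson-type count model
with(gt, c(mean = mean(score.30), var = var(score.30)))
```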
The models I'm thinking about using are:
library(MASS)  # for glm.nb
library(pscl)  # for hurdle
glmnb1<-glm.nb(score.30~cov1 + fac1*fac2, data=gt)
hur1<-hurdle(score.30~cov1 + fac1*fac2, dist="negbin", data=gt)
quasi1<-glm(cbind(score, score.30)~cov1+fac1*fac2, family="quasibinomial", data=gt)
- How do I decide between the negative binomial and the quasibinomial?
- In this example, the hurdle model is a better fit than the negative binomial. However, if the quasibinomial were a better fit than the negative binomial (hypothetically or otherwise), how would you compare the hurdle and quasibinomial models? Is there a hurdle quasibinomial?
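For context, this is roughly how I judged the hurdle model to fit better, and where the quasibinomial comparison breaks down (a sketch, assuming the data and three models above):

```r
# glm.nb is from MASS, hurdle from pscl; both objects have logLik methods
library(MASS)
library(pscl)
AIC(glmnb1, hur1)  # the hurdle model has the lower AIC in this example
AIC(quasi1)        # NA: a quasi-likelihood has no true likelihood, hence no AIC
```

The NA for the quasibinomial is exactly the problem: likelihood-based criteria such as AIC are unavailable for quasi-models, so they can't be compared to the hurdle fit that way.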