I have some over-dispersed data and am trying to decide which model best suits them. The data are usually counts of symptoms, or of correct items on cognitive tasks. As an example:
set.seed(69)
g1<-rnorm(700,30,9); g2<-rnorm(100,25,7); g3<-rnorm(100,20,5)
gt<-data.frame(score=c(g1, g2, g3), fac1=factor(rep(c("a", "b", "c"), c(700, 100, 100))), fac2=ordered(rep(c(0,1,2), c(3,13,4))))
gt$score<-with(gt, ifelse(fac2 == 0, score, score-rnorm(1, 0.5, 2)))  # note: rnorm(1, ...) is a single draw, so the same shift is applied to every row
gt$score<-with(gt, ifelse(fac2 == 2, score-rnorm(1, 0.5, 2), score))
gt$score<-round(with(gt, ifelse(score>=30, 30, score)))
gt$cov1<-with(gt, score + rnorm(900, sd=40))/40
gt$score.30<-with(gt, 30-score)  # number of "failures" out of a maximum of 30
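As a quick sanity check that the simulated counts really are over-dispersed (this continues from the gt data frame built above):

```r
# variance of the counts versus their mean; a variance well above the
# mean indicates over-dispersion for a Poisson-type count model
with(gt, c(mean = mean(score.30), var = var(score.30)))
```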
The models I'm thinking about using are:
library(MASS)  # for glm.nb
library(pscl)  # for hurdle
glmnb1<-glm.nb(score.30~cov1 + fac1*fac2, data=gt)
hur1<-hurdle(score.30~cov1 + fac1*fac2, dist="negbin", data=gt)
quasi1<-glm(cbind(score, score.30)~cov1+fac1*fac2, family="quasibinomial", data=gt)
- How do I decide between the negative binomial and the quasibinomial?
- In this example, the hurdle model is a better fit than the negative binomial. However, if the quasibinomial were a better fit than the negative binomial (hypothetically or otherwise), how would you compare the hurdle and quasibinomial models? Is there a hurdle quasibinomial?
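For context, this is roughly how I judged the hurdle model to fit better, and where the quasibinomial comparison breaks down (a sketch, assuming the data and three models above):

```r
# glm.nb is from MASS, hurdle from pscl; both objects have logLik methods
library(MASS)
library(pscl)
AIC(glmnb1, hur1)  # the hurdle model has the lower AIC in this example
AIC(quasi1)        # NA: a quasi-likelihood has no true likelihood, hence no AIC
```

The NA for the quasibinomial is exactly the problem: likelihood-based criteria such as AIC are unavailable for quasi-models, so they can't be compared to the hurdle fit that way.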