
I am trying to model count data in R that are apparently underdispersed (dispersion parameter ~0.40). This is probably why neither a glm with family = poisson nor a negative binomial model (glm.nb) shows a significant effect. When I look at the descriptives of my data, I don't see the typical skew of count data, and the residual variances in my two experimental conditions are homogeneous, too.
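One common way to arrive at a dispersion figure like the 0.40 above is to divide the sum of squared Pearson residuals from the Poisson fit by the residual degrees of freedom. The sketch below assumes simulated binomial counts (chosen only because binomial counts are underdispersed relative to the Poisson) and borrows the group sizes 27 and 29 given later in the comments; it is not the asker's data, and the asker's own figure may have come from a different test.

```r
# Simulated stand-in data (NOT the asker's data): binomial counts have
# Var = size * p * (1 - p), which is smaller than the mean size * p.
set.seed(42)
group <- factor(rep(c("A", "B"), times = c(27, 29)))
y <- rbinom(length(group), size = 10, prob = ifelse(group == "A", 0.6, 0.7))

# Ordinary Poisson GLM comparing the two conditions
fit_pois <- glm(y ~ group, family = poisson)

# Pearson-based dispersion estimate: values well below 1 suggest underdispersion
phi_hat <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(phi_hat)
```

With these settings the ratio should land near 1 - p, i.e. well below 1.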

So my questions are:

  1. Do I even have to use special regression methods for my count data if they don't really behave like count data? I sometimes face non-normality (usually due to kurtosis), but I used the percentile bootstrap method for comparing trimmed means (Wilcox, 2012) to account for it. Can methods for count data be replaced by any of the robust methods suggested by Wilcox and implemented in the WRS package?

  2. If I do have to use regression models for count data, how do I account for the underdispersion? The Poisson and the negative binomial distributions assume equal or greater dispersion, so they shouldn't be appropriate, right? I was thinking about using the quasi-Poisson family, but that is usually recommended for overdispersion. I read that beta-binomial models, which seem able to account for both over- and underdispersion, are available in the VGAM package for R. The authors, however, seem to recommend a tilded Poisson distribution, but I can't find it in the package.
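For question 1, the percentile bootstrap comparison of trimmed means can be sketched in base R. This is a from-scratch illustration assuming 20% trimming, 2000 bootstrap samples and simulated counts; the function name `trimmed_pb` is hypothetical, and the functions in Wilcox's WRS package may differ in details such as how the p-value is computed.

```r
# Percentile bootstrap for the difference of two trimmed means
# (hypothetical helper written from scratch; not the WRS code).
trimmed_pb <- function(x, y, tr = 0.2, nboot = 2000, alpha = 0.05) {
  # Bootstrap distribution of the difference in trimmed means,
  # resampling each group independently with replacement
  boot_diff <- replicate(nboot,
    mean(sample(x, replace = TRUE), trim = tr) -
      mean(sample(y, replace = TRUE), trim = tr))
  # Two-sided p-value: how far 0 sits in the tails of the bootstrap distribution
  p_star <- mean(boot_diff < 0) + 0.5 * mean(boot_diff == 0)
  list(
    estimate = mean(x, trim = tr) - mean(y, trim = tr),
    conf.int = quantile(boot_diff, c(alpha / 2, 1 - alpha / 2), names = FALSE),
    p.value  = 2 * min(p_star, 1 - p_star)
  )
}

# Simulated counts standing in for the two conditions (not real data)
set.seed(1)
a <- rbinom(27, size = 10, prob = 0.6)
b <- rbinom(29, size = 10, prob = 0.7)
res <- trimmed_pb(a, b)
str(res)
```

Because the bootstrap resamples the observed values directly, it does not rely on a Poisson mean-variance relationship, which is the appeal of this route for a simple two-group comparison.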

Can anyone recommend a procedure for underdispersed data and maybe provide some example R code for it?
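A quasi-Poisson fit does not actually require overdispersion: the dispersion is estimated from the data and may come out below 1. The sketch below, again on simulated binomial counts (an assumption, not the asker's data), checks that the quasi-Poisson coefficients equal the Poisson ones and that only the standard errors are rescaled by the square root of the estimated dispersion.

```r
# Simulated underdispersed counts (NOT the asker's data)
set.seed(42)
group <- factor(rep(c("A", "B"), times = c(27, 29)))
y <- rbinom(length(group), size = 10, prob = ifelse(group == "A", 0.6, 0.7))

fit_pois  <- glm(y ~ group, family = poisson)
fit_quasi <- glm(y ~ group, family = quasipoisson)

# Estimated (Pearson-based) dispersion; it is below 1 for these data
phi <- summary(fit_quasi)$dispersion

# Same point estimates; standard errors differ by a factor sqrt(phi)
se_pois  <- sqrt(diag(vcov(fit_pois)))
se_quasi <- sqrt(diag(vcov(fit_quasi)))

print(cbind(coef = coef(fit_quasi), se_pois, se_quasi))
print(phi)
```

With a dispersion below 1 the quasi-Poisson standard errors are smaller than the Poisson ones, so the overly conservative Poisson tests are corrected; `summary(fit_quasi)` also reports t rather than z statistics.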

gung - Reinstate Monica
Sil
  • How do you know your data is underdispersed? How are you calculating the dispersion parameter? – Hong Ooi Aug 14 '13 at 10:22
  • It would also help to tell us more about what you are interested in. For point estimates of the linear predictor and for predicting values, underdispersion is rarely a problem, but tests and intervals may be unnecessarily conservative (quasi families would help with that). That said, for a "normal" likelihood approach, check out the COM-Poisson and other generalized Poisson models. – Momo Aug 14 '13 at 11:46
  • @Hong Ooi: I tested the dispersion with dispersiontest(Poissonmodel, alternative = c("less")), and the test was significant. – Sil Aug 14 '13 at 11:47
  • @Momo: I want to test whether negotiating dyads in two experimental conditions differ in the number of correct offers they make. Correct offers mean that dyads claim more issues that correspond to their own team's interests instead of claiming issues that are more valuable to the other party. At first, I wasn't even aware that this is count data. By COM Poisson, do you mean the Conway-Maxwell-Poisson distribution? Thanks a lot already! – Sil Aug 14 '13 at 11:58
  • Thanks for the additional info. Yes, I meant the Conway-Maxwell-Poisson. Shmueli and colleagues developed a kind of generalized linear model for it, and there is also an R package if you'd like to try it. – Momo Aug 14 '13 at 13:17
  • I must admit I have difficulty understanding your substantive problem, but statistically, does it boil down to comparing the mean or, more generally, the location of two groups? And you would average some count? How many distinct values do you have? How large is your sample? Perhaps a simple t-test or Wilcoxon test would suffice, and you don't have to go through all the trouble of finding the right count data model. – Momo Aug 14 '13 at 13:28
  • n1 = 27, n2 = 29; yes, it boils down to comparing two groups. I would be more than happy with a t-test, but I thought I couldn't ignore all the literature and procedures on count data. Moreover, I want to use the count DV as a mediator for an interval-scaled variable, and for this model I thought I'd definitely need an appropriate model. I could use a robust mediation analysis instead, of course, if that's possible. – Sil Aug 14 '13 at 13:49
  • @Sil: I think I would try a t-test first and check the residuals, with an emphasis on non-constant variance. A good estimator for the expected value is exactly or roughly the sample mean in the normal, Poisson, negative binomial and COM-Poisson cases. The underdispersion might make the mean-variance dependency of count data negligible. For the count as a predictor of another variable (as in mediation analysis), no extra precautions are needed IMO, unless there are measurement errors in the variable. – Momo Aug 14 '13 at 23:11
  • A dispersion parameter of 0.4? Did you get that from the negative binomial model, or are you talking about the scale parameter? For a negative binomial, Var(Y) = mean + D*mean^2, which means that a dispersion parameter of 0.4 still indicates overdispersion. If your dispersion parameter were 0, it would be an ordinary Poisson model with E(Y) = Var(Y) = mean; if it were less than 0, you would have underdispersion. Or perhaps I have misunderstood all of this. – Jan 13 '15 at 11:01
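The point about the negative binomial can be checked numerically. Under the parameterization Var(Y) = mean + D*mean^2, any D >= 0 gives a variance at least as large as the mean, whereas the Conway-Maxwell-Poisson (COM-Poisson) distribution mentioned by Momo, with P(Y = y) proportional to lambda^y / (y!)^nu, is underdispersed whenever nu > 1. The sketch below evaluates both in base R; the helper names `dcmp` and `moments`, the truncation of the COM-Poisson normalizing sum at 200, and the parameter values are all assumptions for illustration, and a dedicated package is the better choice for actual regression.

```r
# COM-Poisson pmf: P(Y = y) proportional to lambda^y / (y!)^nu,
# normalised over the truncated support 0:ymax (hypothetical helper)
dcmp <- function(x, lambda, nu, ymax = 200) {
  support <- 0:ymax
  log_w <- support * log(lambda) - nu * lgamma(support + 1)
  p <- exp(log_w - max(log_w))   # rescale on the log scale to avoid overflow
  p <- p / sum(p)
  p[x + 1]                       # assumes x is an integer vector within 0:ymax
}

# Mean and variance of a pmf given as probabilities for y = 0, 1, 2, ...
moments <- function(p) {
  y <- seq_along(p) - 1
  m <- sum(y * p)
  c(mean = m, var = sum((y - m)^2 * p))
}

ys <- 0:200
cmp_pois  <- moments(dcmp(ys, lambda = 3, nu = 1))  # nu = 1: ordinary Poisson(3)
cmp_under <- moments(dcmp(ys, lambda = 3, nu = 2))  # nu > 1: variance < mean
# Negative binomial with mean 3 and D = 0.4 (size = 1 / D): variance = 3 + 0.4 * 3^2
nb <- moments(dnbinom(ys, size = 1 / 0.4, mu = 3))

print(rbind(cmp_pois, cmp_under, nb))
```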

4 Answers


The best and standard way to handle underdispersed Poisson data is to use a generalized Poisson model, or perhaps a hurdle model. Three-parameter count models can also be used for underdispersed data, e.g. Faddy-Smith, Waring, Famoye, Conway-Maxwell and other generalized count models. Their only drawback is interpretability. But for general underdispersed data, the generalized Poisson should be used; it plays the same role for underdispersed data that the negative binomial plays for overdispersed data. I discuss this in some detail in two of my books, Modeling Count Data (2014) and Negative Binomial Regression, 2nd edition (2011), both published by Cambridge University Press. In R, the VGAM package allows for generalized Poisson (GP) regression; negative values of the dispersion parameter indicate an adjustment for underdispersion. You can use the GP model for overdispersed data as well, but generally the NB model is better. When it comes down to it, it's best to determine the cause of the underdispersion and then select the most appropriate model to deal with it.
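To see what a negative dispersion parameter does, the generalized Poisson probabilities can be written out directly. The sketch below assumes Consul's parameterization, P(Y = y) = theta (theta + lambda y)^(y - 1) exp(-theta - lambda y) / y!, with mean theta / (1 - lambda) and variance theta / (1 - lambda)^3; VGAM and the books cited above may parameterize the model differently, and `dgenpois` and `mv` are hypothetical helper names. In this form lambda = 0 gives the Poisson and lambda < 0 gives underdispersion, but the support then stops where theta + lambda y reaches 0, which is why the probabilities need not sum exactly to one.

```r
# Generalized Poisson pmf in Consul's parameterization (hypothetical helper):
#   P(Y = y) = theta * (theta + lambda*y)^(y - 1) * exp(-theta - lambda*y) / y!
# Probabilities are set to 0 where theta + lambda*y <= 0 (only happens for lambda < 0).
dgenpois <- function(y, theta, lambda) {
  ok <- theta + lambda * y > 0
  logp <- rep(-Inf, length(y))
  yy <- y[ok]
  logp[ok] <- log(theta) + (yy - 1) * log(theta + lambda * yy) -
    theta - lambda * yy - lgamma(yy + 1)
  exp(logp)
}

# Mean and variance, renormalising in case the probabilities do not sum to 1
mv <- function(p, y) {
  m <- sum(y * p) / sum(p)
  c(mean = m, var = sum((y - m)^2 * p) / sum(p))
}

ys <- 0:300
p_over  <- dgenpois(ys, theta = 2, lambda = 0.3)   # lambda > 0: overdispersed
p_under <- dgenpois(ys, theta = 2, lambda = -0.2)  # lambda < 0: support ends at y = 9

print(rbind(over = mv(p_over, ys), under = mv(p_under, ys)))
print(sum(p_under))  # total mass of the truncated negative-lambda case
```

The parameter values are arbitrary and were picked to lie inside the usual admissible range for negative lambda.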

Joseph Hilbe
  • Welcome back! Please register and/or merge your accounts (you can find information on how to do this in the My Account section of our help center); then you will be able to edit and comment on your own question. (Your original account is [here](http://stats.stackexchange.com/users/48456/joseph-hilbe).) – gung - Reinstate Monica Sep 27 '16 at 15:15
  • Can you perform a generalized Poisson analysis in SPSS? – Grace Carroll Sep 05 '19 at 12:34
  • Very helpful, but it seems like VGAM has now disabled underdispersion? – John Madden Jan 23 '22 at 19:01

I once encountered underdispersed Poisson data having to do with how frequently people played a social game. It turned out this was due to the extreme regularity with which people played on Fridays. Removing the Friday data gave me the expected overdispersed Poisson. Perhaps you have the option to edit your data similarly.

Meadowlark Bradsher

There are situations where underdispersion occurs together with zero-inflation; this is typical for counts of preferred children reported by individuals of both sexes. I haven't found a way to capture this to date.

Germaniawerks

It seems that the solution provided by Joseph Hilbe with the VGAM package is no longer available. From the package manual: "The genpoisson() has been simplified to genpoisson0 by only handling positive parameters, hence only overdispersion relative to the Poisson is accommodated. Some of the reasons for this are described in Scollnik (1998), e.g., the probabilities do not sum to unity when lambda is negative. To simplify things, VGAM 1.1-4 and later will only handle positive lambda."

user36756