
I'm working on the following model in R:

  Generalized linear mixed model fit by maximum likelihood ['glmerMod']
  Family: binomial (logit)
  Formula: Tooluse ~ Sex + Age + Frequency + Tool.related.skill +
      (1|Trial) + (1 + Frequency|Subjectnumber) + 
      (1 + Tool.related.skill|Frequency/Task) 
  Data: g4      

with

  • Tooluse (yes, no)
  • Age (continuous)
  • Tool.related.skill (ordinal)
  • Trial (1–4)
  • Frequency (low, high)
  • Task (1–12, nested within Frequency: 6 tasks belong to the low-frequency group, 6 to the high-frequency group)

My research question looks at the effect of the frequency variable on tool use.

Testing the model assumptions, I get this output for the test of overdispersion:

overdisp.test(B1NF.FULL)
##      chisq  df P dispersion.parameter
## 1 36.68702 141 1            0.2601916
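(For reference, `overdisp.test` is not a base-R or lme4 function; a check along these lines can be written by hand from the Pearson residuals. This is a sketch of the usual approach, and the function name is my own:)

```r
# Hypothetical overdispersion check for a fitted (G)LMM or GLM:
# compare the sum of squared Pearson residuals to the residual
# degrees of freedom. A dispersion ratio well below 1 suggests
# underdispersion, well above 1 suggests overdispersion.
overdisp_fun <- function(model) {
  rdf   <- df.residual(model)                 # residual degrees of freedom
  rp    <- residuals(model, type = "pearson") # Pearson residuals
  chisq <- sum(rp^2)                          # Pearson chi-square statistic
  ratio <- chisq / rdf                        # dispersion parameter (~1 if OK)
  pval  <- pchisq(chisq, df = rdf, lower.tail = FALSE)
  c(chisq = chisq, df = rdf, dispersion = ratio, p = pval)
}
```

This works for any model object with `residuals()` and `df.residual()` methods, including `glm` and `glmerMod` fits.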

How can I deal with this underdispersion? So far I have received three suggestions (two of them from one of the authors of the lme4 package):

1) using mixture/hurdle models

2) allowing a negative correlation structure within groups (which can't be done with lme4 and is harder for GLMMs in general)

3) the standard 'quasi-likelihood' approach, i.e. taking the estimated level of underdispersion and shrinking all the confidence intervals accordingly as a first approximation. However, I was warned to be careful here, since it is not yet well understood how quasi-likelihood estimates of 'residual' variance interact with the estimates of the random-effects variances

I would greatly appreciate any opinions and especially any help on how to implement any of these strategies in R. I feel kind of lost here.

  • actually, now that you add some more context, I'm not sure your question makes any sense/that you have anything to worry about. For *binary* data, where there aren't sets of responses that share the same exact predictor variable values (i.e. the data couldn't be grouped into homogeneous subsets somehow), under/overdispersion are unidentifiable anyway ... – Ben Bolker Apr 01 '14 at 18:07
  • Thank you very much for your help! Do you have any reference or any literature on that I could look into or cite? – Eva Apr 01 '14 at 21:21
  • 1
    googling for "Bernoulli underdispersion GLMM" leads to https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004505.html which cites Gelman and Hill 2007, p302, and to https://doclib.uhasselt.be/dspace/bitstream/1942/13954/3/overdispersionbinary05.pdf – Ben Bolker Apr 01 '14 at 22:01
  • 1
    See here: http://stats.stackexchange.com/questions/133635/overdispersion-and-underdispersion-in-negative-binomial-poisson-regression?rq=1 – StatsStudent Mar 09 '16 at 21:44

1 Answer


For binary outcomes, overdispersion or underdispersion are only identifiable (i.e., can only be meaningfully measured) if sets of individuals with identical predictors can be grouped. For example, if the data look like

response  fac1  fac2
0         A     A
0         A     A
1         A     B
0         A     B

(a ridiculously small sample that will lead to other problems such as complete separation if we actually tried to use it in a model), we could group it by unique sets of predictors:

successes  total  fac1   fac2
0          2      A      A
1          2      A      B

and then analyze it as a binomial response with number of trials>1 and use the various techniques suggested above (as well as ordinal models, e.g. the ordinal package in R) to handle over/underdispersion.
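The grouping step above can be sketched in base R (the data frame here is the toy example from above; the `glmer` call in the comment is illustrative):

```r
# Sketch: collapse binary rows that share identical predictor values
# into binomial (successes, total) form.
dat <- data.frame(response = c(0, 0, 1, 0),
                  fac1 = c("A", "A", "A", "A"),
                  fac2 = c("A", "A", "B", "B"))

grp <- aggregate(response ~ fac1 + fac2, data = dat,
                 FUN = function(x) c(successes = sum(x), total = length(x)))
grp <- do.call(data.frame, grp)  # flatten the matrix column

# The grouped data can then be fitted as a binomial response, e.g.
# glmer(cbind(response.successes, response.total - response.successes) ~ ...,
#       family = binomial)
```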

If you have truly binary, ungroupable outcomes (e.g. one of your predictors is a continuous variable that is unique to individuals, as would be typical in an observational study), then (1) you can't estimate the degree of overdispersion, and (2) you needn't really worry about it (i.e., there may be additional sources of variability you don't know about, but they simply inflate your uncertainty rather than biasing your inference). This is well known and stated, e.g., in Gelman and Hill (2007, p. 302).

Ben Bolker