4

I want to model counts as being dependent on two nominal variables, one continuous variable (all as fixed effects) with 3rd-order interactions and one grouping variable (as random effect). However, I have an overdispersion in outcomes (I used the glmer function from lme4 library). How should I manage this? I have found some solution for the problem (https://stats.stackexchange.com/a/9670/38080) but I am not able to incorporate that recommendation into my model.

Here is my model:

m1<-glmer(dependent.var ~ cat.var1 * cat.var2 * contin.var + (1|group),
         data = dat, family = "poisson")

Any suggestion? (I did it also like a marginal model with 'geeglm' function (library geepack), but I would like to calculate R-squared of the model, which is possible to obtain just from former GLMM (see Nakagawa & Schielzeth 2013; http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210x.2012.00261.x/abstract).)

benjamin jarcuska
  • 369
  • 1
  • 4
  • 16
  • 3
    Can you expand on the statement that you were not able to incorporate the recommendation of including `(1|subject_id)` ? – Aniko Jan 28 '14 at 14:33
  • Number of measured subjects in my dataset is equal to number of observations or records in the dataset. Thus I do not know if I should incorporate both (1|subject_id) and 'data$obs_effect – benjamin jarcuska Jan 28 '14 at 14:39
  • You can call it `subject_id` or `obs_effect` - they are the same thing in your case. Include one of them, and any other fixed and random effects you need. – Aniko Jan 28 '14 at 14:49
  • I did it like this: m1 – benjamin jarcuska Jan 28 '14 at 14:59
  • You can use the negative.binomial() function from the MASS package in the family argument I the glmer() call. You need to supply a known value for theta though. You could run the model as a standard glm.nb() model without the random effects and use the estimated value of theta from that, though I don't know whether this approach is valid or not? Perhaps someone else can comment? – Martyn Jan 28 '14 at 15:37
  • see http://glmm.wikidot.com/faq#overdispersion_est ... – Ben Bolker Jan 28 '14 at 17:36
  • @Ben and @Aniko , I had used that (http://glmm.wikidot.com/faq#overdispersion_est) function for the model: > m1 – benjamin jarcuska Jan 28 '14 at 20:31
  • @Ben, thus I did > m2 – benjamin jarcuska Jan 28 '14 at 20:34
  • 1
    @BenBolker Here are posted my data: https://docs.google.com/file/d/0Bz8ojhHeiNclVi1oT0ZwTEtEN2s/edit – benjamin jarcuska Jan 29 '14 at 06:37

1 Answers1

7

Pulling out the answer from the discussion in the comments: If you create an obs_effect variable (observation-level random effect) with a unique value for each observation (say, 1:nrow(dat)), then you can incorporate overdispersion in the model by fitting

m2 <- glmer(dependent.var ~  cat.var1 * cat.var2 * contin.var + (1|obs_effect) + (1|group),
           data = dat, family = "poisson")

You also state in the comments that your problem is that the residual plot is triangle-shaped, which I interpret as the variability of the residuals increases with the predicted value. Depending on what kind of residuals you are plotting, this might mean nothing (the variability of observed - fitted should increase with fitted), or might mean that you have a problem other than overdispersion, which does not show up in a residual plot.

Reference: Harrison, X.A., 2014. Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ 2, e616. doi:10.7717/peerj.616

amoeba
  • 93,463
  • 28
  • 275
  • 317
Aniko
  • 10,209
  • 29
  • 32
  • 1
    thank you for your answer. The variability of residuals of m2-model decreases with fitted values. Interestingly, in overdispersed Poisson GLMM, scatter of those residuals against fitted values is OK. Please, see my last two comments above for more info. – benjamin jarcuska Jan 28 '14 at 20:41