Violation of normality of residuals in glmm

Question

I'm a newbie, so apologies in advance if this Q is missing any useful detail. I'm trying to test the effect of condition upon the number of times certain behaviors are produced by a group of individuals. Each individual is tested multiple times in different sessions. My data looks like this:

Subject Session Condition Count
1   1   A   0
1   2   A   0
1   3   A   0
1   4   A   0
2   1   A   0
2   2   A   0
2   3   A   0
2   4   A   0
3   1   A   0
3   2   A   0
3   3   A   0
3   4   A   0
4   1   A   0
4   2   A   0
4   3   A   0
4   4   A   0
5   1   A   0
5   2   A   0
5   3   A   0
5   4   A   0
6   1   A   0
6   2   A   0
6   3   A   0
6   4   A   0
1   1   B   0
1   2   B   0
1   3   B   0
1   4   B   0
2   1   B   0
2   2   B   0
2   3   B   0
2   4   B   0
3   1   B   0
3   2   B   1
3   3   B   0
3   4   B   1
4   1   B   1
4   2   B   1
4   3   B   0
4   4   B   0
5   1   B   0
5   2   B   0
5   3   B   0
5   4   B   0
6   1   B   3
6   2   B   0
6   3   B   0
6   4   B   0
1   1   C   0
1   2   C   0
1   3   C   0
1   4   C   0
2   1   C   0
2   2   C   0
2   3   C   0
2   4   C   0
3   1   C   0
3   2   C   0
3   3   C   1
3   4   C   0
4   1   C   0
4   2   C   1
4   3   C   0
4   4   C   1
5   1   C   0
5   2   C   0
5   3   C   0
5   4   C   0
6   1   C   1
6   2   C   0
6   3   C   0
6   4   C   0
1   1   D   0
1   2   D   0
1   3   D   0
1   4   D   0
2   1   D   1
2   2   D   0
2   3   D   0
2   4   D   0
3   1   D   0
3   2   D   0
3   3   D   1
3   4   D   2
4   1   D   0
4   2   D   0
4   3   D   4
4   4   D   3
5   1   D   0
5   2   D   1
5   3   D   0
5   4   D   0
6   1   D   3
6   2   D   3
6   3   D   2
6   4   D   2
1   1   E   1
1   2   E   0
1   3   E   0
1   4   E   0
2   1   E   0
2   2   E   0
2   3   E   0
2   4   E   0
3   1   E   2
3   2   E   0
3   3   E   2
3   4   E   2
4   1   E   2
4   2   E   1
4   3   E   2
4   4   E   3
5   1   E   0
5   2   E   0
5   3   E   2
5   4   E   0
6   1   E   6
6   2   E   1
6   3   E   3
6   4   E   1
1   1   F   0
1   2   F   0
1   3   F   0
1   4   F   0
2   1   F   0
2   2   F   0
2   3   F   0
2   4   F   0
3   1   F   0
3   2   F   0
3   3   F   0
3   4   F   0
4   2   F   2
4   3   F   2
4   4   F   0
4   4   F   0
5   1   F   0
5   2   F   0
5   3   F   0
5   4   F   0
6   1   F   8
6   2   F   10
6   3   F   4
6   4   F   7
1   1   G   0
1   2   G   0
1   3   G   0
1   4   G   0
2   1   G   0
2   2   G   0
2   3   G   0
2   4   G   0
3   1   G   0
3   2   G   0
3   3   G   0
3   4   G   1
4   1   G   1
4   2   G   0
4   3   G   1
4   4   G   1
5   1   G   0
5   2   G   0
5   3   G   0
5   4   G   0
6   1   G   1
6   2   G   1
6   3   G   1
6   4   G   3



  'data.frame': 168 obs. of  4 variables:
 $ Subject  : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ Session  : int  1 2 3 4 1 2 3 4 1 2 ...
 $ Condition: Factor w/ 7 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Count    : int  0 0 0 0 0 0 0 0 0 0 ...

There are many zeros (but these are "true" zeros, in that they are expected given the conditions - the individuals are able to produce the behaviors at all times, but do so very infrequently under certain conditions, particularly condition A).

When I used a glmer with poisson family in lme4 the model failed to converge and I was advised to use bglmer, from package blme. This is my code:

m<- bglmer(Count~Condition + Session + (1|Subject),data=dat, 
       family=poisson, fixef.prior = normal(cov = diag(9,8)), control=glmerControl(optimizer="bobyqa"))

Here is the output (with condition E as baseline):

    Cov prior  : Subject ~ wishart(df = 3.5, scale = Inf, posterior.scale = cov, common.scale = TRUE)
Fixef prior: normal(sd = c(3, 3, ...), corr = c(0 ...), common.scale = FALSE)
Prior dev  : 33.0517

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['bglmerMod']
 Family: poisson  ( log )
Formula: Count ~ Condition + Session + (1 | Subject)
   Data: dat
Control: glmerControl(optimizer = "bobyqa")

     AIC      BIC   logLik deviance df.resid 
   267.1    295.2   -124.6    249.1      159 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.4950 -0.3985 -0.2257 -0.0574  4.0170 

Random effects:
 Groups  Name        Variance Std.Dev.
 Subject (Intercept) 3.408    1.846   
Number of obs: 168, groups:  Subject, 6

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.66015    0.80543  -0.820 0.412429    
ConditionA  -4.09023    1.33829  -3.056 0.002241 ** 
ConditionB  -1.33464    0.41532  -3.213 0.001311 ** 
ConditionC  -1.86472    0.51534  -3.618 0.000296 ***
ConditionD  -0.20941    0.28401  -0.737 0.460919    
ConditionF   0.20791    0.25717   0.808 0.418836    
ConditionG  -0.98801    0.36455  -2.710 0.006725 ** 
Session     -0.07237    0.08819  -0.821 0.411853    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr) CndtnA CndtnB CndtnC CndtnD CndtnF CndtnG
ConditionA -0.025                                          
ConditionB -0.098  0.051                                   
ConditionC -0.078  0.041  0.161                            
ConditionD -0.145  0.075  0.298  0.237                     
ConditionF -0.145  0.083  0.329  0.263  0.487              
ConditionG -0.113  0.058  0.231  0.184  0.341  0.377       
Session    -0.244  0.000 -0.002 -0.001 -0.003 -0.068 -0.002

My problem is that the residuals are not normally distributed (I'm having trouble uploading the pic of the QQplot, but here is the Shapiro test of residuals):

Shapiro-Wilk normality test

data:  resid(m)
W = 0.90746, p-value = 8.413e-09

I think this means that the SEs and p-values may be unreliable. I read that I should try to calculate robust SEs, but the "sandwich" package does not seem to work on bglmer models. Does anyone have any suggestions for how I might proceed, by either changing my model or calculating these SEs?

I have tried some other options, such as sqrt transformation and using lmer - this appears to solve the problem of the residuals, but then the model seems to be over-dispersed. I have also tried zero-inflated nb models (using glmmTMB) but they seem to have the same problem with the residuals. Any help would be much appreciated!

Why do you think the residuals should be normal? You have estimated a Poisson model, not a normal model, so there is **no assumption of normal errors** in that model. One solution is to calculate simulated residuals, there is an R package, [DHARMa](https://stats.stackexchange.com/search?q=DHARMa+answers%3A1) but if that supports the package you used I don't know. You could still program the same idea! — kjetil b halvorsen, Jan 05 '19 at 10:29
Ah, I thought I had read that non-normal residuals would cause a problem even in Poisson models, but must have misunderstood. Thank you for your reply @kjetilbhalvorsen! — Lynn, Jan 08 '19 at 05:13

Violation of normality of residuals in glmm

0 Answers0