I'm a newbie, so apologies in advance if this Q is missing any useful detail. I'm trying to test the effect of condition upon the number of times certain behaviors are produced by a group of individuals. Each individual is tested multiple times in different sessions. My data looks like this:
Subject Session Condition Count
1 1 A 0
1 2 A 0
1 3 A 0
1 4 A 0
2 1 A 0
2 2 A 0
2 3 A 0
2 4 A 0
3 1 A 0
3 2 A 0
3 3 A 0
3 4 A 0
4 1 A 0
4 2 A 0
4 3 A 0
4 4 A 0
5 1 A 0
5 2 A 0
5 3 A 0
5 4 A 0
6 1 A 0
6 2 A 0
6 3 A 0
6 4 A 0
1 1 B 0
1 2 B 0
1 3 B 0
1 4 B 0
2 1 B 0
2 2 B 0
2 3 B 0
2 4 B 0
3 1 B 0
3 2 B 1
3 3 B 0
3 4 B 1
4 1 B 1
4 2 B 1
4 3 B 0
4 4 B 0
5 1 B 0
5 2 B 0
5 3 B 0
5 4 B 0
6 1 B 3
6 2 B 0
6 3 B 0
6 4 B 0
1 1 C 0
1 2 C 0
1 3 C 0
1 4 C 0
2 1 C 0
2 2 C 0
2 3 C 0
2 4 C 0
3 1 C 0
3 2 C 0
3 3 C 1
3 4 C 0
4 1 C 0
4 2 C 1
4 3 C 0
4 4 C 1
5 1 C 0
5 2 C 0
5 3 C 0
5 4 C 0
6 1 C 1
6 2 C 0
6 3 C 0
6 4 C 0
1 1 D 0
1 2 D 0
1 3 D 0
1 4 D 0
2 1 D 1
2 2 D 0
2 3 D 0
2 4 D 0
3 1 D 0
3 2 D 0
3 3 D 1
3 4 D 2
4 1 D 0
4 2 D 0
4 3 D 4
4 4 D 3
5 1 D 0
5 2 D 1
5 3 D 0
5 4 D 0
6 1 D 3
6 2 D 3
6 3 D 2
6 4 D 2
1 1 E 1
1 2 E 0
1 3 E 0
1 4 E 0
2 1 E 0
2 2 E 0
2 3 E 0
2 4 E 0
3 1 E 2
3 2 E 0
3 3 E 2
3 4 E 2
4 1 E 2
4 2 E 1
4 3 E 2
4 4 E 3
5 1 E 0
5 2 E 0
5 3 E 2
5 4 E 0
6 1 E 6
6 2 E 1
6 3 E 3
6 4 E 1
1 1 F 0
1 2 F 0
1 3 F 0
1 4 F 0
2 1 F 0
2 2 F 0
2 3 F 0
2 4 F 0
3 1 F 0
3 2 F 0
3 3 F 0
3 4 F 0
4 2 F 2
4 3 F 2
4 4 F 0
4 4 F 0
5 1 F 0
5 2 F 0
5 3 F 0
5 4 F 0
6 1 F 8
6 2 F 10
6 3 F 4
6 4 F 7
1 1 G 0
1 2 G 0
1 3 G 0
1 4 G 0
2 1 G 0
2 2 G 0
2 3 G 0
2 4 G 0
3 1 G 0
3 2 G 0
3 3 G 0
3 4 G 1
4 1 G 1
4 2 G 0
4 3 G 1
4 4 G 1
5 1 G 0
5 2 G 0
5 3 G 0
5 4 G 0
6 1 G 1
6 2 G 1
6 3 G 1
6 4 G 3
'data.frame': 168 obs. of 4 variables:
$ Subject : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
$ Session : int 1 2 3 4 1 2 3 4 1 2 ...
$ Condition: Factor w/ 7 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Count : int 0 0 0 0 0 0 0 0 0 0 ...
There are many zeros (but these are "true" zeros, in that they are expected given the conditions - the individuals are able to produce the behaviors at all times, but do so very infrequently under certain conditions, particularly condition A).
When I used a glmer with poisson family in lme4 the model failed to converge and I was advised to use bglmer, from package blme. This is my code:
m<- bglmer(Count~Condition + Session + (1|Subject),data=dat,
family=poisson, fixef.prior = normal(cov = diag(9,8)), control=glmerControl(optimizer="bobyqa"))
Here is the output (with condition E as baseline):
Cov prior : Subject ~ wishart(df = 3.5, scale = Inf, posterior.scale = cov, common.scale = TRUE)
Fixef prior: normal(sd = c(3, 3, ...), corr = c(0 ...), common.scale = FALSE)
Prior dev : 33.0517
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['bglmerMod']
Family: poisson ( log )
Formula: Count ~ Condition + Session + (1 | Subject)
Data: dat
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
267.1 295.2 -124.6 249.1 159
Scaled residuals:
Min 1Q Median 3Q Max
-1.4950 -0.3985 -0.2257 -0.0574 4.0170
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 3.408 1.846
Number of obs: 168, groups: Subject, 6
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.66015 0.80543 -0.820 0.412429
ConditionA -4.09023 1.33829 -3.056 0.002241 **
ConditionB -1.33464 0.41532 -3.213 0.001311 **
ConditionC -1.86472 0.51534 -3.618 0.000296 ***
ConditionD -0.20941 0.28401 -0.737 0.460919
ConditionF 0.20791 0.25717 0.808 0.418836
ConditionG -0.98801 0.36455 -2.710 0.006725 **
Session -0.07237 0.08819 -0.821 0.411853
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) CndtnA CndtnB CndtnC CndtnD CndtnF CndtnG
ConditionA -0.025
ConditionB -0.098 0.051
ConditionC -0.078 0.041 0.161
ConditionD -0.145 0.075 0.298 0.237
ConditionF -0.145 0.083 0.329 0.263 0.487
ConditionG -0.113 0.058 0.231 0.184 0.341 0.377
Session -0.244 0.000 -0.002 -0.001 -0.003 -0.068 -0.002
My problem is that the residuals are not normally distributed (I'm having trouble uploading the pic of the QQplot, but here is the Shapiro test of residuals):
Shapiro-Wilk normality test
data: resid(m)
W = 0.90746, p-value = 8.413e-09
I think this means that the SEs and p-values may be unreliable. I read that I should try to calculate robust SEs, but the "sandwich" package does not seem to work on bglmer models. Does anyone have any suggestions for how I might proceed, by either changing my model or calculating these SEs?
I have tried some other options, such as sqrt transformation and using lmer - this appears to solve the problem of the residuals, but then the model seems to be over-dispersed. I have also tried zero-inflated nb models (using glmmTMB) but they seem to have the same problem with the residuals. Any help would be much appreciated!