I have found a number of posts about how worried one should be about convergence warnings from lme4 (such as: How scared should we be about convergence warnings in lme4). However, I have not been able to find equivalent posts about convergence warnings when using multiply imputed datasets, specifically when the warnings appear on only some of the runs but not all.
I am running random intercept models over 30 imputed datasets (imputed with mice) with a count outcome and participants nested in schools (I am not getting convergence warnings for my equivalent lmer models). I am not sure how concerned I should be when a handful of my imputed datasets throw convergence warnings but the rest do not. I've copied excerpts from two of the runs below, where the convergence warning shows up at the end of the first summary but not the second. (My data are restricted, so I cannot post them to reproduce the warnings.)
Code:
RQ1fullODGX2m3 <- with(imp, glmer(ODGX2 ~ preginsch + S1cl + S6Bcl + otherraceinsch
    + gpainsch + parentedimputed + delinq + S62Ocl + noparentimputed + (1 | SSCHLCDE),
    family = poisson,
    control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 100000)),
    subset = (RQ1full == 1 & (preginsch == 1 | futurepreg == 1))))
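To see at a glance which of the 30 imputations are the problematic ones, the per-imputation fits can be inspected directly. This is a sketch assuming the `RQ1fullODGX2m3` object from the call above; `with()` from mice returns a mira object whose `$analyses` component is a list of `glmerMod` fits, one per imputed dataset, and lme4 stores any convergence messages in each fit's `@optinfo` slot:

```r
# Count the convergence messages lme4 recorded for each imputed-data fit
# (sketch: RQ1fullODGX2m3 is the mira object from the with(imp, glmer(...)) call)
n_msgs <- sapply(RQ1fullODGX2m3$analyses,
                 function(m) length(m@optinfo$conv$lme4$messages))
which(n_msgs > 0)  # indices of the imputations that threw a warning
```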
Excerpt of summary():
## summary of imputation 26 :
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod'] Family: poisson ( log )
Formula: ODGX2 ~ preginsch + S1cl + S6Bcl + otherraceinsch + gpainsch +
parentedimputed + delinq + S62Ocl + noparentimputed + (1 | SSCHLCDE)
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
Subset: (RQ1full == 1 & (preginsch == 1 | futurepreg == 1))
AIC BIC logLik deviance df.resid
3148.8 3197.8 -1563.4 3126.8 628
Scaled residuals:
Min 1Q Median 3Q Max
-2.4388 -1.1207 -0.0159 0.8869 4.1627
Random effects:
Groups Name Variance Std.Dev.
SSCHLCDE (Intercept) 0.09933 0.3152
Number of obs: 639, groups: SSCHLCDE, 100
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.282264 0.299715 7.615 2.64e-14 ***
preginsch -0.020990 0.045705 -0.459 0.646054
S1cl -0.064696 0.017899 -3.615 0.000301 ***
S6Bcl -0.245133 0.062187 -3.942 8.08e-05 ***
otherraceinsch -0.357046 0.065428 -5.457 4.84e-08 ***
gpainsch 0.068533 0.031757 2.158 0.030923 *
parentedimputed 0.070955 0.058575 1.211 0.225761
delinq -0.007203 0.019503 -0.369 0.711867
S62Ocl -0.164883 0.066465 -2.481 0.013110 *
noparentimputed -0.161624 0.083127 -1.944 0.051858 .
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) prgnsc S1cl S6Bcl othrrc gpnsch prntdm delinq S62Ocl
preginsch 0.119
S1cl -0.930 -0.209
S6Bcl -0.212 -0.026 0.098
otherrcnsch -0.221 -0.069 0.148 0.341
gpainsch -0.270 0.070 -0.030 0.110 0.072
parentdmptd 0.070 -0.021 -0.062 -0.116 -0.131 -0.120
delinq -0.263 0.004 0.115 0.231 0.098 0.132 -0.041
S62Ocl 0.006 -0.007 -0.041 -0.004 -0.015 0.052 -0.011 -0.078
noparntmptd -0.035 -0.041 0.010 -0.024 0.039 0.018 0.041 0.037 -0.023
convergence code: 0
Model failed to converge with max|grad| = 0.00109371 (tol = 0.001, component 1)
## summary of imputation 27 :
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) ['glmerMod'] Family: poisson ( log )
Formula: ODGX2 ~ preginsch + S1cl + S6Bcl + otherraceinsch + gpainsch +
parentedimputed + delinq + S62Ocl + noparentimputed + (1 | SSCHLCDE)
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
Subset: (RQ1full == 1 & (preginsch == 1 | futurepreg == 1))
AIC BIC logLik deviance df.resid
3147.5 3196.5 -1562.7 3125.5 628
Scaled residuals:
Min 1Q Median 3Q Max
-2.4233 -1.1163 -0.0248 0.8748 4.0993
Random effects:
Groups Name Variance Std.Dev.
SSCHLCDE (Intercept) 0.1002 0.3166
Number of obs: 639, groups: SSCHLCDE, 100
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.152035 0.301166 7.146 8.96e-13 ***
preginsch -0.017517 0.045798 -0.382 0.702104
S1cl -0.061575 0.017894 -3.441 0.000579 ***
S6Bcl -0.228671 0.062095 -3.683 0.000231 ***
otherraceinsch -0.349868 0.065269 -5.360 8.30e-08 ***
gpainsch 0.091911 0.031648 2.904 0.003682 **
parentedimputed 0.040349 0.057918 0.697 0.486018
delinq 0.004404 0.019675 0.224 0.822877
S62Ocl -0.153198 0.066588 -2.301 0.021410 *
noparentimputed -0.161569 0.083140 -1.943 0.051977 .
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) prgnsc S1cl S6Bcl othrrc gpnsch prntdm delinq S62Ocl
preginsch 0.109
S1cl -0.931 -0.207
S6Bcl -0.208 -0.020 0.098
otherrcnsch -0.213 -0.068 0.147 0.338
gpainsch -0.286 0.086 -0.010 0.103 0.050
parentdmptd 0.050 -0.025 -0.047 -0.100 -0.114 -0.113
delinq -0.260 0.032 0.109 0.227 0.091 0.133 -0.027
S62Ocl 0.016 -0.001 -0.053 -0.017 -0.021 0.045 0.009 -0.052
noparntmptd -0.033 -0.040 0.009 -0.025 0.040 0.019 0.036 0.031 -0.020
I have tried different optimizers (per posts like this: https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html) and I get similar convergence warnings.
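Rather than refitting by hand with each optimizer, lme4's `allFit()` automates that comparison for a single problematic fit. A sketch, assuming the mira object above and that imputation 26 is one that warned; if the log-likelihoods and fixed effects agree across optimizers to several decimal places, the warning is very likely a false positive:

```r
library(lme4)

# Refit one problematic imputation with every available optimizer
fit26 <- RQ1fullODGX2m3$analyses[[26]]  # the imputation that warned
af <- allFit(fit26)

summary(af)$llik   # log-likelihood under each optimizer
summary(af)$fixef  # fixed-effect estimates under each optimizer
```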
This is being run on a subset of my participants (~600). However, when I run the models on my full data (~7000) I actually get many more convergence warnings, which seems unexpected given the comment on that post (How scared should we be about convergence warnings in lme4) that "optimizers run into problems when there is too little data for the number of parameters, or the proposed model is really not suitable". The subset is what is ultimately of interest for my research question, but I am also not sure what level of concern I should have about the marked increase in warnings when the models are run on more data.
I also tried running single-level models with glm, which work fine but don't account for my participants being clustered in schools. When I run GLMs with fixed effects for the clusters (via the glmmML package's glmmboot function), the models overfit (the AIC is infinite), which is why I landed on random intercept models.
Thanks for any perspective!
EDIT: here is the convergence warning (it appears at the end of summary #26 but not summary #27 in the printout above): Model failed to converge with max|grad| = 0.00109371 (tol = 0.001, component 1)
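Since the reported max|grad| (0.00109) only barely exceeds the 0.001 tolerance, one diagnostic from lme4's `?convergence` help page is to rescale the gradient by the Hessian; the absolute-gradient check that triggers the warning can be overly strict. A sketch against the assumed fit for imputation 26:

```r
# Scaled-gradient check from lme4's ?convergence troubleshooting advice
# (sketch: fit26 is the glmerMod fit for the imputation that warned)
fit26 <- RQ1fullODGX2m3$analyses[[26]]
derivs <- fit26@optinfo$derivs
relgrad <- solve(derivs$Hessian, derivs$gradient)
max(abs(relgrad))  # values well below ~0.001 suggest the fit is fine
```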