
I'm trying to fit a binomial GLMM, but I'm getting very different results from lme4 and Stata.

In R, I'm running `glmer(y ~ x1 + x2 + x3 + x4 + (1 + x1 | id), data = df, family = binomial(), control = glmerControl(optimizer = "bobyqa"))`, where `x4` is a factor variable. The fit did not converge with the default optimizer, hence the switch to bobyqa; with it, no errors or warnings arise. I get p-values of < 2e-16 for all variables except `x2` and `x3`, which are both p > 0.2.
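
Written out more readably (the same call; `fit_r` is just a placeholder name):

```r
library(lme4)

# Random intercept and random slope for x1, correlated, grouped by id
fit_r <- glmer(
  y ~ x1 + x2 + x3 + x4 + (1 + x1 | id),        # x4 is a factor
  data    = df,
  family  = binomial(),
  control = glmerControl(optimizer = "bobyqa")  # default optimizer did not converge
)
summary(fit_r)
```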

In Stata, I'm running `meglm y x1 x2 x3 i.x4 || id: x1, family(binomial) link(logit) cov(unstructured)`. With this I get p-values of < 0.001 for all variables except `x2` and `x3`, which are 0.012 and 0.003 respectively.

The fixed-effect estimates and the random-effect variance components are very different as well.

I'm stumped as to why these would be so different, and it's concerning, because the two fits lead to very different conclusions.

Does anyone have an idea why they differ? Assuming the Stata fit is the correct one, how do I get lme4 to do the same thing?

Stata output:

------------------------------------------------------------------------------------
          y        |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
                x2 |  -1.012777   .4017408    -2.52   0.012    -1.800174   -.2253791
                x3 |   1.214473   .4024367     3.02   0.003     .4257112    2.003234
                x1 |   3.921289   .1612074    24.32   0.000     3.605328    4.237249
                   |
                x4 |
             x4.2  |   2.136393   .2285006     9.35   0.000      1.68854    2.584247
             x4.3  |   5.123014   .6784815     7.55   0.000     3.793215    6.452814
             x4.4  |   3.618065   .2799368    12.92   0.000     3.069399    4.166731
             x4.5  |   4.331501   .4156959    10.42   0.000     3.516752    5.146251
                   |
             _cons |  -12.58449   .4422242   -28.46   0.000    -13.45124   -11.71775
-------------------+----------------------------------------------------------------
id                 |
            var(x1)|   1.346404   .2711099                      .9073581    1.997892
         var(_cons)|   13.64491   1.256733                      11.39129    16.34439
-------------------+----------------------------------------------------------------
id                 |
      cov(x1,_cons)|  -1.733158   .3161964    -5.48   0.000    -2.352892   -1.113424
------------------------------------------------------------------------------------

R output:

Random effects:
 Groups Name        Variance Std.Dev. Corr
 id     (Intercept) 51.15    7.152        
        x1          26.08    5.107    0.09
Number of obs: 60491, groups:  id, 8677

Fixed effects:
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -18.6631     0.7250 -25.744  < 2e-16 ***
x2                  -0.6672     0.6370  -1.047    0.295    
x3                   0.8018     0.6390   1.255    0.210    
x1                   3.8315     0.2760  13.881  < 2e-16 ***
x4.2                 3.6280     0.4427   8.196  2.5e-16 ***
x4.4                 6.9718     0.5050  13.805  < 2e-16 ***
x4.5                10.1306     0.7528  13.458  < 2e-16 ***
x4.3                11.4442     1.0587  10.810  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) x2      x3     x1     x4.2   x4.4   x4.5
x2           0.016                                          
x3          -0.022 -0.985                                   
x1          -0.813 -0.035  0.044                            
x4.2        -0.699  0.031 -0.031  0.334                     
x4.4        -0.887  0.030 -0.029  0.583  0.789              
x4.5        -0.881  0.010 -0.007  0.713  0.647  0.813       
x4.3        -0.730 -0.002  0.004  0.638  0.497  0.645  0.659
ragreener1
  • Try not to draw conclusions based on p-values alone. Are the actual estimates similar? – Robert Long Aug 21 '20 at 15:53
  • Hi, welcome to CV. Did either software give a convergence warning? And are both fits using the exact same model matrix for both fixed and random effects? Is the default fitting algorithm in meglm the same as in glmer? – JTH Aug 21 '20 at 16:05
  • Sometimes confusion like this occurs because of differences in the coding of multi-level categorical variables. Coefficients for such variables are usually reported for differences of each level against the reference level. R sets the lowest level of such a variable as the default reference, while some software uses the highest level instead. Thus reported coefficient estimates and p-values might seem different between software packages even if the model as a whole is the same. I don't use Stata. If you have such variables in the model, which reference does it choose? – EdM Aug 21 '20 at 16:59
  • @RobertLong the parameter estimates are very different... – ragreener1 Aug 21 '20 at 17:33
  • @JTH lme4 gave a convergence error; this was alleviated by using bobyqa. I think lme4 uses the Laplace approximation whereas Stata uses Gauss–Hermite quadrature. – ragreener1 Aug 21 '20 at 17:33
  • @EdM I thought that was it, but I just double checked and they are using the same level as the baseline. – ragreener1 Aug 21 '20 at 17:36
  • It would be interesting to both (a) start the Stata optimization with the coefficients estimated by `R` and (b) start the `R` optimization with Stata's coefficients (see the second sketch after these comments). – whuber Aug 21 '20 at 17:37
  • Try using GLMMAdaptive (see the first sketch after these comments). Also, please post the relevant model output. – Robert Long Aug 21 '20 at 17:50
  • Another source of difference could be in how the packages normalize the error variances. – dimitriy Aug 21 '20 at 19:03
  • I don't think that these are the same models with respect to random effects. I'm not sure about Stata syntax, but at a quick reading its `cov(unstructured)` seems not to model correlations among random slopes and intercepts. The `glmer` syntax `(1 + x1 | id)` does. See the [cheat sheet](https://stats.stackexchange.com/q/13166/28500) and [Ben Bolker's FAQ](http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#model-specification). – EdM Aug 21 '20 at 20:31
  • @RobertLong With GLMMAdaptive I get: `Error in chol.default(X[[i]], ...) : the leading minor of order 1 is not positive definite` – ragreener1 Aug 24 '20 at 07:05
  • @EdM `cov(unstructured)` is the option that allows Stata to model those correlations. – ragreener1 Aug 24 '20 at 07:05
  • One striking observation in the `R` output is the almost perfect negative correlation between the coefficient estimates for predictors `x2` and `x3` ("Correlation of Fixed Effects" of -0.985). Was the same found with Stata? That suggests substantial collinearity between those predictors, which I suppose might lead to results depending heavily on the starting conditions and methods for the iterative fitting. What happens if you omit one of `x2` or `x3` from the model (see the third sketch below)? – EdM Sep 06 '20 at 19:43
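
Following Robert Long's GLMMAdaptive suggestion and the Laplace-vs-quadrature point above, here is a minimal sketch of a refit with adaptive Gauss–Hermite quadrature. Unlike `glmer`, which is restricted to the Laplace approximation (`nAGQ = 1`) whenever the model has more than a single scalar random effect, `GLMMAdaptive::mixed_model` supports `nAGQ > 1` with a random slope. The object name `fit_agq` and the choice of 11 quadrature points are assumptions, not from the original post:

```r
library(GLMMAdaptive)

# Same model as the glmer call, but integrated with adaptive Gauss-Hermite
# quadrature instead of the Laplace approximation.
fit_agq <- mixed_model(
  fixed  = y ~ x1 + x2 + x3 + x4,
  random = ~ 1 + x1 | id,   # correlated random intercept and slope
  data   = df,
  family = binomial(),
  nAGQ   = 11               # quadrature points; Stata's meglm defaults to 7
)
summary(fit_agq)
```

If this reproduces the Stata estimates while `glmer` does not, the Laplace approximation is the likely culprit; conversely, running `meglm` with its `intmethod(laplace)` option should then move Stata toward the lme4 numbers.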
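
And a sketch of whuber's cross-start experiment on the R side: `glmer`'s `start` argument accepts a named list with `theta` (the column-wise lower triangle of the Cholesky factor of the random-effects covariance) and `fixef`. The numbers below are read off the Stata table; the `fixef` ordering is an assumption and must be checked against the columns of the actual model matrix before use:

```r
library(lme4)

# Random-effects covariance from the Stata output; rows/cols = (_cons, x1)
Sigma <- matrix(c(13.64491, -1.733158,
                  -1.733158,  1.346404), nrow = 2)
L <- t(chol(Sigma))                          # lower Cholesky factor
theta_start <- L[lower.tri(L, diag = TRUE)]  # lme4's theta parameterisation

fit_restart <- glmer(
  y ~ x1 + x2 + x3 + x4 + (1 + x1 | id),
  data = df, family = binomial(),
  control = glmerControl(optimizer = "bobyqa"),
  start = list(theta = theta_start,
               # assumed order: (Intercept), x1, x2, x3, x4 levels 2-5;
               # verify against colnames(model.matrix(fit_r)) before running
               fixef = c(-12.58449, 3.921289, -1.012777, 1.214473,
                         2.136393, 5.123014, 3.618065, 4.331501)))
```

If each program walks back to its own solution from the other's estimates, the two are genuinely optimizing different approximations of the likelihood rather than merely stopping in different local optima.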
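
Finally, a quick way to probe the near-perfect negative correlation between the `x2` and `x3` estimates that EdM flags. This assumes `x2` and `x3` are numeric and reuses the placeholder `fit_r` from the first block above:

```r
# How collinear are the raw predictors?
cor(df$x2, df$x3)

# Refit dropping one of the pair and compare the surviving coefficient
fit_no_x3 <- update(fit_r, . ~ . - x3)
fit_no_x2 <- update(fit_r, . ~ . - x2)
```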

0 Answers