
I am a user more familiar with R, and have been trying to estimate random slopes (selection coefficients) for about 35 individuals over 5 years for four habitat variables. The response variable is whether a location was "used" (1) or "available" (0) habitat ("use" below).

I am using a Windows 64-bit computer.

In R (version 3.1.0, lme4 version 1.1-7), I use the data and call below. PS, TH, RS, and HW are fixed effects (measured distances to four habitat types, standardized).

str(dat)
'data.frame':   359756 obs. of  7 variables:
 $ use     : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Year    : Factor w/ 5 levels "1","2","3","4",..: 4 4 4 4 4 4 4 4 3 4 ...
 $ ID      : num  306 306 306 306 306 306 306 306 162 306 ...
 $ PS: num  -0.32 -0.317 -0.317 -0.318 -0.317 ...
 $ TH: num  -0.211 -0.211 -0.211 -0.213 -0.22 ...
 $ RS: num  -0.337 -0.337 -0.337 -0.337 -0.337 ...
 $ HW: num  -0.0258 -0.19 -0.19 -0.19 -0.4561 ...

glmer(use ~  PS + TH + RS + HW +
     (1 + PS + TH + RS + HW |ID/Year),
     family = binomial, data = dat, control=glmerControl(optimizer="bobyqa"))
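(As I understand lme4's formula handling, the nested grouping term `ID/Year` above expands into one random-effects term per level of nesting, so the call should be equivalent to the more explicit form below; I include it only to make the intended structure clear.)

glmer(use ~ PS + TH + RS + HW +
     (1 + PS + TH + RS + HW | ID) +
     (1 + PS + TH + RS + HW | ID:Year),
     family = binomial, data = dat, control = glmerControl(optimizer = "bobyqa"))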

glmer gives me parameter estimates for the fixed effects that make sense to me, and the random slopes (which I interpret as selection coefficients to each habitat type) also make sense when I investigate the data qualitatively. The log-likelihood for the model is -3050.8.

However, most research in animal ecology does not use R, because with animal location data, spatial autocorrelation can make model-based standard errors too small and therefore prone to Type I error. While R uses model-based standard errors, empirical (also called Huber-White or sandwich) standard errors are preferred.
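(To illustrate what I mean by model-based versus empirical standard errors - this is not the mixed model, just a minimal sketch for an ordinary logistic regression clustered on ID, using the sandwich and lmtest packages with the variables in dat above:)

library(sandwich)
library(lmtest)
glm1 <- glm(use ~ PS + TH + RS + HW, family = binomial, data = dat)
coeftest(glm1)                                        # model-based SEs
coeftest(glm1, vcov. = vcovCL(glm1, cluster = ~ ID))  # empirical (cluster-robust sandwich) SEs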

While R does not currently offer this option for glmer (to my knowledge - please correct me if I am wrong), SAS does. Although I do not have access to SAS myself, a colleague agreed to let me borrow his computer to determine whether the standard errors change appreciably when the empirical method is used.

First, we wished to ensure that when using model-based standard errors, SAS would produce similar estimates to R - to be certain that the model is specified the same way in both programs. I don't care if they are exactly the same - just similar. I tried (SAS V 9.2):

proc glimmix data=dat method=laplace;
   class year id;
   model use =  PS TH RS HW / dist=bin solution ddfm=betwithin;
   random intercept PS TH RS HW / subject = year(id) solution type=UN;
run;
title;

I also tried various other forms, such as adding lines

random intercept / subject = year(id) solution type=UN;
random intercept PS TH RS HW / subject = id solution type=UN;

I further tried leaving out solution and type=UN, and commenting out ddfm=betwithin.

No matter how we specify the model (and we have tried many ways), I cannot get the random slopes in SAS to remotely resemble those output from R, even though the fixed effects are similar enough. And when I say different, I mean that not even the signs are the same. The -2 Log Likelihood in SAS was 71344.94.

I can't upload my full dataset, so I made a toy dataset with only the records from three individuals. SAS gives me output in a few minutes; in R it takes over an hour. Weird. With this toy dataset I'm now also getting different estimates for the fixed effects.

My question: Can anyone shed light on why the random slopes estimates might be so different between R and SAS? Is there anything I can do in R, or SAS, to modify my code so that the calls produce similar results? I'd rather change the code in SAS, since I "believe" my R estimates more.

I'm really concerned with these differences and want to get to the bottom of this problem!

My R and SAS output from a toy dataset that uses only three of the 35 individuals in the full dataset is included as jpegs:

[Images: R output, SAS output 1, SAS output 2, SAS output 3]


EDIT AND UPDATE:

As @JakeWestfall helped discover, the slopes in SAS do not include the fixed effects. When I add the fixed effects back in, here is the result, comparing R slopes to SAS slopes for one fixed effect, PS (selection coefficient = random slope). Note the increased variation in SAS.

[Figure: R vs SAS random slopes for PS]
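(On the R side, this distinction is the one between ranef() and coef(); a minimal sketch, assuming fit holds the glmer fit from my call above - ranef() gives the deviations alone, which is what SAS's random-effects solution reports, while coef() gives fixed effects plus deviations:)

fe <- fixef(fit)        # fixed effects
re <- ranef(fit)$ID     # per-individual deviations from the fixed effects
cf <- coef(fit)$ID      # per-individual coefficients = fixed effects + deviations
head(cf - re)           # every row should reproduce fe
fe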

Nova
  • I notice that `ID` isn't a factor in R; check and see if that changes anything. – Aaron left Stack Overflow Sep 09 '14 at 21:00
  • I see that you are fitting both using Laplace Approximation for the log-likelihood. What are their respective log-likelihood scores? – usεr11852 Sep 09 '14 at 21:03
  • I don't see anything obvious. `ID` should get converted to a factor when it's used as a grouping variable. I'm not sure whether the nesting syntax is the same: `ID/Year` means "year nested within ID", which seems reasonable; is this equivalent to the SAS spec? You may send me data if you like ... – Ben Bolker Sep 09 '14 at 21:45
  • Have you checked that you are modeling the dependent variable in the same direction? – Peter Flom Sep 09 '14 at 22:20
  • **@usεr11852**, I edited my post to include the loglik and -2Log Likelihood results from R and SAS. **@Ben Bolker**, according to the [SAS help menu](http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_anova_sect017.htm), the nested random effect comes first, followed by the main random effect in parentheses - so my syntax should match that in R. @Peter Flom - not sure what you mean? – Nova Sep 10 '14 at 13:37
  • I can verify that the two nesting syntaxes should be equivalent. @EricaN I think you really need to paste the full output from both `SAS` and `R`. (I notice for example that the log-likelihoods appear to be on radically different scales, is this perhaps a typo or pasting error?) Even better would be an actual reproducible example, but maybe we can start with the full outputs. – Jake Westfall Sep 10 '14 at 15:13
  • By the way, what Peter is getting at is that, by default with binomial data labeled as `0`s and `1`s, `R` will model the probability of a "1" response while SAS will model the probability of a "0" response. To make SAS model the probability of "1" you need to write your response variable as `use(event='1')`. Of course, even without doing this I believe we should still expect the same estimates of the random effect variances, as well as the same fixed effect estimates albeit with their signs reversed. – Jake Westfall Sep 10 '14 at 15:40
  • @JakeWestfall, I added the output from my toy dataset (so that if I upload it we are both working from the same dataset). What is the recommended repository if I choose to share a .csv? – Nova Sep 10 '14 at 18:25
  • The correlation between your random effects is pretty high. That can sometimes lead to computational issues; if so, centering the variables can sometimes help. – Aaron left Stack Overflow Sep 10 '14 at 18:43
  • Also note the binned residual plots are disturbing, so maybe these data need more than a little bit of tweaking... – Nova Sep 10 '14 at 20:59
  • As @Jake mentions these log-likelihoods are vastly different, so it will not be surprising for something fishy to be going on numerically. Can you check what results you get if instead of BOBYQA for the optimization you use `method="nlminb"` or `method="L-BFGS-B"`? Do they converge to approximately the same solution? Finally, numerics aside, are you able to interpret the coefficients? Do they appear sensible based on your knowledge of the problem? E.g. the variable `RS` switches signs between the runs; any idea what the sign might be? – usεr11852 Sep 10 '14 at 22:53
  • It's OK that the sign for RS switches between runs, I think... there is not a strong signal for this habitat type, so I wouldn't expect the results for that one to be consistent. However, I DO expect most of the random slopes for HW to be negative. Perhaps I should log-transform the data and try again, with your suggestions @usεr11852. Unfortunately, a "try" takes over 10 hours with glmer... – Nova Sep 11 '14 at 13:13
  • @Aaron, I should mention that the variables have been standardized by subtracting the mean and dividing by two standard deviations. – Nova Sep 19 '14 at 15:48
  • @JakeWestfall, I just noticed that the random slopes from my model in glmer are very similar to those output by SAS when I _do not_ specify fixed effects in glmer - I can see that SAS is computing the estimates for fixed effects, but perhaps this is a clue? I'm still trying to get this to work... – Nova Sep 24 '14 at 16:04
  • @EricaN One thing that you just reminded me of is that you should compare the random effects from R to those in SAS using the `ranef()` function rather than `coef()`. The former gives the actual random effects, while the latter gives the random effects plus the fixed-effects vector. So this accounts for a lot of why the numbers illustrated in your post differ, but there is still a substantial discrepancy remaining that I can't totally explain. – Jake Westfall Sep 25 '14 at 02:40
  • @JakeWestfall we are getting closer to an answer here... so if I "add" the fixed effects to each random slope in SAS, they should output very similar answers. Even though the _trend_ is now more similar, you are right - there is still a lot of discrepancy. – Nova Sep 25 '14 at 17:52
  • @JakeWestfall, do you know how to ask SAS for the random slopes with fixed effects added or do you do it manually? – Nova Sep 25 '14 at 18:01
  • @EricaN I did it manually by adding -1.87 to the random effects. – Jake Westfall Sep 25 '14 at 19:01
  • I think SAS gives the deviance estimate (Difference: fixed effect model-Random effect model). – wilson Suraweera Dec 12 '20 at 17:38

2 Answers


It appears that I shouldn't have expected the random slopes to be similar between packages, according to Zhang et al. (2011). In their paper, On Fitting Generalized Linear Mixed-effects Models for Binary Responses using Different Statistical Packages, they write:

Abstract:

The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice.

I hope @BenBolker and team will consider my question a vote for incorporating empirical standard errors and Gauss-Hermite quadrature capability for models with several random-slope terms into glmer, as I prefer the R interface and would love to be able to apply some further analyses in that program. Happily, even though R and SAS do not produce comparable values for the random slopes, the overall trends are the same. Thanks all for your input; I really appreciate the time and consideration that you put into this!
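(For what it's worth, glmer already offers adaptive Gauss-Hermite quadrature through its nAGQ argument, but, as I understand it, only for models with a single scalar random-effects term - e.g. a random intercept only - so it cannot be used with the multiple correlated random slopes in my model. A minimal sketch of the intercept-only case:)

glmer(use ~ PS + TH + RS + HW + (1 | ID),
      family = binomial, data = dat, nAGQ = 10)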

Nova

A mixture of an answer and commentary/more questions:

I fitted the 'toy' data set with three different optimizer choices. (Note: it would probably be more useful for comparative purposes to make a small data set by subsampling from within every year and ID, rather than by subsampling the grouping variables; as it is, we know that the GLMM won't perform particularly well with such a small number of grouping-variable levels.) You can do this via something like:

library(plyr)
## take a ~10% subsample of rows within each year/ID combination
subdata <- ddply(fulldata, c("year", "id"),
    function(x) x[sample(nrow(x), size = round(nrow(x)*0.1)), ])

Batch fitting code:

Ntoy <- readRDS("Newton_toy.RDS")
library(lme4)
fitfun <- function(opt) {
    tt <- system.time(fit1 <- glmer(use ~  ps + th + rs + hw +
                                    (1 + ps + th + rs + hw |id/year),
                                    family = binomial, data = Ntoy,
                                    control=glmerControl(optimizer=opt),
                                    verbose=100))
    return(list(time=tt,fit=fit1))
}

opts <- c("nloptwrap","nlminbwrap","bobyqa")
## use for() instead of lapply so we can checkpoint more easily
res <- setNames(vector("list",length(opts)),opts)
for (i in opts) {
    res[[i]] <- fitfun(i)
    save("res",file="Newton_batch.RData")
}

Then I read in the results in a new session:

load("Newton_batch.RData")
library(lme4)

Elapsed time and deviance:

cbind(time=unname(sapply(res,function(x) x$time["elapsed"])),
          dev=sapply(res,function(x) deviance(x$fit)))
##                time      dev
## nloptwrap  1001.824 6067.706
## nlminbwrap 3495.671 6068.730
## bobyqa     4945.332 6068.731

These deviances are considerably below the deviance reported by the OP from R (6101.7), and slightly below the ones reported by the OP from SAS (6078.9), although comparing deviances across packages is not always sensible.
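(As I understand it, the deviance that glmer reports for these fits is just -2 times the log-likelihood, so it is at least on the same scale as SAS's "-2 Log Likelihood"; a quick sanity check on one of the fits above:)

deviance(res[["nloptwrap"]]$fit)
-2 * as.numeric(logLik(res[["nloptwrap"]]$fit))  # should match, up to rounding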

I was indeed surprised that SAS converged in only about 100 function evaluations!

Times range from 17 minutes (nloptwrap) to 80 minutes (bobyqa) on a MacBook Pro, consistent with the OP's experience. Deviance is a bit better for nloptwrap.

round(cbind(sapply(res,function(x) fixef(x$fit))),3)
##             nloptwrap nlminbwrap bobyqa
## (Intercept)    -5.815     -5.322 -5.322
## ps             -0.989      0.171  0.171
## th             -0.033     -1.342 -1.341
## rs              1.361     -0.140 -0.139
## hw             -2.100     -2.082 -2.082

Answers appear quite different with nloptwrap -- although the standard errors are quite large ...

round(coef(summary(res[[1]]$fit)),3)
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -5.815      0.750  -7.750    0.000
## ps            -0.989      1.275  -0.776    0.438
## th            -0.033      2.482  -0.013    0.989
## rs             1.361      2.799   0.486    0.627
## hw            -2.100      0.490  -4.283    0.000

(code here gives some warnings about year:id that I should track down)
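(Since the question is ultimately about the random effects, it would also be worth comparing the estimated random-effect standard deviations and correlations across optimizers; a hypothetical follow-up using the res list from above:)

## estimated random-effect SDs and correlations for each optimizer
lapply(res, function(x) VarCorr(x$fit))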

To be continued ... ?

Ben Bolker
  • would it be more helpful if I sent you the full dataset? The only problem is that convergence takes about 9 hrs with the full dataset, so your suggestion regarding sampling is a good one. I tried to transform the data with a log transformation, but the binned residual plot is still ugly - do you think the residual plot explains part of the problem with these data? Finally - were your results in SAS similar to R's? – Nova Sep 12 '14 at 13:42