Why does glmer not achieve the maximum likelihood (as verified by applying further generic optimization)?

Question

Numerically deriving the MLEs of GLMM is difficult and, in practice, I know, we should not use brute force optimization (e.g., using optim in a simple way). But for my own educational purpose, I want to try it to make sure I correctly understand the model (see the code below). I found that I always get inconsistent results from glmer().

In particular, even if I use the MLEs from glmer as initial values, according to the likelihood function I wrote (negloglik), they are not MLEs (opt1$value is smaller than opt2). I think two potential reasons are:

negloglik is not written well so that there is too much numerical error in it, and
the model specification is wrong. For the model specification, the intended model is:

\begin{equation} L=\prod_{i=1}^{n} \left(\int_{-\infty}^{\infty}f(y_i|N,a,b,r_{i})g(r_{i}|s)dr_{i}\right) \end{equation} where $f$ is a binomial pmf and $g$ is a normal pdf. I am trying to estimate $a$, $b$, and $s$. In particular, I want to know if the model specification is wrong, what the correct specification is.

p <- function(x,a,b) exp(a+b*x)/(1+exp(a+b*x))

a <- -4  # fixed effect (intercept)
b <- 1   # fixed effect (slope)
s <- 1.5 # random effect (intercept)
N <- 8
x <- rep(2:6, each=20)
n <- length(x) 
id <- 1:n
r  <- rnorm(n, 0, s) 
y  <- rbinom(n, N, prob=p(x,a+r,b))


negloglik <- function(p, x, y, N){
  a <- p[1]
  b <- p[2]
  s <- p[3]

  Q <- 100  # Inf does not work well
  L_i <- function(r,x,y){
    dbinom(y, size=N, prob=p(x, a+r, b))*dnorm(r, 0, s)
  }

  -sum(log(apply(cbind(y,x), 1, function(x){ 
    integrate(L_i,lower=-Q,upper=Q,x=x[2],y=x[1],rel.tol=1e-14)$value
  })))
}

library(lme4)
(model <- glmer(cbind(y,N-y)~x+(1|id),family=binomial))

opt0 <- optim(c(fixef(model), sqrt(VarCorr(model)$id[1])), negloglik, 
                x=x, y=y, N=N, control=list(reltol=1e-50,maxit=10000)) 
opt1 <- negloglik(c(fixef(model), sqrt(VarCorr(model)$id[1])), x=x, y=y, N=N)
opt0$value  # negative loglikelihood from optim
opt1        # negative loglikelihood using glmer generated parameters
-logLik(model)==opt1 # but these are substantially different...

A simpler example

To reduce the possibility of having large numerical error, I created a simpler example.

y  <- c(0, 3)
N  <- c(8, 8)
id <- 1:length(y)

negloglik <- function(p, y, N){
  a <- p[1]
  s <- p[2]
  Q <- 100  # Inf does not work well
  L_i <- function(r,y){
    dbinom(y, size=N, prob=exp(a+r)/(1+exp(a+r)))*dnorm(r,0,s)
  }
  -sum(log(sapply(y, function(x){
    integrate(L_i,lower=-Q, upper=Q, y=x, rel.tol=1e-14)$value
  })))
}

library(lme4)
(model <- glmer(cbind(y,N-y)~1+(1|id), family=binomial))
MLE.glmer <- c(fixef(model), sqrt(VarCorr(model)$id[1]))
opt0 <- optim(MLE.glmer, negloglik, y=y, N=N, control=list(reltol=1e-50,maxit=10000)) 
MLE.optim <- opt0$par
MLE.glmer # MLEs from glmer
MLE.optim # MLEs from optim

L_i <- function(r,y,N,a,s) dbinom(y,size=N,prob=exp(a+r)/(1+exp(a+r)))*dnorm(r,0,s)

L1 <- integrate(L_i,lower=-100, upper=100, y=y[1], N=N[1], a=MLE.glmer[1], 
                s=MLE.glmer[2], rel.tol=1e-10)$value
L2 <- integrate(L_i, lower=-100, upper=100, y=y[2], N=N[2], a=MLE.glmer[1], 
                s=MLE.glmer[2], rel.tol=1e-10)$value

(log(L1)+log(L2)) # loglikelihood (manual computation)
logLik(model)     # loglikelihood from glmer

are the MLEs (not the log-likelihoods themselves) comparable? That is, are you just off by a constant? — Ben Bolker, May 27 '14 at 00:10
The estimated MLEs are clearly different (`MLE.glmer` and `MLE.optim`) especially for the random effect (see the new example), so it is not just based on some constant factor in likelihood values, I think. — quibble, May 27 '14 at 05:46
@Ben Setting a high value of `nAGQ` in `glmer` made the MLEs comparable. The default precision of `glmer` was not very good. — quibble, May 28 '14 at 00:23
Linking to a similar lme4 question that @Steve Walker helped me out with: http://stats.stackexchange.com/questions/77313/why-cant-i-match-lmer-for-family-binomial — Ben Ogorek, Aug 02 '15 at 21:31
As an older question w/ a lot of upvotes, this could probably be grandfathered. I don't see a need for this to be closed. — gung - Reinstate Monica, Nov 25 '16 at 14:23

score 3 · Answer 1 · edited Jul 26 '17 at 15:15

3

Setting a high value of nAGQ in the glmer call made the MLEs from the two methods equivalent. The default precision of glmer was not very good. This settles the issue.

glmer(cbind(y,N-y)~1+(1|id),family=binomial,nAGQ=20)

See @SteveWalker's answer here Why can't I match glmer (family=binomial) output with manual implementation of Gauss-Newton algorithm? for more details.

edited Jul 26 '17 at 15:15

amoeba

93,463
28
275
317

answered May 28 '14 at 00:25

quibble

1,167
10
17

1

But the estimated loglikelihoods are very different (presumably by some constant), so the different methods should not be mixed. – quibble May 28 '14 at 00:35
1

hmm, interesting/surprising -- thanks for setting up this example, I'll try to find time to look into it. – Ben Bolker May 28 '14 at 01:50

Why does glmer not achieve the maximum likelihood (as verified by applying further generic optimization)?

1 Answers1

Linked