
I fit a mixed-effects logistic regression model in R with the following formula:

glmer.traditional <- glmer(AGENT.EXPONENCE ~ ASPECT + (1 | LEMMA), data = hdtpassive, family = binomial(link="logit"))

The standard deviation for the random intercept is really high:

Random effects:
Groups Name        Variance Std.Dev.
LEMMA  (Intercept) 400.4    20.01   
Number of obs: 438, groups:  LEMMA, 174

However, when I fit the model with the following call, the standard deviation plummets:

glmer.traditional <- glmer(AGENT.EXPONENCE ~ ASPECT + (1 | LEMMA), data = hdtpassive, family = binomial(link="logit"), control = glmerControl(optimizer = "bobyqa"), nAGQ = 25)

Random effects:
Groups Name        Variance Std.Dev.
LEMMA  (Intercept) 27.28    5.223   
Number of obs: 438, groups:  LEMMA, 174

nAGQ is the number of quadrature points used in the adaptive Gauss-Hermite approximation of the log-likelihood. Higher values produce a more accurate approximation, but at the expense of speed.

I have two questions about this:

  1. How does the value of this integer scalar affect the standard deviation of the random intercept? I don't know how Gauss-Hermite quadrature works.

  2. Are there guidelines on the interpretation of standard deviations for random intercepts? E.g., is a really high standard deviation a warning sign of some kind?

Namenlos
  • A bit of (unsolicited) advice: the way this question is written, it appears at first glance to be highly language-specific and basically a programming question, which would be [off-topic here](https://stats.stackexchange.com/help/on-topic). You can improve both the amount of traffic to your question as well as the probability that it remains open (not closed by mods) by editing the question to de-emphasize the parts that are specific to R and the lme4 package, and emphasize instead the general statistical parts of the question. _Minimally_, you would need to explain what `nAGQ` is. – Jake Westfall Aug 01 '18 at 22:32
  • Thank you for your advice @JakeWestfall. I edited my question. – Namenlos Aug 01 '18 at 22:52

1 Answer


This answer is more of a jumping-off point for those more experienced than I am. I did a bit of preliminary research for those who may be interested. To answer question (1), quoting from the glmer documentation:

nAGQ: integer scalar - the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Values greater than 1 produce greater accuracy in the evaluation of the log-likelihood at the expense of speed. A value of zero uses a faster but less exact form of parameter estimation for GLMMs by optimizing the random effects and the fixed-effects coefficients in the penalized iteratively reweighted least squares step.

So, increasing nAGQ will (i) take longer and (ii) increase the accuracy of the log-likelihood evaluation. Since the random-intercept variance is estimated by maximizing that approximate log-likelihood, a more accurate approximation can produce a noticeably different estimate of its standard deviation.
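
For anyone who wants to see this directly, here is a minimal sketch on simulated data (the original hdtpassive data are not available, so the data set, its size, and the ASPECT levels below are invented for illustration; only the variable names are taken from the question). It refits the same model with the default Laplace approximation (nAGQ = 1) and with nAGQ = 25, changing nothing else, and compares the estimated random-intercept standard deviations:

library(lme4)

## Simulated stand-in for hdtpassive: 50 lemmas with 5 observations each,
## a binary outcome, and a two-level predictor (levels invented for illustration).
set.seed(1)
sim <- data.frame(
  LEMMA  = factor(rep(1:50, each = 5)),
  ASPECT = factor(rep(c("perfective", "imperfective"), length.out = 250))
)
sim$AGENT.EXPONENCE <- rbinom(
  250, 1,
  plogis(0.5 * (sim$ASPECT == "perfective") + rnorm(50, sd = 2)[sim$LEMMA])
)

## Same model twice: Laplace approximation vs. 25-point adaptive quadrature.
fit.laplace <- glmer(AGENT.EXPONENCE ~ ASPECT + (1 | LEMMA), data = sim,
                     family = binomial(link = "logit"))            # nAGQ = 1 (default)
fit.aghq    <- glmer(AGENT.EXPONENCE ~ ASPECT + (1 | LEMMA), data = sim,
                     family = binomial(link = "logit"), nAGQ = 25)

## Compare the estimated random-intercept standard deviations.
c(laplace = attr(VarCorr(fit.laplace)$LEMMA, "stddev"),
  aghq    = attr(VarCorr(fit.aghq)$LEMMA, "stddev"))

Note that the comparison in the question also switched the optimizer to bobyqa, so a check like this, which varies only nAGQ, helps separate the two changes. With very few observations per group (438 observations over 174 lemmas in the original data), the quality of the likelihood approximation tends to matter most for binary outcomes.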

An answer to part (2) can be found in this thread, and an explanation of the intercept term itself is given here. In the comments it is noted that:

"If all the variables, both predictors and response, are centered, then you don't need the intercept term. Instead, you take away 1 df for residual because of centering the response variable. Once you have done all that, it is equivalent to including the intercept in the model where the variables are not centered."

ERT