I am trying to get a deeper understanding of failures to converge in multilevel models that I estimate with lmer(). "Failure to converge" is vague; I want to be able to specify the problem that underpins these failures and to express it numerically in terms of the data that I'm using. I am starting with little knowledge of the math that underpins estimation of these models.
Consider the following toy example:
library(lme4)
set.seed(1234)

# Three observations on each of four people
person <- factor(rep(c("Alice", "Bob", "Catherine", "David"), each = 3))

# female is constant within each person: 1 for Alice and Catherine,
# 0 for Bob and David (this length-6 vector is recycled by data.frame())
female <- rep(1:0, each = 3)

myDF <- data.frame(
  y = c(1:6, 10:12, 1:3),
  person = person,
  female = female)

# Random intercept and random slope on female for each person
lmer(y ~ (1 + female | person), data = myDF)
Running that example generates these warnings:
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
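To attach numbers to the second warning, the best lead I have is the derivative information that lme4 stores with the fitted model (a sketch, assuming the default calc.derivs behavior; fit is just my name for the model object):

fit <- lmer(y ~ (1 + female | person), data = myDF)

# Eigenvalues of the Hessian of the deviance at the reported optimum,
# taken with respect to the variance-covariance parameters; the second
# warning says one of these is negative
eigen(fit@optinfo$derivs$Hessian)$values

But this only restates the warning; it doesn't tell me which feature of myDF produces the negative eigenvalue.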
Although I've kept the dataset small (n = 12), the convergence warnings don't seem to be due to the small sample. There are more observations than parameters to estimate, and with a small tweak, I can generate the same warnings when (say) n = 1200, while holding constant the number of parameters.
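For example, something like the following (a sketch; bigDF is just a name I'm introducing here) stacks 100 copies of the toy data, giving n = 1200 with the same model and the same number of parameters, and it still produces the same kind of warnings:

# Stack 100 copies of the toy data: n = 1200, same parameters
bigDF <- myDF[rep(seq_len(nrow(myDF)), times = 100), ]
lmer(y ~ (1 + female | person), data = bigDF)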
That said, I have only a superficial understanding of what the warnings mean. Can the problems be expressed in terms of the data used in this example -- in terms of the values of y, the values of female, and so on?
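For what it's worth, the closest I've come to a data-level observation is that female takes a single value within each person, so I suspect a per-person slope on female is hard to separate from the per-person intercept -- but I can't connect that suspicion to the warnings numerically. A sketch of the check:

# Within-person variance of female: zero for every person,
# i.e., female never varies within a cluster
with(myDF, tapply(female, person, var))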
There are many posts here about failures to converge when using lmer(). They are useful, but they generally emphasize programming strategies -- use a different optimizer, etc. -- rather than the intuition behind those strategies. Some posts are helpful for developing intuition: for example, "Model failed to converge" warning in lmer() and this non-StackExchange post. But I fear that even those posts don't help me to understand the problem above. I haven't found any sources that work through simple numerical examples like this one.