
I am trying to get a deeper understanding of failures to converge in multilevel models that I estimate with lmer(). "Failure to converge" is vague; I want to be able to specify the problem that underpins these failures, and to express it numerically in terms of the data that I'm using. I am starting with little knowledge of the math that underpins estimation of these models.

Consider the following toy example:

library(lme4)
set.seed(1234)
person <- factor(rep(c("Alice", "Bob", "Catherine", "David"), each = 3))
female <- rep(1:0, each = 3)
myDF <- data.frame(
  y      = c(1:6, 10:12, 1:3),
  person = person,
  female = female)
lmer(y ~ (1 + female | person), data = myDF) 

Running that example generates these warnings:

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
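For what it's worth, the quantities these warnings refer to can be inspected numerically. As far as I can tell, lme4 stores the gradient and Hessian of the deviance, taken with respect to the variance-covariance parameters (theta), in the fitted object's `optinfo` slot:

```r
library(lme4)
set.seed(1234)
person <- factor(rep(c("Alice", "Bob", "Catherine", "David"), each = 3))
female <- rep(1:0, each = 3)
myDF <- data.frame(y = c(1:6, 10:12, 1:3), person = person, female = female)

fit <- lmer(y ~ (1 + female | person), data = myDF)

# Gradient and Hessian of the deviance at the optimizer's solution,
# with respect to the three variance-covariance parameters (theta)
derivs <- fit@optinfo$derivs
derivs$gradient
eigen(derivs$Hessian)$values  # a negative eigenvalue is the "degenerate Hessian"

# The theta estimates themselves; components at the boundary (zero)
# often accompany these warnings
getME(fit, "theta")
```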

Although I've kept the dataset small (n = 12), the convergence warnings don't seem to be due to the small sample. There are more observations than parameters to estimate, and with a small tweak, I can generate the same warnings when (say) n = 1200, while holding constant the number of parameters.
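For concreteness, here is one tweak along those lines (a sketch, not necessarily the exact tweak; the replication factor and jitter are arbitrary choices). The point is that female still never varies within person, so the larger n does not fix the underlying problem:

```r
library(lme4)
set.seed(1234)
person <- factor(rep(c("Alice", "Bob", "Catherine", "David"), each = 3))
female <- rep(1:0, each = 3)
myDF <- data.frame(y = c(1:6, 10:12, 1:3), person = person, female = female)

# Stack 100 noisy copies of the 12 rows: n = 1200, still 4 persons,
# and female is still constant within each person
bigDF <- do.call(rbind, replicate(100, myDF, simplify = FALSE))
bigDF$y <- bigDF$y + rnorm(nrow(bigDF), sd = 0.1)
nrow(bigDF)  # 1200

lmer(y ~ (1 + female | person), data = bigDF)
```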

That said, I have only a superficial understanding of what the warnings mean. Can the problems be expressed in terms of the data used in this example -- in terms of the values of y, the values of female, and so on?

There are many posts here about failures to converge when using lmer(). They are useful, but they generally emphasize programming strategies (use a different optimizer, and so on) rather than the intuition behind those strategies. Some posts are helpful for developing intuition: for example, "Model failed to converge" warning in lmer() and this non-StackExchange post. But I fear that even those posts don't help me to understand the problem above. I haven't found any sources that work through simple numerical examples like this one.

user697473
    You are trying to fit a random slope (female) and intercept for each individual. So you have at least two parameters to estimate per subject, you also have the variance components that you need to estimate and the fixed effects on top of that. Anyways, even if that is not the issue, I'm not sure treating female as a random slope makes sense given it is discrete. (How can you estimate a slope for female within each subject if each subject only has one value?) If I make it a fixed effect, `lmer(y ~ female + (1 | person), data = myDF) `, the warning goes away. – Emma Jean Jun 03 '20 at 16:00
  • Thank you, @EmmaJean. You write "I'm not sure treating female as a random slope makes sense given it is discrete. (How can you estimate a slope for female within each subject if each subject only has one value?)," and I don't have a direct answer. I do note, though, that the computational issue disappears if I keep the `female` predictor of the group-level intercepts while also adding a `female` fixed effect. So the problem that I mentioned in my post doesn't seem due to including the `female` predictor at the group level. (Whether it's sensible to include that predictor is another question.) – user697473 Jun 04 '20 at 11:45

1 Answer


Building on @Emma Jean's response: your data frame is not multilevel in the sense that person (the grouping variable) and female are at the same level. In order for your data to accommodate a random slope, you would need to measure a variable that varies within person. It could be anything (mood, cortisol, pocket change) as long as it is measured repeatedly. So although you measure each of your persons on multiple occasions, Alice always has a value of 1 on female, Bob always has 0, Catherine always has 1, and David always 0. In a multilevel model with such data, you can model the association between female and each person's y-intercept (essentially an average of their three observed y values). Below is your code with the addition of a time-varying variable (mood):

library(lme4)
set.seed(1234)
person <- factor(rep(c("Alice", "Bob", "Catherine", "David"), each = 3))
female <- rep(1:0, each = 3)
myDF <- data.frame(
  y      = c(1:6, 10:12, 1:3),
  person = person,
  female = female,
  mood = c(4.5, 3.5, 4, 2.1, 3, 3.5, 1.6, 2.8, 4.3, 4.6, 4.2, 3.9))

If you look at the data frame, you will see that mood varies within persons, so it is time-varying. In the multilevel framework, the association between such a time-varying variable and the outcome can be allowed to vary across groups (i.e., persons) via a random slope. I did not create the mood variable with any such person-specific associations built in, so, not surprisingly, none were found, and I get convergence warnings when trying to estimate the random slope for mood.
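For reference, the model that produced those warnings would look something like the following (a sketch; the fixed-effects part here is one plausible specification, since the exact call isn't shown):

```r
library(lme4)
set.seed(1234)
person <- factor(rep(c("Alice", "Bob", "Catherine", "David"), each = 3))
female <- rep(1:0, each = 3)
myDF <- data.frame(
  y      = c(1:6, 10:12, 1:3),
  person = person,
  female = female,
  mood   = c(4.5, 3.5, 4, 2.1, 3, 3.5, 1.6, 2.8, 4.3, 4.6, 4.2, 3.9))

# Random intercept plus a random slope for time-varying mood; with only
# 12 observations this model is prone to convergence/singularity problems
m1 <- lmer(y ~ female + mood + (1 + mood | person), data = myDF)
```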

Another important feature of time-varying variables is that you can calculate summary statistics on them that then become group-level (person-level) variables. So with time-varying mood:

myDF$pmn.mood <- with(myDF, ave(mood, person, FUN=mean))
str(myDF)
'data.frame': 12 obs. of  5 variables:
 $ y       : int  1 2 3 4 5 6 10 11 12 1 ...
 $ person  : Factor w/ 4 levels "Alice","Bob",..: 1 1 1 2 2 2 3 3 3 4 ...
 $ female  : int  1 1 1 0 0 0 1 1 1 0 ...
 $ mood    : num  4.5 3.5 4 2.1 3 3.5 1.6 2.8 4.3 4.6 ...
 $ pmn.mood: num  4 4 4 2.87 2.87 ...

This summary variable can be included, along with its time-varying analog, in your multilevel model as a person-level predictor of the random intercept:

m2 <- lmer(y ~ female + mood + pmn.mood + (1 | person), data = myDF) 
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ female + mood + pmn.mood + (1 | person)
   Data: myDF

REML criterion at convergence: 32.6

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.36975 -0.36411  0.08105  0.39417  1.26723 

Random effects:
 Groups   Name        Variance Std.Dev.
 person   (Intercept) 12.8519  3.5850  
 Residual              0.9209  0.9596  
Number of obs: 12, groups:  person, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept)  19.6480    10.6931   1.837
female        2.5451     3.6393   0.699
mood          0.5357     0.4125   1.299
pmn.mood     -5.0845     2.9531  -1.722

Correlation of Fixed Effects:
         (Intr) female mood  
female   -0.247              
mood      0.000  0.000       
pmn.mood -0.961  0.080 -0.140

If I calculated a person mean for female, its value would be identical to female itself (since female is constant within person), so it would not be permissible as a predictor in a model that also had female; the two would be perfectly collinear.
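You can check this directly in a couple of lines (myDF is rebuilt here so the chunk runs on its own):

```r
person <- factor(rep(c("Alice", "Bob", "Catherine", "David"), each = 3))
female <- rep(1:0, each = 3)
myDF <- data.frame(y = c(1:6, 10:12, 1:3), person = person, female = female)

# female is constant within person, so its person mean reproduces it exactly
pmn.female <- with(myDF, ave(female, person, FUN = mean))
all(pmn.female == myDF$female)  # TRUE
```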

Erik Ruzek
  • Thank you, @ErikRuzek. This helps. / I would like to take up your claim that "In order for your data to accommodate a random slope, you would need to measure a variable that varied within person." A result from `lmer()` seems to suggest that this isn't so. Specifically, if I estimate `lmer(y ~ female + (1 + female | person), data = myDF)`, `lmer()` returns the results without complaint. How can your claim be true, given that `lmer()` has no problem estimating the model? / I agree that the model may not be sensible, but for the moment, my focus is on the computational issues. – user697473 Jun 05 '20 at 12:25
  • The correlation between the random intercept and slope from the model you propose is 1. That suggests that they are indistinguishable from one another. So while that model can be estimated using software, that model neither makes conceptual sense nor statistical sense. Garbage in, garbage out. – Erik Ruzek Jun 05 '20 at 16:54
  • Thank you, Erik. I agree that the model doesn't make sense. As I see it, the problem is simply -- as you and @EmmaJean had it before -- that `female` is entered at the group level but doesn't vary within groups. / Perhaps it's worth noting that we can estimate these bad models without getting the perfect correlation between random intercepts and random slopes that you noticed. Just drop the random intercepts: you can estimate `lmer(y ~ (female -1 | person), data = myDF)`, but I can't see that it makes any more sense than the other models discussed here. – user697473 Jun 06 '20 at 13:49