
I got a convergence warning and tried the recommended approach, as below:

library(lme4)

original <- lmer(Y ~ 1 + X1 + X2 + (1 + X1 | group_ID), data = data)
summary(original)
ss <- getME(original, c("theta", "fixef"))   # extract the current parameter estimates
restart <- update(original, start = ss, control = lmerControl(optCtrl = list(maxfun = 2e4)))
summary(restart)

Then it seemed to work: summary(restart) no longer produced any warnings. But I cannot understand why it worked, because I'm not really familiar with optimization methods. Could anyone tell me the reason, or point me to resources for understanding this?

k m

1 Answer


Mixed effects models do not have closed form solutions. That is, unlike models such as ordinary least squares regression (where some simple matrix algebra obtains the estimates), it is not possible to perform a few simple calculations to find the parameter estimates. It is necessary to use an optimizer. An optimizer uses a particular algorithm and iteratively tries to get closer and closer to the solution, starting from some values that it determines at the outset. Once the solution is reached, it stops. There are many different algorithms (and therefore different optimizers) for finding the solutions to different types of problems.
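
As a toy illustration (not lme4 itself, just base R's optim() on a made-up function), an optimizer starts from supplied values, iterates towards the optimum, and stops either when it converges or when it hits its iteration limit:

f <- function(p) (p[1] - 3)^2 + (p[2] + 1)^2    # simple quadratic "bowl"; minimum is at (3, -1)
fit <- optim(par = c(0, 0), fn = f, method = "Nelder-Mead",
             control = list(maxit = 500))
fit$par          # the values the optimizer settled on (close to 3 and -1)
fit$convergence  # 0 means it converged; 1 means it stopped because it reached maxit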

In mixed models, the function that is being optimised (the objective function) is extremely complex, and can take thousands of steps to find a solution, if indeed a solution exists. The optimizer does not go on forever. If it does not find a solution after a certain number of iterations, it stops and gives the kind of warning that you obtained. If a solution exists, then by increasing the number of iterations the solution can often be reached. However, it starts from the same point (the same start values), and sometimes this requires a lot of time. So rather than starting from the beginning again with the same start values, a good approach is to restart it from the values it had previously reached when it didn't converge. This should take less time. This is what the technique you used does.

Edit: to address the point in the comments that increasing the number of iterations 10-fold did not solve the convergence problem, but restarting with current values did. This can happen if, with the default starting values, the optimizer is not converging to a solution at all, or if something has "gone wrong" with the initial optimization run, such as using an inappropriate step size. Restarting at current values is not necessarily the same thing as just continuing from where it left off previously. This will depend on the algorithm used, but other aspects of the optimization apart from just the current values, such as step size, may depend on the recent history of steps. So, by restarting at the previous values, it may "reset" the algorithm in a way which sends it towards the true solution.

Another situation can arise where restarting the optimization actually takes more steps in total than just letting the initial run continue. Basically, it's the same logic as in the previous paragraph but reversed. In this case the initial optimization is converging to the solution, but it had not run for long enough, and by restarting at the current values the previous state of the algorithm is lost, so it takes some further iterations to recover its state and find the solution.
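
If you want to automate the restarting, one option is a simple loop that keeps restarting from the current estimates until the warnings go away or you give up. This is just a sketch, assuming a fitted lmer model called fit, and assuming that your version of lme4 stores its convergence messages in fit@optinfo$conv$lme4$messages:

library(lme4)
for (i in 1:5) {
  msgs <- fit@optinfo$conv$lme4$messages            # convergence warnings, if any (assumed slot)
  if (is.null(msgs) || length(msgs) == 0) break     # no warnings left, so stop restarting
  ss  <- getME(fit, c("theta", "fixef"))            # current parameter values
  fit <- update(fit, start = ss,
                control = lmerControl(optCtrl = list(maxeval = 2e4)))
}
summary(fit)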

The above is deliberately general. I can't be more specific because I am not familiar with the internals of different optimizers.

It is also worth noting that in some complex mixed models the objective function may have local maxima apart from the global maximum that we want to find. Sometimes the algorithm will converge to a local maximum. Another possibility is that the function is very flat in a certain region, which can cause numerical problems. Another (fairly unusual) problem is that, due to some peculiarity in the objective function's behaviour in a particular region, the optimizer can get stuck and keep returning to the same point over and over.
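
One way to check for this kind of problem is to refit the same model with several different optimizers and compare the results; lme4 provides allFit() for this. A sketch, assuming a fitted model called fit:

library(lme4)
af <- allFit(fit)      # refit the model with every available optimizer
ss <- summary(af)
ss$llik                # log-likelihood reached by each optimizer
ss$fixef               # fixed-effect estimates from each optimizer

If all the optimizers agree up to small numerical differences, the warnings are probably not of practical concern; if they disagree substantially, that points towards a local optimum or a badly behaved likelihood surface.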

Note that in your example, you should use maxeval and not maxfun. maxeval is used by the nloptwrap optimizer (the default for lmer), while maxfun is used by the bobyqa and Nelder_Mead optimizers (used by glmer).
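
Applied to your restart, that would mean something like the following (same model and data as in your question):

ss <- getME(original, c("theta", "fixef"))
restart <- update(original, start = ss,
                  control = lmerControl(optCtrl = list(maxeval = 2e4)))
summary(restart)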

Robert Long
  • 1
    Thank you for your kind comments. I understand why mixed effects models require optimization algorithm, but I have further two questions. 1) I tried increasing iteration (without restarting) from 20,000 to 200,000, but it didn't converge though the number of iteration itself was larger than in the case of running optimizer two times as I wrote. What is the reason? 2) If restarting once did not work, is it reasonable to restart twice? I tried this, and actually it worked. The grad was getting closer to zero by restarting several times. – k m Sep 07 '20 at 17:35
  • 1
    1) did you use the correct parameter (`maxeval` and not `maxfun`). 2) yes, there is no problem with restarting it again. – Robert Long Sep 07 '20 at 17:39
  • 1
    1) Yes, I tried with `maxeval` after I saw your comments, but still the same thing occurred. 2) That's great! Thank you! – k m Sep 07 '20 at 17:48
  • 1
    OK, well the number of iterations will depend on the behaviour of the function being optimised, which is unpredictable (that's why we need an optimizer in the first place), so there is no way to be sure what is going on. Did it take 10 times longer when you increased the number of iterations ? – Robert Long Sep 07 '20 at 18:26
  • 1
    I have edied the answer to provide more info – Robert Long Sep 07 '20 at 19:25
  • Thank you for adding the comment! The computation time was only a few seconds in both cases, so I guess it did not take 10 times longer. I'm afraid I may have made a coding mistake if this kind of thing rarely happens... But anyway, I appreciate your quick and insightful comments. – k m Sep 07 '20 at 22:05