
I am trying to use Bates et al.'s (2015) recommendation on reducing a maximal model based on the variance explained by the random terms. Importantly, all of the examples in Bates are models with exclusively categorical variables, not continuous variables or categorical x continuous interactions. I've found that you can hit a dead end when your random structure includes categorical x continuous interactions. See the code below, which I arrived at after a few iterations of successfully simplifying the maximal model:

max.zcm.4 = lmer(ECG_sd ~ C + A + V + CA + Time_sd + C:Time_sd + A:Time_sd + CA:Time_sd +
                   (1 + C + A + CA + Time_sd + C:Time_sd + A:Time_sd || Subject) +
                   (1 + C + A + Time_sd + C:Time_sd || Stim),
                 data = data,
                 control = lmerControl(optimizer = "bobyqa", calc.derivs = FALSE,
                                       optCtrl = list(maxfun = 2e5)),
                 na.action = "na.exclude")

See random effects info:

 Groups    Name        Variance  Std.Dev.
 Subject   (Intercept) 0.0023927 0.04892 
 Subject.1 C           0.0398474 0.19962 
 Subject.2 A           0.1602602 0.40033 
 Subject.3 CA          0.0112184 0.10592 
 Subject.4 Time_sd     0.0006361 0.02522 
 Subject.5 C:Time_sd   0.0134182 0.11584 
 Subject.6 A:Time_sd   0.0160719 0.12677 
 Stim      (Intercept) 0.0001093 0.01046 
 Stim.1    C           0.0359583 0.18963 
 Stim.2    A           0.0328272 0.18118 
 Stim.3    Time_sd     0.0006784 0.02605 
 Stim.4    C:Time_sd   0.0033931 0.05825 
 Residual              0.9307705 0.96476 
Number of obs: 54668, groups:  Subject, 79; Stim, 24
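
One way to quantify the overparameterization, following Bates et al. (2015), is a principal components analysis of the estimated random-effects covariance matrices via lme4's rePCA(). This is a sketch, assuming the fitted model object max.zcm.4 from above; components whose standard deviation is (near) zero indicate dimensions of the random structure the data cannot support:

library(lme4)

## PCA of the random-effects covariance matrices, per grouping factor.
## Components with ~zero standard deviation suggest the random structure
## is overparameterized and can be reduced.
summary(rePCA(max.zcm.4))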

This model is still non-convergent and clearly overparameterized, so I need to simplify the random structure. The variance of [Time_sd | Stim] is the lowest, 0.0006784, followed by [C:Time_sd | Stim] at 0.0033931. I understand that I first need to remove the interaction. However, removing C:Time_sd, Time_sd, or both significantly reduces model fit as indicated by the LRT. So I've hit a dead end: the model won't converge as is, but removing any terms will worsen model fit. I wonder if the issue could have something to do with the continuous x categorical interaction; none of Bates' models included that. Could it be that the parsimonious strategy has to be handled differently when it includes time-variant continuous variables?
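
For reference, the LRT comparison I am running looks like this (a sketch, assuming the fitted max.zcm.4 above; zcm.5 is just an illustrative name for the reduced model that drops the lowest-variance interaction term from the Stim structure):

library(lme4)

## Drop C:Time_sd from the Stim random structure and compare by LRT.
## anova() refits both models with ML before the likelihood-ratio test.
zcm.5 <- update(max.zcm.4,
                . ~ . - (1 + C + A + Time_sd + C:Time_sd || Stim)
                      + (1 + C + A + Time_sd || Stim))
anova(max.zcm.4, zcm.5)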

Reference: Bates, D., Kliegl, R., Vasishth, S. and Baayen, H., 2015. Parsimonious mixed models. arXiv preprint arXiv:1506.04967. https://arxiv.org/pdf/1506.04967.pdf

Luminosa

1 Answer


Could it be that the parsimonious strategy has to be handled differently when it includes time-variant continuous variables?

I don't think so.

It is natural for the model fit to worsen when you simplify the random structure, because the model was overfitted. Reducing overfitting means that the fit to this dataset will be worse - but the good news is that the more parsimonious model will generalise better to new data.

Robert Long
  • Thanks for the answer! But how do you justify removing slopes that decrease fit, and where do you stop? It would be easy to end up with random-intercept-only models, since they give us the most power. – Luminosa Mar 01 '21 at 15:48
  • You're welcome. You justify removing random slopes in the same way that you would justify removing higher order terms in a linear regression that is overfitted - to reduce or remove the overfitting. You stop when the model is no longer singular, and quite often, if you have insufficient data, this *will* be a model with only random intercepts (and maybe random slopes for main effects). – Robert Long Mar 01 '21 at 15:52
  • See the following posts for further details: https://stats.stackexchange.com/questions/378939/dealing-with-singular-fit-in-mixed-models/ https://stats.stackexchange.com/questions/509892/why-is-this-linear-mixed-model-singular https://stats.stackexchange.com/questions/449095/how-to-simplify-a-singular-random-structure-when-reported-correlations-are-not-n – Robert Long Mar 01 '21 at 15:53
  • Thanks, that's a useful resource. There seems to be disagreements between the different 'philosophical' approaches to model selection in MLM. The most purist one is to start from maximal and get the most complex model that the data supports, which can be too conservative, and on the other extreme there's the intercept-only or very few random slopes approach, which is quite liberal and tends towards inflated type I error. We all fall somewhere in between, but there's no consensus on what the right balance is. – Luminosa Mar 01 '21 at 16:02
  • I definitely don't agree that the most purist approach is to start from the maximal model. This arises from the terrible advice in the "Keep it Maximal" paper by Barr and others, which is referenced in the Bates paper you cited. There were rarely ever any issues with singular models before Barr. Nowadays every other problem with a mixed model seems to be singularities due to over-specified random effects. – Robert Long Mar 01 '21 at 16:19
  • Do you by any chance know of any papers counter-arguing 'Keep it Maximal' by Barr? That would be great, because the Bates one still starts with the maximal model. Are there any 'keep it minimal' MLM model-selection papers? – Luminosa Mar 01 '21 at 18:18
  • I would recommend re-reading the Bates paper. Especially the discussion. They do not advocate starting with the maximal model. I don't like to cherry-pick parts, but: *" In the statistical literature on fitting mixed-effects modeling (see, e.g., Pinheiro and Bates, 2000,Galecki and Burzykowski, 2013,Bates et al., 2015a), the approach taken is one in which variance components are added to the model step by step, typically driven by theoretical considerations.*". I always advocate theoretical considerations. – Robert Long Mar 01 '21 at 18:32
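
The stopping rule suggested in the comments ("stop when the model is no longer singular") can be checked programmatically with lme4. A sketch, assuming fit is a model object produced at some simplification step:

library(lme4)

## A singular fit means at least one random-effect variance (or a
## correlation) was estimated at the boundary, i.e. effectively zero.
isSingular(fit, tol = 1e-4)   # TRUE: simplify further; FALSE: stop here

## Inspect the variance estimates driving the singularity:
VarCorr(fit)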