
I'm following up on this great answer regarding running Principal Component Analysis (PCA) to uncover the reason behind a lack of convergence and/or singularity in mixed-effects models.

My model below doesn't converge. Why do I get a convergence failure when the model is not singular?

NOTE: When we drop the correlation between the intercepts and slopes, the model below becomes singular! i.e., `lmer(math ~ ses*sector + (ses || sch.id), data = dat)`

library(lme4)

dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/nc.csv')

m4 <- lmer(math ~ ses*sector + (ses | sch.id), data = dat)

summary(m4)
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 sch.id   (Intercept)  3.6302  1.9053       
          ses          0.7356  0.8577   0.46
 Residual             10.1951  3.1930       
Number of obs: 7185, groups:  sch.id, 160


summary(rePCA(m4))
$sch.id
Importance of components:
                         [,1]   [,2]
Standard deviation     0.6118 0.2321
Proportion of Variance 0.8742 0.1258
Cumulative Proportion  0.8742 1.0000
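
For reference, the uncorrelated model from the NOTE can be fitted and checked for singularity along these lines (a minimal sketch using lme4's isSingular(), assuming the same dat as above):

# Drop the intercept-slope correlation (double-bar syntax) and check
# whether any variance component is estimated at the boundary.
m4_uncor <- lmer(math ~ ses*sector + (ses || sch.id), data = dat)
isSingular(m4_uncor)  # TRUE per the NOTE above
VarCorr(m4_uncor)     # see which component collapses towards zero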
rnorouzian

1 Answer


It is important to understand that non-convergence and a singular fit are not the same thing at all. Indeed, when you obtain a singular fit, the model has actually converged, whereas when you obtain a non-convergence warning, the model has not converged. Running principal components analysis on the variance-covariance matrix of the random effects for a model that has not converged is not going to help, unless the underlying problem was an overfitted random-effects structure and the optimizer happened to stop at a location where this could be identified. This seems unlikely in practice.
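
As a quick way to tell the two situations apart, something along these lines can be used (just a sketch; isSingular() and the stored convergence messages are both standard lme4 tools):

# Is the fit singular (a variance or correlation estimated at the boundary)?
isSingular(m4)

# Any convergence warnings lme4 recorded for this fit
m4@optinfo$conv$lme4$messages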

The optimizer will stop and generate a warning when certain conditions are met. Sometimes it is possible to change the parameters of the optimizer, such as increasing the number of iterations, changing the algorithm, or changing the tolerance level, and the model will then converge. Sometimes changing the starting values for the random effects will work, and sometimes it is simply not possible to find a solution.
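
In lme4 these settings are passed via lmerControl(), and allFit() re-fits the model with every available optimizer so you can see whether the warning is optimizer-specific. A sketch (I have not tuned these values for your data):

# Give the optimizer a larger iteration budget and/or switch algorithm
m4_b <- update(m4, control = lmerControl(optimizer = "bobyqa",
                                         optCtrl = list(maxfun = 2e5)))

# Re-fit with all available optimizers and compare their messages
aa <- allFit(m4)
summary(aa)$msgs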

In this particular case, restarting with current random effects estimates works:

> m4 <- lmer(math ~ ses*sector + (ses | sch.id), data = dat)
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00279006 (tol = 0.002, component 1)
> theta <- getME(m4, "theta")
> m5 <- update(m4, start = theta)
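
You can then check that the refit no longer triggers the warning, for example (a sketch):

# Should be NULL/empty if no convergence problems were flagged for m5
m5@optinfo$conv$lme4$messages
summary(m5)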
Robert Long
  • Thanks Rob! But when we drop the correlation between intercepts and slope the model below becomes singular! i.e., `lmer(math ~ ses*sector + (ses || sch.id), data = dat)` – rnorouzian Sep 29 '20 at 15:04
  • That does seem strange, but these are complicated models. Note that the fixed-effect inference (point estimates and standard errors) in the singular model is pretty much the same as in the non-singular model, and if you remove the random slopes altogether, the inference is still the same. The model without random slopes also has a slightly smaller AIC, so I would probably remove the random slopes from the model (see the sketch after these comments). I would also look at a plot of the data to see whether there appears to be much variation in slopes between schools. – Robert Long Sep 29 '20 at 15:59
  • Rob, I can tell you now what's strange about it! The model only contains level-2 predictors (i.e., `ses` and `sector`). This model is fundamentally incorrect from a model-leveling perspective, right? – rnorouzian Oct 03 '20 at 05:08
  • If `ses` is the average socio-economic status for each school, then random slopes mean that each school has its own slope for `ses`? Does that make sense? – Robert Long Oct 03 '20 at 08:45
  • Rob, `ses` is NOT average socio-economic status for each school. I also contacted another HLM expert. Here is what he said: *"we cannot put level 2 predictors in the random part. Because the level 2 predictor is constant within the cluster, while the HLM random part is the analysis about how the cluster affects predictors. If it's constant that means the cluster has no effect."* – rnorouzian Oct 03 '20 at 15:20
  • I have no idea what `ses` is. I was just using it as an example of something that would be constant at the school level. I am literally saying exactly what you have just quoted: when I said *"Does that make sense?"* it was a rhetorical question - of course it does not make sense, because it's constant. – Robert Long Oct 03 '20 at 15:56
  • `blme` is not fully Bayesian iirc, whereas `stan` definitely is, so that could be the reason. – Robert Long Oct 03 '20 at 16:59
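
The AIC comparison mentioned in the comments above would look roughly like this (a sketch against the same dat; I have not reproduced the exact numbers here):

# Model with random slopes vs. random intercepts only
m_slopes    <- lmer(math ~ ses*sector + (ses | sch.id), data = dat)
m_intercept <- lmer(math ~ ses*sector + (1 | sch.id),  data = dat)
AIC(m_slopes, m_intercept)
anova(m_slopes, m_intercept)  # anova() refits with ML for the likelihood-ratio test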