lme4: what is gained by scaling variables?

Question

I am performing a multilevel logistic regression with glmer from the lme4-package. Currently, it's a very simple model of the form: glmer(y ~ x + (1 | group), data = data, family = binomial). Running this in R returns the model fit accompanied by a warning message: "Model is nearly unidentifiable: very large eigenvalue".

Therefore, I centered the variable x by using scale(x), as was mentioned for example here. This indeed lets the warning message disappear. Hooray!

However, the output of the model is almost the same as before. The only things that changed are the estimate of the coefficient for x and its standard error (which is not surprising at all), the intercept, and the correlation between the intercept and x. Everything else is the same as before scaling, e.g. the estimates of the random effects, the p-value of the coefficient of x, the AIC, BIC, and deviance.

So, basically, all parameters that I am interested in are the same, regardless of whether I center my variables or not. One thing that changes for sure is the coefficient of my centered variable. However, this is not a change for the better, since I can't properly interpret the coefficient of a centered variable (or at least not as properly as the coefficient of the original variable).

Thus, my questions are:

Is my first model with the non-centered variable necessarily problematic?
If the answer to the first question is "yes", is my second model with the centered variable really non-problematic, since it gives essentially the same output as the other model?
Should I do more to improve my models?

score 4 · Accepted Answer · answered Feb 05 '20 at 22:18

You mention that you cannot properly interpret the coefficient of a standardized variable, and I would disagree. Imagine you have a variable that ranges from 0 to 100. A 1-unit difference in such a variable might not represent a very meaningful quantity. By using scale(x) you standardize that variable: from each person's/observation's value of $x$, you subtract $\bar{x}$ and divide by sd($x$), so the result has mean 0 and standard deviation 1 (this does not make the variable normally distributed). Recall that a regression coefficient is interpreted as the association between a 1-unit difference in x and the outcome (here the coefficient is in log-odds units, or an odds ratio once exponentiated). After standardizing x, a 1-unit increase corresponds to a 1-standard-deviation increase in the original x. A lot of people like that interpretation.
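If you need the coefficient in the original units, you can fit on the standardized scale and convert back afterwards. The sketch below uses simulated data and a plain glm() only to stay dependency-free; the variable names and numbers are made up for illustration. The same algebra should apply to the fixed effects of a glmer fit, because standardizing is just a linear reparametrization of the same model.

```r
# Simulated data (hypothetical): a predictor on a large raw scale
set.seed(1)
n <- 500
x <- rnorm(n, mean = 2000, sd = 600)
y <- rbinom(n, 1, plogis(-4 + 0.002 * x))

# Fit once on the raw scale and once on the standardized scale
fit_raw <- glm(y ~ x, family = binomial)
x_std <- as.numeric(scale(x))          # (x - mean(x)) / sd(x)
fit_std <- glm(y ~ x_std, family = binomial)

# Convert the standardized estimates back to the original units
b_std <- coef(fit_std)
slope_raw <- b_std[["x_std"]] / sd(x)
intercept_raw <- b_std[["(Intercept)"]] - b_std[["x_std"]] * mean(x) / sd(x)

# The back-transformed estimates agree with the raw-scale fit
all.equal(unname(coef(fit_raw)), c(intercept_raw, slope_raw), tolerance = 1e-6)

# The fit itself and the test of the slope are unchanged by standardizing
all.equal(AIC(fit_raw), AIC(fit_std), tolerance = 1e-6)
all.equal(
  summary(fit_raw)$coefficients["x", "z value"],
  summary(fit_std)$coefficients["x_std", "z value"],
  tolerance = 1e-6
)
```

This is why the p-value, AIC, and deviance do not move when you standardize: only the units of the intercept and slope change.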

Regarding your question:

Is my first model with the non-centered variable necessarily problematic?

Only insofar as you are getting warnings about reaching the limits of identification for the model. This is a somewhat common problem people encounter with glmer model fits. See this CV post for some guidelines. The linked thread deals with your other questions as well. Bottom line: you can increase the number of iterations, try different optimizers, and check for singularity.

You can also try fitting your model with a different package, such as GLMMadaptive, to check whether the estimates are similar and whether you get similar warnings.
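A minimal sketch of those three checks with lme4 follows. The data are simulated, and the optimizer and iteration limit are just one reasonable choice, not a recommendation for your specific data.

```r
library(lme4)

# Simulated grouped binary data (hypothetical; replace with your own data)
set.seed(42)
n_groups <- 30
n_per <- 20
d <- data.frame(
  group = factor(rep(seq_len(n_groups), each = n_per)),
  x = rnorm(n_groups * n_per)
)
u <- rnorm(n_groups, sd = 0.8)  # group-level random intercepts
d$y <- rbinom(nrow(d), 1, plogis(-0.5 + 0.7 * d$x + u[as.integer(d$group)]))

# Use a different optimizer and raise the limit on function evaluations
fit <- glmer(
  y ~ x + (1 | group), data = d, family = binomial,
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
)

# TRUE would indicate a variance component estimated at (or near) zero
isSingular(fit)

# Refit with every optimizer available on this machine and compare estimates
fits <- allFit(fit)
summary(fits)$fixef
```

If the fixed effects agree across optimizers to a few decimal places, the warning is more likely a numerical nuisance than a sign of a wrong answer.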
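As a rough sketch of such a cross-check, the same random-intercept model can be fitted with lme4, GLMMadaptive, and glmmTMB and the fixed effects put side by side. The data here are simulated and the object names are made up. Note that GLMMadaptive uses adaptive Gauss-Hermite quadrature, while glmer (with its default settings) and glmmTMB use the Laplace approximation, so small differences in the estimates are expected.

```r
library(lme4)
library(GLMMadaptive)
library(glmmTMB)

# Simulated grouped binary data (hypothetical; replace with your own data)
set.seed(42)
n_groups <- 30
n_per <- 20
d <- data.frame(
  group = factor(rep(seq_len(n_groups), each = n_per)),
  x = rnorm(n_groups * n_per)
)
u <- rnorm(n_groups, sd = 0.8)  # group-level random intercepts
d$y <- rbinom(nrow(d), 1, plogis(-0.5 + 0.7 * d$x + u[as.integer(d$group)]))

# The same model in three packages
fit_lme4 <- glmer(y ~ x + (1 | group), data = d, family = binomial)
fit_gq <- mixed_model(fixed = y ~ x, random = ~ 1 | group,
                      data = d, family = binomial())
fit_tmb <- glmmTMB(y ~ x + (1 | group), data = d, family = binomial)

# Fixed effects side by side (glmmTMB stores them in the "cond" component)
est <- cbind(
  lme4 = fixef(fit_lme4),
  GLMMadaptive = fixef(fit_gq),
  glmmTMB = fixef(fit_tmb)$cond
)
est
```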

Thanks a lot for this answer! Regarding the interpretation of standardized coefficients: I agree with you in general. Particularly if the scales of variables are rather arbitrary, standardized coefficients are actually *better* to interpret. But to add a bit of background: in my case, the variables have very specific meanings, e.g. the size of a tumor in mm³. The surgeon really wants to know the effect of one additional mm³; they cannot work with the standardized interpretation. — LuckyPal, Feb 06 '20 at 11:36
Could you add some details on my second question, i.e. should I prefer the version with centered variables because there is no reported problem with model identification, although none of the relevant estimates changed? If the new model is correct, and the old model has exactly the same outputs as the new model, I tend to assume that the old model is also correct (or, more likely, that the new model is not correct either, even though there was no warning message). If you can clear up my confusion in this matter, I am happy to accept it as an answer. — LuckyPal, Feb 06 '20 at 11:41
I see your point about the original meaning of your variable. You are probably OK with the non-centered version; however, I would run the model in a second (and third) package to feel confident about the results. That is why I mentioned GLMMadaptive. You could also use glmmTMB. See https://cran.r-project.org/web/packages/glmmTMB/vignettes/glmmTMB.pdf — Erik Ruzek, Feb 06 '20 at 12:53
