
I am running a multilevel logistic regression using the `glmer` function from the lme4 package. I would like to rescale the variances using Hox's (2010) method, and for that I need to calculate the variance of the linear predictor, but I am not sure how to do this in R.

Some context first; a thought experiment may help here. Imagine a null two-level random-intercepts model with no predictors, in which the level-2 variance is 0.35 (a VPC of 0.10). Now introduce an important level-1 predictor that has no strong level-2 component; it is a 'pure' level-1 variable. As it is an important variable, it should reduce the level-1 variance and leave the level-2 variance unchanged. But the level-2 variance is really scaled to the level-1 variance: the latter should have gone down, but it cannot do so because it is fixed at 3.29. The consequence is that the level-2 variance will appear to go up to keep the relative scaling with the level-1 variance. Of the variance that remains, a larger percentage must be at the higher level, as the level-1 value is fixed at 3.29. Sometimes this apparent increase is quite considerable if the level-1 predictor is an important one.

In reality the matter is further complicated: there may be an element of the level-1 variable that varies at the area level, and this might be reducing the level-2 variance, but this does not show because it is swamped by the rise that follows from the explanatory power of the pure level-1 component of the variable. There is a final and important twist. As the level-2 variance increases, the cluster-specific multilevel estimates can be expected to increase in absolute value, so these constraints affect the fixed-part estimates as well as the random part. In short, in generalised linear models, changes in estimates are in part substantive and in part a technical consequence of scaling to the unchangeable level-1 variance.

Hox proposes to rescale the effects of a model via a factor that is the total variance of the null model (intercept only; composed of the variance at level 1, which is constant at 3.29, and the variance at level 2) divided by the total variance of the model with predictors at level 1 (composed of the variance at level 1, which is constant at 3.29, the variance at level 2, and the variance of the linear predictor). The linear predictor can be calculated from predictions of the regression equation, e.g. `z = -342 - 0.003*X1 - 0.02*X2 + 0.51*X3`, and then I would need to obtain its variance.
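For concreteness, computing the variance of such a linear predictor by hand might look like the sketch below. The coefficients are the ones in the example equation above, and the data frame `d` is entirely made up for illustration; note that the intercept does not affect the variance.

```r
## Illustrative sketch only: hypothetical data, coefficients from the
## example equation in the question.
set.seed(1)
d <- data.frame(X1 = rnorm(100), X2 = rnorm(100), X3 = rnorm(100))

## Linear predictor z on the logit scale
z <- -342 - 0.003 * d$X1 - 0.02 * d$X2 + 0.51 * d$X3

## Its variance (the intercept -342 drops out of the variance)
var(z)
```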

R-Rex
  • It would help if you could provide a more complete reference to Hox (2010) and explain how the "variance of the linear predictor" will be used. It's also unclear how random effects would be incorporated. There is a [`predict()`](https://stats.stackexchange.com/a/174227/28500) function in R for merMod objects as generated by `glmer()`, and a `vcov()` function that returns the variance-covariance matrix of the fixed effects, if that helps. Please edit your question to provide more details so that you can get a useful answer. – EdM Apr 15 '19 at 19:10
  • Thanks, edited with some more details now! – R-Rex Apr 16 '19 at 10:43

1 Answer


This issue arises when one wishes to estimate the proportion of variance explained by different parts of a mixed logistic model.

Unlike standard linear models, there is a fundamental, fixed contribution to the total variance of the underlying latent variable in logistic regression, arising from the assumption of binomial sampling with logit link, of $\pi^2/3$ (approximately 3.29, the value used by the OP). This is a constant regardless of how many other contributions to the variance are being considered. Thus the overall scale of the variance changes at different levels of modeling, due to this fixed contribution to the variance, and there is ambiguity in what is meant by the proportion of variance explained.

Hox et al* recommend scaling regression coefficients and variance components to overcome this problem. Quoting from page 137 (with $\sigma^2_F$ the "level 1 variance" from the fixed part of the model, $\sigma^2_{u0}$ the "level 2 variance" from random intercepts, and $\sigma^2_R$ the fixed variance from the logit link):

For the null model, the total variance is $\sigma^2_0 = \sigma^2_{u0} + \sigma^2_R$, with $\sigma^2_R$ ≈ 3.29. For the model $m$ including the first-level predictor variables, the total variance is $\sigma^2_m = \sigma^2_F+ \sigma^2_{u0} + \sigma^2_R$. Hence, we can calculate a scale correction factor that rescales the model m to the same underlying scale as the null model. The scale correction factor equals $\sigma_0/\sigma_m$ for the regression coefficients and $\sigma^2_0/\sigma^2_m$ for the variance components. Next, we can rescale both the regression coefficients and the variance estimates $\sigma^2_{u0}$ and $\sigma^2_R$ by the appropriate scale correction factor, which makes them comparable across models. The scale corrected variance estimates are useful for assessing the amount of variance explained separately at the different levels...
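The arithmetic in that quotation is simple enough to sketch directly. All variance estimates below are hypothetical placeholders; in practice they would come from the fitted null and predictor models.

```r
## Sketch of Hox's scale correction; the variance estimates are made up.
sigma2_R       <- pi^2 / 3   # fixed level-1 variance under the logit link (~3.29)
sigma2_u0_null <- 0.35       # level-2 variance, null model (hypothetical)
sigma2_u0_m    <- 0.40       # level-2 variance, model m (hypothetical)
sigma2_F       <- 0.60       # variance of the linear predictor in model m (hypothetical)

sigma2_0 <- sigma2_u0_null + sigma2_R          # total variance, null model
sigma2_m <- sigma2_F + sigma2_u0_m + sigma2_R  # total variance, model m

sqrt(sigma2_0 / sigma2_m)  # correction factor for regression coefficients
sigma2_0 / sigma2_m        # correction factor for variance components
```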

In this context, the value of $\sigma^2_F$ would simply be the variance (on the logit scale) among the predicted values for the modeled data set based on the fixed effects alone, with random effects set to 0.

In R, all you should need to do to get those predicted values is to use the `predict()` function on the object returned by `glmer()` together with the original data set, specifying that random effects are to be omitted (set `re.form = NA`). See the help page for `predict.merMod` in lme4.
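Putting the pieces together, a full workflow might look like the sketch below. The model formulas, the grouping factor `group`, and the data set `mydata` are placeholders; only `predict(..., re.form = NA)` and `VarCorr()` are the actual lme4 tools involved.

```r
## Sketch only: y, X1-X3, group, and mydata are hypothetical.
library(lme4)

null_fit <- glmer(y ~ 1 + (1 | group), data = mydata, family = binomial)
m_fit    <- glmer(y ~ X1 + X2 + X3 + (1 | group), data = mydata, family = binomial)

sigma2_R <- pi^2 / 3  # fixed logit-link variance, ~3.29

## Level-2 (random-intercept) variances from each model
sigma2_u0_null <- as.numeric(VarCorr(null_fit)$group)
sigma2_u0_m    <- as.numeric(VarCorr(m_fit)$group)

## Variance of the linear predictor: predictions on the logit scale
## from the fixed effects alone (random effects omitted via re.form = NA)
sigma2_F <- var(predict(m_fit, re.form = NA))

## Hox's scale correction factors
sigma2_0 <- sigma2_u0_null + sigma2_R
sigma2_m <- sigma2_F + sigma2_u0_m + sigma2_R
sqrt(sigma2_0 / sigma2_m)  # rescales regression coefficients
sigma2_0 / sigma2_m        # rescales variance components
```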


*Hox, Joop J., Moerbeek, Mirjam, and van de Schoot, Rens. Multilevel Analysis: Techniques and Applications, Second Edition, Taylor & Francis Group, 2010.

EdM