I have hierarchical data (roughly 23 observations per individual, 20 individuals per region, and 17 regions in total) and use a linear mixed model (LMM) to account for the dependence induced by this hierarchical structure. I therefore fit the following model in R:

lmer(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12 + (1 | region) + (1 + X1 + X2 + X3 + X4 | region:ID), data=MyData)

When I estimated this model, with a random intercept and four random slopes at the individual level, I got the following variances and correlations for the random effects:

Random effects:
 Groups      Name                 Variance Std.Dev. Corr                   
 region:ID   (Intercept)          79.53741 8.9184                          
             X1                    0.30512 0.5524    0.48                  
             X2                    0.06766 0.2601   -0.50 -0.68            
             X3                    0.14973 0.3870   -0.57  0.00 -0.40      
             X4                    0.52897 0.7273   -0.79 -0.28 -0.09  0.95
 region      (Intercept)           4.93030 2.2204                          
 Residual                         16.88022 4.1086                          
Number of obs: 9091, groups:  region:ID, 383; region, 17

Noting that the correlation between the random slopes of $X_3$ and $X_4$ is high (0.95), I also ran the model while forcing the individual-level random effects to be independent. That is, I ran:

lmer(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10 + X11 + X12 + (1 | region) + (1 + X1 + X2 + X3 + X4 || region:ID), data=MyData)

However, once I did this, the random-effects structure changed so that the variances of the random slopes for $X_2$ and $X_3$ were suddenly estimated as zero:

Random effects:
 Groups        Name                 Variance Std.Dev.
 region.ID     X1                    0.2246  0.4739  
 region.ID     X2                    0.0000  0.0000  
 region.ID     X3                    0.0000  0.0000  
 region.ID     X4                    0.3236  0.5689  
 region.ID     (Intercept)          11.4392  3.3822  
 region        (Intercept)           9.3745  3.0618  
 Residual                           18.1717  4.2628  
Number of obs: 9091, groups:  region:ID, 383; region, 17
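
For reference, the difference between the two specifications can be written out as follows. This is only a sketch: the symbols $b_{kj}$, $\sigma_k$, $\rho_{kl}$ and $\Sigma$ are notation introduced here for illustration and do not appear in the lme4 output. For individual $j$, collect the random intercept and the four random slopes into one vector:

$$
b_j = (b_{0j}, b_{1j}, b_{2j}, b_{3j}, b_{4j})^\top \sim \mathcal{N}(0, \Sigma)
$$

With `|`, $\Sigma$ is unstructured, so 5 standard deviations and 10 correlations are estimated:

$$
\Sigma_{kl} = \rho_{kl}\,\sigma_k\,\sigma_l, \qquad \rho_{kk} = 1, \qquad k, l \in \{0, 1, 2, 3, 4\}
$$

With `||`, all correlations are fixed at zero, so only the 5 variances remain:

$$
\Sigma = \operatorname{diag}(\sigma_0^2, \sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2)
$$

The second model is therefore a constrained version of the first, with $\rho_{kl} = 0$ for all $k \neq l$.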

My question is thus: why can excluding the correlation parameters cause non-zero variances to become zero?

Edit:

$X_1,X_2,X_3$ and $X_4$ are all continuous. However, they are measured on the region level. So for a given year (it's longitudinal data) each individual within the same region has the same value of those variables. So they vary between regions and between years, but not within a region-year.

Phil
  • Are `X1 + X2 + X3 + X4` categorical or continuous? – amoeba Nov 05 '18 at 10:58
  • $X_1, X_2, X_3$ and $X_4$ are all continuous. However, they are measured on the region level. So for a given year (it's longitudinal data) each individual within the same region has the same value of those variables. So they vary between regions and between years, but not within a region-year. – Phil Nov 05 '18 at 11:00
  • I don't get it -- if they are measured on the region level, why do you include random effect of ID on the slope of these variables? I assume ID stands for subject? – amoeba Nov 05 '18 at 11:05
  • Because while they are measured on the region level, it is fair to assume that they all actually are exposed to the same levels of those variables. And they can be affected individually despite the exposure levels being the same. Or am I misunderstanding your question? And yes, ID stands for the subject (which are nested within regions.) – Phil Nov 05 '18 at 11:45
  • For random part, just (1 | region) + (1 |ID), suppose ID is individual ID. – user158565 Nov 05 '18 at 14:54
  • @a_statistician Yes, but that would only give me random intercepts. Nevertheless, since ID is completely nested within region $(1 \mid region) + (1 \mid ID)$ is equivalent to $(1 \mid region) + (1 \mid region:ID)$. – Phil Nov 05 '18 at 15:01
  • The random effects are used to model the covariance matrix of error terms or Y. You can try to derive that covariance matrix to see if it is what you want. Also you can get the covariance matrix when x1-x4 are added, and to see if it is reasonable. – user158565 Nov 05 '18 at 15:07
  • I'm pretty sure that if any X is measured on the region level and you have `(X | region)` included in the model then it does not make any sense to include `(X | region:ID)` into the model. – amoeba Nov 05 '18 at 16:38
  • Ah, wait, your region term only has `(1 | region)`... Then maybe it's okay, but I would still suggest to do `(1 + X1 + X2 + X3 + X4 | region) + (1 | region:ID)` and then see whether changing `|` to `||` will have the same drastic effect. This model makes more sense to me. – amoeba Nov 05 '18 at 16:43
  • Thank you for your feedback, @amoeba. May I ask why that model makes more sense to you? I am not sure I follow. – Phil Nov 06 '18 at 12:31
  • Possibly related https://stats.stackexchange.com/q/115090/164061 – Sextus Empiricus Nov 07 '18 at 14:26
