When fitting a mixed effects model, how can I tell which variable is more meaningful (or important)?

This is a follow-up to the question of how to tell which variable is more meaningful when modeling the relationship between several predictors and an outcome variable. The linked question dealt with linear models fitted with the lm() function in R, and the strategies provided there don't work with mixed-effects models (e.g., those fitted with lmer() from {lme4}).

Let's consider a minimal reproducible example in R:

library(lme4)
#> Loading required package: Matrix

my_model <- lmer(weight ~ Time * Diet + (1 | Chick), data = ChickWeight)
summary(my_model)

which gives

#> Linear mixed model fit by REML ['lmerMod']
#> Formula: weight ~ Time * Diet + (1 | Chick)
#>    Data: ChickWeight
#> 
#> REML criterion at convergence: 5466.9
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -3.3158 -0.5900 -0.0693  0.5361  3.6024 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  Chick    (Intercept) 545.7    23.36   
#>  Residual             643.3    25.36   
#> Number of obs: 578, groups:  Chick, 50
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  31.5143     6.1163   5.152
#> Time          6.7115     0.2584  25.976
#> Diet2        -2.8807    10.5479  -0.273
#> Diet3       -13.2640    10.5479  -1.258
#> Diet4        -0.4016    10.5565  -0.038
#> Time:Diet2    1.8977     0.4284   4.430
#> Time:Diet3    4.7114     0.4284  10.998
#> Time:Diet4    2.9506     0.4340   6.799
#> 
#> Correlation of Fixed Effects:
#>            (Intr) Time   Diet2  Diet3  Diet4  Tm:Dt2 Tm:Dt3
#> Time       -0.426                                          
#> Diet2      -0.580  0.247                                   
#> Diet3      -0.580  0.247  0.336                            
#> Diet4      -0.579  0.247  0.336  0.336                     
#> Time:Diet2  0.257 -0.603 -0.431 -0.149 -0.149              
#> Time:Diet3  0.257 -0.603 -0.149 -0.431 -0.149  0.364       
#> Time:Diet4  0.254 -0.595 -0.147 -0.147 -0.432  0.359  0.359

Although I included both Time and Diet (and their interaction) as predictors, I still want to know which of them is more meaningful in terms of explaining the outcome variable weight. In other words, if someone were to force me to use only one predictor in my model, which one should I choose?

Emman

1 Answer

An approach similar to those in the linked question also works here. Below is an example using the domir package, with the marginal R-squared from MuMIn::r.squaredGLMM() as the fit statistic.

> library(lme4)
> library(domir)
> library(MuMIn)

> domin(weight ~ 1, 
  lmer, 
  list(\(x) list(R2m = MuMIn::r.squaredGLMM(x)[[1]]), "R2m"), 
  data = ChickWeight, 
  sets = list("Time + Time:Diet", "Diet + Time:Diet"), 
  consmodel = "(1 | Chick)")

Overall Fit Statistic:      0.765202 
Constant Model Fit Statistic:  0 

General Dominance Statistics:
     General Dominance Standardized Ranks
set1         0.3859209    0.5043386     1
set2         0.3792811    0.4956614     2

Conditional Dominance Statistics:
        IVs: 1        IVs: 2
set1 0.7718418 -1.838085e-12
set2 0.7652020 -6.639834e-03

Complete Dominance Designations:
             Dmnated?set1 Dmnated?set2
Dmnates?set1           NA         TRUE
Dmnates?set2        FALSE           NA

Components of sets:
set1 : Time + Time:Diet 
set2 : Diet + Time:Diet 

Time slightly edges out Diet. However, the two are difficult to separate given the interaction, and they probably shouldn't be evaluated in this fashion: the interaction is a component of each set, and it is biased unless both first-order terms are in the model (which is why the "IVs: 2" results in the "Conditional Dominance Statistics" matrix are negative).
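To make the negative "IVs: 2" entries concrete, here is a small arithmetic sketch in base R that rebuilds the dominance statistics from the numbers printed above. It assumes the usual two-set logic: "IVs: 1" is the fit of a set on its own, "IVs: 2" is what the set adds when entered last, and general dominance is the average of the two. The variable names are made up for illustration.

```r
# Marginal R-squared values copied from the domin() output above
r2_set1_alone <- 0.7718418  # Time + Time:Diet ("IVs: 1", set1)
r2_set2_alone <- 0.7652020  # Diet + Time:Diet ("IVs: 1", set2)
r2_full       <- 0.765202   # overall fit statistic (both sets)

# "IVs: 2": increment when a set is added to a model holding the other set
inc_set1 <- r2_full - r2_set2_alone  # essentially zero
inc_set2 <- r2_full - r2_set1_alone  # negative: set1 alone exceeds the full fit

# General dominance: average of the "IVs: 1" and "IVs: 2" values per set
gd <- c(set1 = mean(c(r2_set1_alone, inc_set1)),
        set2 = mean(c(r2_set2_alone, inc_set2)))

# Standardized: share of the overall fit statistic
std <- gd / r2_full

print(round(gd, 7))
print(round(std, 7))
```

Set1 without the Diet main effect reports a slightly higher marginal R-squared than the full model, so adding set2 last "costs" fit. That is the source of the negative increment.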

A better idea is to drop the interaction so that the two predictors can be clearly distinguished. Without it, their separate contributions are much clearer:

> domin(weight ~ Time + Diet, 
  lmer, 
  list(\(x) list(R2m = MuMIn::r.squaredGLMM(x)[[1]]), "R2m"), 
  data = ChickWeight, 
  consmodel = "(1 | Chick)")

Overall Fit Statistic:      0.7382927 
Constant Model Fit Statistic:  0 

General Dominance Statistics:
     General Dominance Standardized Ranks
Time        0.68952943   0.93395131     1
Diet        0.04876327   0.06604869     2

Conditional Dominance Statistics:
        IVs: 1     IVs: 2
Time 0.6962558 0.68280310
Diet 0.0554896 0.04203694

Complete Dominance Designations:
             Dmnated?Time Dmnated?Diet
Dmnates?Time           NA         TRUE
Dmnates?Diet        FALSE           NA

This favors Time far more clearly and costs only a little predictive utility: the overall fit statistic drops from 0.765 to 0.738. If desired, the interaction can be built in as a separate predictor using LeBreton, Tonidandel, and Krasikova's (2013) residualization method, most likely with a separate residual for each included level of Diet.
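As a rough sketch of what that residualization could look like, the base R code below regresses each Time-by-Diet product column on the first-order terms and keeps the residuals. This assumes the method amounts to "residualize the product term on its constituents", which is one reading of it and not a verified reimplementation. The res_* column names are hypothetical.

```r
# Plain data.frame copy of the built-in data (drops the groupedData classes)
cw <- as.data.frame(ChickWeight)

# Design matrix holding the first-order terms and the Time-by-Diet products
mm <- model.matrix(~ Time * Diet, data = cw)
prod_cols <- grep(":", colnames(mm), fixed = TRUE, value = TRUE)

# Regress each product column on Time + Diet and store the residuals,
# giving one residualized interaction term per non-reference Diet level
for (p in prod_cols) {
  new_name <- paste0("res_", sub(":", "_", p, fixed = TRUE))
  cw[[new_name]] <- resid(lm(mm[, p] ~ Time + Diet, data = cw))
}

res_cols <- grep("^res_", names(cw), value = TRUE)
print(res_cols)
```

By construction the residual columns are uncorrelated with Time and with the Diet indicators. They could then be entered together as a third set next to Time and Diet, for example through the sets argument of domin(); that model is not run here.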

jluchman