I'm a little confused as to how to interpret the p-values in the model below, under "Fixed effects." I had two fixed effects (group and condition), group with three levels and condition with two. The three groups are EN, HS, and SB, and the two conditions are EN-GJT-R-GAP and EN-GJT-R-RES.

Under "Fixed Effects," I'm not sure what (Intercept) refers to. I thought the estimate referred to a difference between each parameter and the first one alphabetically (the one listed as (Intercept)), but here I have two different fixed effects, so does (Intercept) refer to groupEN or conditionEN-GJT-R-GAP? How could it simultaneously refer to both? And what do the estimates for groupHS, groupSB, and conditionEN-GJT-R-RES refer to? Also, what does the p-value for (Intercept) refer to? I thought that for the other ones (not (Intercept)), the p-values indicate the statistical significance of each parameter vis-à-vis the (Intercept), but again, how can this work if there are two different parameters the (Intercept) could refer to, and what does the p-value of the (Intercept) refer to if it's not being compared to anything else? Clearly I'm missing a lot of things, so any help would be vastly appreciated!

Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: score ~ group + condition + (1 | subject) + (1 | token_set) + (1 | list)
   Data: EN_JT_1

REML criterion at convergence: 521.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.4748 -0.3124  0.2425  0.6686  1.8308 

Random effects:
 Groups    Name        Variance  Std.Dev. 
 subject   (Intercept) 2.170e-02 1.473e-01
 token_set (Intercept) 3.147e-03 5.610e-02
 list      (Intercept) 1.319e-10 1.148e-05
 Residual              9.288e-02 3.048e-01
Number of obs: 852, groups:  subject, 71; token_set, 24; list, 2

Fixed effects:
                      Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)            0.99723    0.03554 70.75609  28.056  < 2e-16 ***
groupHS               -0.11226    0.04723 67.77282  -2.377   0.0203 *  
groupSB                0.04257    0.05227 67.77205   0.814   0.4182    
conditionEN-GJT-R-RES -0.27753    0.03099 21.38884  -8.955  1.1e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) gropHS gropSB
groupHS     -0.531              
groupSB     -0.480  0.361       
cEN-GJT-R-R -0.436  0.000  0.000
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
kjetil b halvorsen
  • With default contrasts, the intercept is the estimated mean of that group where all categorical variables equal their reference level (the first level). – Roland Dec 27 '20 at 10:27
  • Thank you! I have two different fixed factors here, "group" and "condition," so there are two different reference levels / first levels. Which one does the intercept refer to? That's the source of my confusion. And are you able to clarify the p-values? Thanks so much! –  Dec 27 '20 at 10:37
  • The intercept refers to both. If you have categories "car type" and "color", you can have a blue bus as the intercept. – Roland Dec 27 '20 at 12:23

1 Answer

At the bottom of your model summary, if you look carefully, you will see the dreaded warning from R:

boundary (singular) fit: see ?isSingular

As explained by Robert Long in "Dealing with singular fit in mixed models":

"When you obtain a singular fit, this is often indicating that the model is overfitted – that is, the random effects structure is too complex to be supported by the data, which naturally leads to the advice to remove the most complex part of the random effects structure (usually random slopes). The benefit of this approach is that it leads to a more parsimonious model that is not over-fitted."

Thus, your model needs some refining before you reach the interpretation stage. For example, the estimated variance for the random effect of list is essentially zero (1.319e-10). Do you really need to include a random effect for list in your model?
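To put that variance in perspective, here is a minimal sketch in base R that expresses each variance component as a share of the total. The numbers are copied from the "Random effects" table in the question; the object names vc and share are only illustrative, and nothing is refitted.

```r
# Variance components copied from the "Random effects" table in the question
vc <- c(subject   = 2.170e-02,
        token_set = 3.147e-03,
        list      = 1.319e-10,
        residual  = 9.288e-02)

# Share of the total variance attributed to each component
share <- vc / sum(vc)
print(signif(share, 3))
```

The share for list comes out many orders of magnitude below the others, which is consistent with the singular fit message.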

Assuming for pedagogical purposes that your current model is sensible (though we know it's not, given the singular fit warning!), Roland already explained in his comments that R uses dummy coding (treatment contrasts, the default for unordered factors) to encode the effects of categorical variables in a mixed effects model such as yours. In your case, this means that R will include two dummy variables for group in your model (since group has 3 categories) and one dummy variable for condition (since condition has 2 categories). In general, if you have a variable with k categories, R will capture its effect on the response variable by including k-1 dummy variables in the model in lieu of that variable.

How does R create the k-1 dummy variables for a categorical variable with k categories? By default, it orders the k categories alphabetically by name (for a factor, it uses the order of the factor's levels, which is alphabetical unless you set it otherwise) and sets aside the first category as the reference category against which all others will be compared. Then, it defines k-1 dummy variables for the remaining non-reference categories and includes them in the model.

You can see this in action for your group variable. This variable has 3 categories: EN, HS and SB. R arranges these categories in alphabetical order (i.e., EN, HS and SB) and sets aside the first category, EN, as the reference category against which the HS and SB categories will be compared. It then defines the following two dummy variables for the non-reference categories:

groupHS = 1 if group = HS
        = 0 else (i.e., if group is either EN or SB)

groupSB = 1 if group = SB 
        = 0 else (i.e., if group is either EN or HS) 

You can tell that R does all of this behind the scenes because the model summary lists the dummy variables in question in the fixed effects portion. Note that these dummy variables are NOT added to your dataset.
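If you want to see the dummy variables for yourself, a small base R sketch along these lines should do it. The toy data frame below is hypothetical (it is not your EN_JT_1 data); it only reuses your level names so that the column names match your summary.

```r
# Hypothetical toy data: one row per group-by-condition combination
toy <- data.frame(
  group     = c("EN", "HS", "SB", "EN", "HS", "SB"),
  condition = c("EN-GJT-R-GAP", "EN-GJT-R-GAP", "EN-GJT-R-GAP",
                "EN-GJT-R-RES", "EN-GJT-R-RES", "EN-GJT-R-RES"),
  stringsAsFactors = TRUE
)

# Levels are sorted alphabetically; the first one is the reference level
print(levels(toy$group))

# Coding used for group: one 0/1 column per non-reference level
print(contrasts(toy$group))

# Design matrix R builds internally for the fixed effects
X <- model.matrix(~ group + condition, data = toy)
print(X)

# The data frame itself is unchanged: still just two columns
print(names(toy))

# Choosing a different reference level changes which dummies are created
group_hs_ref <- relevel(toy$group, ref = "HS")
print(levels(group_hs_ref))
```

The columns of X carry the same names as the rows of your fixed effects table, and the row for group = EN, condition = EN-GJT-R-GAP has a 1 only in the (Intercept) column.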

For your other categorical predictor variable, with categories EN-GJT-R-GAP and EN-GJT-R-RES, the dummy variable is defined as:

conditionEN-GJT-R-RES = 1 if condition = EN-GJT-R-RES
                      = 0 else 

If you adopt a conditional interpretation for the intercept term in your model, then the intercept represents the expected value of the response variable when group = EN and condition = EN-GJT-R-GAP for the typical subject, typical token_set and typical list.
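To make this concrete, here is a sketch in base R that rebuilds the six group-by-condition means from the fixed effect estimates printed in your summary, with all random effects set to zero (the "typical" subject, token_set and list). The object names b and cells are only illustrative, and the estimates are copied from your output rather than taken from a refitted model.

```r
# Fixed effect estimates copied from the summary in the question
b <- c("(Intercept)"           =  0.99723,
       "groupHS"               = -0.11226,
       "groupSB"               =  0.04257,
       "conditionEN-GJT-R-RES" = -0.27753)

# All six combinations of group and condition
cells <- expand.grid(
  group     = factor(c("EN", "HS", "SB")),
  condition = factor(c("EN-GJT-R-GAP", "EN-GJT-R-RES"))
)

# Dummy-coded design matrix, multiplied by the estimates
X <- model.matrix(~ group + condition, data = cells)
cells$predicted <- as.vector(X %*% b[colnames(X)])
print(cells)
```

The EN / EN-GJT-R-GAP row reproduces the intercept (0.99723), the HS row in the same condition is lower by 0.11226, and every EN-GJT-R-RES row is lower than its EN-GJT-R-GAP counterpart by 0.27753. Because the model has no group-by-condition interaction, the group differences are the same in both conditions. Each p-value then tests whether its own estimate is zero: for (Intercept), whether the EN / EN-GJT-R-GAP mean is zero; for the other rows, whether the difference from the reference level is zero.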

Isabella Ghement
  • Great post, @Isabella Ghement. One thing that jumped out to me about the list random factor is that it only has 2 levels! It's also strange that the residual variance is so small. I wonder if there isn't a data problem? – Erik Ruzek Dec 29 '20 at 21:27
  • Thanks, @ErikRuzek! Nice catch. A random factor with two levels looks suspect indeed! – Isabella Ghement Dec 30 '20 at 04:12