
My colleagues and I are working on a suite of lmer post-estimation tools for an R package we are developing. One of the tools is an ICC function that calculates the appropriate ICC for models with one or two specified random factors. Our challenge is to identify whether the factors specified in the model are purely nested or crossed, because the ICCs one would calculate for nested and crossed designs differ.

The fundamental problem is that we want to be able to tell whether an lmer model fit with the random effects specification (1|factor1) + (1|factor2) is nested or crossed.

How would you suggest we tackle this problem?


Below is some context on lmer model specification in nested and crossed situations that may be useful if you are not familiar with the package.

lme4 is a very clever package that seems to infer from the data structure the appropriate calculation of the random effect variances. For example, and as documented in other threads, lmer treats the following nested random effect structures as equivalent if indeed the data structure is nested and the data is coded appropriately:

(1|School) + (1|Student) #Technically this is a crossed specification!
(1|School/Student)
(1|School) + (1|School:Student)  

As @BenBolker clearly states here:

Whether you explicitly specify a random effect as nested or not depends (in part) on the way the levels of the random effects are coded. If the ‘lower-level’ random effect is coded with unique levels, then the two syntaxes (1|a/b) (or (1|a)+(1|a:b)) and (1|a)+(1|b) are equivalent. If the lower-level random effect has the same labels within each larger group (e.g. blocks 1, 2, 3, 4 within sites A, B, and C) then the explicit nesting (1|a/b) is required. It seems to be considered best practice to code the nested level uniquely (e.g. A1, A2, …, B1, B2, …) so that confusion between nested and crossed effects is less likely.
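To make the coding issue in that quote concrete, here is a minimal base R sketch. The data frame `d`, the helper `is_nested()`, and the column `block_unique` are hypothetical names made up for illustration; nothing here comes from lme4 itself.

```r
# Hypothetical data: blocks labelled 1-4 are reused within each of sites A, B, C
d <- expand.grid(rep = 1:2,
                 block = factor(1:4),
                 site = factor(c("A", "B", "C")))

# TRUE if every observed level of `lower` occurs within exactly one level of `upper`
is_nested <- function(lower, upper) {
  lower <- droplevels(as.factor(lower))
  upper <- droplevels(as.factor(upper))
  tab <- table(lower, upper)
  all(rowSums(tab > 0) == 1)
}

# Reused labels: block "1" appears in every site, so the coding looks crossed
nested_as_coded <- is_nested(d$block, d$site)

# Unique labels (A.1, A.2, ..., C.4) make the nesting visible in the data
d$block_unique <- interaction(d$site, d$block, drop = TRUE)
nested_unique <- is_nested(d$block_unique, d$site)
```

With the reused labels the check returns FALSE even though the design is nested, which is exactly the case where (1|site) + (1|block) would be fit as crossed and the explicit (1|site/block) is needed.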

The answer by @RobertLong in the CV thread linked above shows this problem and an alternative solution to it.

We want our ICC function to accurately report ICCs for truly nested vs. truly crossed (or cross-classified) models. And there seem to be a lot of moving pieces that we need to figure out. Any thoughts folks have on this are greatly appreciated.
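One direction we have considered is classifying the two grouping factors from the data alone. The sketch below is only that: `classify_design()` is a hypothetical helper, and it assumes the two factors have already been pulled out of the fitted model (for example with `getME(fit, "flist")`). It reports how the levels are coded, not the true study design, so nested data with reused labels would still come back as crossed.

```r
# Hypothetical helper: classify two grouping factors by which level
# combinations actually occur in the data
classify_design <- function(f1, f2) {
  f1 <- droplevels(as.factor(f1))
  f2 <- droplevels(as.factor(f2))
  occupied <- table(f1, f2) > 0   # TRUE where a combination is observed

  if (all(rowSums(occupied) == 1) || all(colSums(occupied) == 1)) {
    "nested"             # each level of one factor sits in one level of the other
  } else if (all(occupied)) {
    "fully crossed"      # every combination of levels is observed
  } else {
    "partially crossed"  # some, but not all, combinations are observed
  }
}

# Uniquely coded students within schools
res_nested  <- classify_design(1:12, rep(1:3, each = 4))

# Every rater scores every item
g <- expand.grid(rater = 1:3, item = 1:4)
res_crossed <- classify_design(g$rater, g$item)

# Only some combinations occur
res_partial <- classify_design(c(1, 1, 2, 2, 3), c(1, 2, 2, 3, 3))
```

A check like this could at most flag a mismatch between what the user declares and what the coding implies; it cannot recover the design on its own.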

Erik Ruzek
    Interesting post, Erik! The safest way to handle the case of two random grouping factors in my view is to ask the user to specify (via a "grouping = " option or something to this effect) whether they have (1) nested random grouping factors, (2) fully crossed random grouping factors or (3) partially crossed random grouping factors. This way, the onus will be on the user to clearly identify which of the 3 situations they are dealing with. – Isabella Ghement Mar 15 '20 at 20:04
    The moving components that would make it hard to guess which situation the user finds themselves in include: (i) the actual study design, (ii) the coding choice made by the user for labelling the levels of the two grouping factors, (iii) the model formula itself. To look at the model formula alone misses the other two moving pieces and therefore offers an incomplete picture of the situation. Usually, (i) informs (ii), which in turn informs (iii). – Isabella Ghement Mar 15 '20 at 20:08
    When you look at just the third moving component, (iii), you are missing information on the first two moving components, (i) and (ii). If the user declares upfront the suggested "grouping = " option, then you'll know the first missing component, (i). Once you know that, you can presumably proceed even if you don't know what specific coding the user relied on for the nested random grouping factors situation. – Isabella Ghement Mar 15 '20 at 20:24
    This makes a lot of sense, @IsabellaGhement. We still would not know whether their data are coded such that `(1|factor1) + (1|factor2)` will produce what they want. If it doesn't, then the ICC will be off. But that would be true for most functions people use. – Erik Ruzek Mar 16 '20 at 18:54
    Yes, it seems to me that it is wiser to expect the user to supply the relevant information to your function rather than make a guess which could be wrong and lead to incorrect results. Then, based on the supplied information, your function would produce the expected results. In a package vignette, you can give examples illustrating all sorts of possibilities so that users are aware of their options. – Isabella Ghement Mar 16 '20 at 20:43
