I'm trying to figure out how to "best" specify a model in lmer. Any insight is appreciated!

For background, my data are repeated measures of some outcome y across a 10-year period, with each year indexed by the variable year. The units of analysis are census tracts (tract). Each census tract is situated within several higher-order geographic factors [i.e., tracts are located in metropolitan areas (cbsa), states (state), and Census regions (region)]. Overall, I'm interested in estimating how y changes across year.

Most of the geographic factors are clearly delineated. E.g., over the course of the study, tract == A123 only belongs to cbsa == 1, state == 2, and region == 3. Given that structure, my first stab at a model was:

lmer(y ~ year + (year | tract) + (1 | cbsa) + (1 | state) + (1 | region), ...)

After taking a closer look at the data, I found that a handful of metropolitan areas were split across states. E.g., cbsa == 1 crosses state lines and can be found in state == 2 for some tracts and state == 3 for other tracts.
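For reference, a check of this kind can be sketched in base R. This is only a minimal illustration: the data frame `d` and its values are hypothetical toy data, not the real data set.

```r
# Hypothetical toy data: one row per tract, with its cbsa and state.
d <- data.frame(
  tract = c("A1", "A2", "A3", "A4", "A5"),
  cbsa  = c(1, 1, 1, 2, 2),
  state = c(2, 2, 3, 3, 3)
)

# Count the distinct states observed for each cbsa.
states_per_cbsa <- tapply(d$state, d$cbsa, function(s) length(unique(s)))

# cbsa values that appear in more than one state, i.e. not nested in state.
split_cbsa <- names(states_per_cbsa)[states_per_cbsa > 1]
print(split_cbsa)
```

In this toy example only cbsa 1 straddles two states; an empty result would mean every cbsa is nested within a single state.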

My question is: is the model presented above still appropriate for how these data are structured?

An alternative fit that I can imagine is something like:

lmer(y ~ year + (year | region:state:cbsa:tract) + (1 | region:state:cbsa) + (1 | region:state) + (1 | region), ...)

which, to the best of my understanding, estimates separate intercepts for each state:cbsa pair (such that metropolitan areas split across states will have different random effect estimates). This and the prior model give different results.
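One way to see where the two specifications diverge is to count the grouping levels each one implies. The sketch below uses base R only and hypothetical toy data (the data frame `d` is an assumption for illustration); `interaction()` is used here to mimic the combined grouping factor that a term like `region:state:cbsa` describes.

```r
# Hypothetical toy data: cbsa 1 appears in both state 2 and state 3.
d <- data.frame(
  region = c(3, 3, 3, 3, 3),
  state  = c(2, 2, 3, 3, 3),
  cbsa   = c(1, 1, 1, 2, 2)
)

# Grouping by cbsa alone, as in (1 | cbsa): one level per metropolitan area.
n_cbsa <- length(unique(d$cbsa))

# Grouping by the observed region/state/cbsa combinations, as in
# (1 | region:state:cbsa): a split cbsa contributes one level per state.
nested_id <- interaction(d$region, d$state, d$cbsa, drop = TRUE)
n_nested  <- nlevels(nested_id)

print(c(cbsa_levels = n_cbsa, nested_levels = n_nested))
```

With these toy values the cbsa-only grouping has 2 levels while the combined grouping has 3, because the split metropolitan area is counted once per state.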

mkq

1 Answer

From the description, this is a partially crossed design.

The second model is appropriate for a design that is fully nested. The first model is appropriate for a partially nested (that is, partially crossed) design such as this one, provided that the factors are coded uniquely. For example, if you have tract1 in cbsa1 and you also have a tract1 in cbsa2, but these are actually different tracts, then you will need to recode the tracts to be unique. More information on this can be found here:

Crossed vs nested random effects: how do they differ and how are they specified correctly in lme4?
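As a rough illustration of the recoding step, the base R sketch below builds a unique tract label from the cbsa and tract codes. The data frame `d` and the column name `tract_uid` are hypothetical and chosen only for this example.

```r
# Hypothetical toy data: the label "tract1" is reused in two different cbsa.
d <- data.frame(
  cbsa  = c("cbsa1", "cbsa1", "cbsa2", "cbsa2"),
  tract = c("tract1", "tract2", "tract1", "tract2")
)

# Build a unique label by combining the parent unit with the tract code.
d$tract_uid <- paste(d$cbsa, d$tract, sep = "_")

length(unique(d$tract))      # 2: reused labels collapse distinct tracts
length(unique(d$tract_uid))  # 4: each physical tract has its own level
```

With uniquely coded tracts, a term such as `(year | tract_uid)` would refer to each physical tract exactly once, which is the condition the first model relies on.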

Robert Long