I'm trying to figure out how to "best" specify a model in `lmer`. Any insight is appreciated!
For background information, my data are repeated measures of some outcome `y` across a 10-year period, with each year indexed by the variable `year`. The units of analysis are census tracts (`tract`). Each census tract is situated within several higher-order geographic factors: tracts are located in metropolitan areas (`cbsa`), states (`state`), and Census regions (`region`). Overall, I'm interested in estimating how `y` changes across `year`.
Most of the geographic factors are clearly delineated. E.g., over the course of the study, tract `A123` only belongs to `cbsa == 1`, `state == 2`, and `region == 3`. Given that structure, my first stab at a model was:
```r
lmer(y ~ year + (year | tract) + (1 | cbsa) + (1 | state) + (1 | region), ...)
```
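For reference, this is how the first model reads in equation form under `lmer`'s default assumptions (the notation is my own: $i$ indexes tracts, $t$ indexes years, and $c[i]$, $s[i]$, $r[i]$ are the cbsa, state, and region containing tract $i$). The `(year | tract)` term gives each tract a correlated random intercept and slope, and each `(1 | ...)` term adds an independent random intercept for that grouping factor:

$$
\begin{aligned}
y_{it} &= \beta_0 + \beta_1\,\mathrm{year}_t + b_{0i} + b_{1i}\,\mathrm{year}_t + u_{c[i]} + v_{s[i]} + w_{r[i]} + \varepsilon_{it},\\
&(b_{0i},\, b_{1i})^\top \sim \mathcal{N}(\mathbf{0},\, \Sigma_{\mathrm{tract}}), \quad u_c \sim \mathcal{N}(0,\, \sigma^2_{\mathrm{cbsa}}),\\
&v_s \sim \mathcal{N}(0,\, \sigma^2_{\mathrm{state}}), \quad w_r \sim \mathcal{N}(0,\, \sigma^2_{\mathrm{region}}), \quad \varepsilon_{it} \sim \mathcal{N}(0,\, \sigma^2).
\end{aligned}
$$

Here $u_{c[i]}$ is indexed by the `cbsa` ID alone, so (assuming `cbsa` IDs are unique nationwide) tracts in the same metro area share one intercept shift regardless of which state they fall in.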
After taking a closer look at the data, I found that a handful of metropolitan areas were split across states. E.g., `cbsa == 1` crosses state lines: it can be found in `state == 2` for some tracts and in `state == 3` for other tracts.
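A base-R check along these lines flags metro areas that span more than one state, and confirms that each tract maps to a single cbsa, state, and region. This is a sketch: `d` is a hypothetical toy data frame that reuses the column names above, not the real data.

```r
# Hypothetical toy data: cbsa "1" straddles states "2" and "3"
d <- data.frame(
  tract  = c("A123", "A124", "B200", "C300"),
  cbsa   = c("1", "1", "1", "4"),
  state  = c("2", "2", "3", "3"),
  region = c("3", "3", "3", "3"),
  stringsAsFactors = TRUE
)

# Count the distinct states each cbsa appears in
states_per_cbsa <- tapply(d$state, d$cbsa, function(s) length(unique(s)))

# Metro areas that cross a state line
split_cbsas <- names(states_per_cbsa)[states_per_cbsa > 1]
print(split_cbsas)

# The "clearly delineated" part: each tract should sit in exactly one
# cbsa, one state, and one region; stopifnot() errors otherwise
for (g in c("cbsa", "state", "region")) {
  n_per_tract <- tapply(d[[g]], d$tract, function(x) length(unique(x)))
  stopifnot(all(n_per_tract == 1))
}
```

Because it counts distinct values rather than rows, the same check works on a full panel with one row per tract-year.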
My question is: is the model presented above still appropriate for how these data are structured?
An alternative fit that I can imagine is something like:

```r
lmer(y ~ year + (year | region:state:cbsa:tract) + (1 | region:state:cbsa) + (1 | region:state) + (1 | region), ...)
```
To the best of my understanding, this estimates separate intercepts for each `state:cbsa` pair, so a metropolitan area split across states gets a different random-effect estimate in each state. This model and the first one give different results.
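To see what the nested term groups on, the interaction of the ID factors can be built directly with base R's `:` operator on factors. This is a sketch on hypothetical toy data; my understanding (an assumption, not checked against the lme4 source) is that lme4 forms the grouping factor for `(1 | region:state:cbsa)` from this interaction, keeping only the combinations that actually occur.

```r
# Hypothetical toy data: cbsa "1" straddles states "2" and "3"
d <- data.frame(
  tract  = c("A123", "A124", "B200", "C300"),
  cbsa   = c("1", "1", "1", "4"),
  state  = c("2", "2", "3", "3"),
  region = c("3", "3", "3", "3"),
  stringsAsFactors = TRUE
)

# Grouping for (1 | cbsa): one level per metro area
by_cbsa <- droplevels(d$cbsa)

# Grouping for (1 | region:state:cbsa): one level per observed
# region x state x cbsa combination, so cbsa "1" is split into two levels
by_nested <- droplevels(with(d, region:state:cbsa))

print(nlevels(by_cbsa))    # 2
print(nlevels(by_nested))  # 3
print(levels(by_nested))
```

The same logic applies to `region:state:cbsa:tract`: it only yields different groups from `tract` alone if tract IDs repeat across higher-level units.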