I have data that can be clustered so that each cluster has its own set of both observations and variables. I want to fit a linear model on each cluster, but I want the clusters to share the same variance structure (and thus estimate the parameters of the variance function on all the data). Can this be done in a standard R package like nlme/lme4?
1 Answer
As long as you have the same dependent and independent variables measured on all individuals, regardless of cluster membership, you can use `lmer` to model your data. However, you state that
> each cluster has its own set of both observations and variables.
Does that mean you have a unique set of variables that you measured on cluster A and then another unique but different set of variables you measured on cluster B? If so, then those unique variables cannot be used in the `lmer` model. For example, imagine that your clusters were schools. If you measured school type as a categorical variable (primary/secondary) for every school in your sample, then it could be included in the model. But if for some schools you measured school type and for others you only got the age range of students, then neither of those predictors could be used in your model.
Any cluster-specific variables you wish to include in the model must be measured in the same way in all clusters.
Edit: The multilevel model allows you to model both the intercept (mean of the dependent variable in a group) and the slope (association between an independent and the dependent variable) uniquely for each group. Again, this works as long as you have the same variables measured on all groups, whatever their type (binary, categorical, nominal, etc.).
If using maximum likelihood (e.g., `lmer`), then there is a limit to how many of the slopes you can model as varying across clusters, but with a Bayesian approach you could potentially model all slopes as varying.
The multilevel model equation, separated by level, makes this obvious. Here $X$ represents a level 1 variable (units within clusters) and $W$ represents a level 2 (cluster-specific) variable:
$y_{ij} = \beta_{0j} + \beta_{1j}X_{ij} + \beta_{2}X_{ij} + \dots + e_{ij}$
$\beta_{0j}$ (cluster intercept) $= \gamma_{00} + \gamma_{01}W_j + u_{0j}$
$\beta_{1j}$ (cluster slope) $= \gamma_{10} + \gamma_{11}W_j + u_{1j}$
The intercept variance ($\sigma^2_{u0}$) and slope variance ($\sigma^2_{u1}$) are shared across clusters and indicate how much the cluster-specific intercepts $\beta_{0j}$ and slopes $\beta_{1j}$ vary around their fixed-effect estimates. The cluster-specific values themselves can be obtained using Empirical Bayes prediction if you are interested in working with them further.
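As a rough illustration of the two-level equations above, here is a minimal sketch in `lme4`, assuming simulated data. The data frame `dat`, the variable names `X`, `W`, and `cluster`, and all the simulation values are made up for the example. In the formula, `X * W` supplies the $\gamma$ terms (including the cross-level interaction $\gamma_{11}$), and `(1 + X | cluster)` supplies the random intercept $u_{0j}$ and random slope $u_{1j}$.

```r
library(lme4)

set.seed(1)

# Hypothetical simulated data: 30 clusters with 20 units each
n_clusters <- 30
n_per <- 20
cluster <- factor(rep(seq_len(n_clusters), each = n_per))
W <- rep(rnorm(n_clusters), each = n_per)              # level 2 (cluster-specific) variable
X <- rnorm(n_clusters * n_per)                         # level 1 variable
u0 <- rep(rnorm(n_clusters, sd = 1.0), each = n_per)   # intercept deviations u_0j
u1 <- rep(rnorm(n_clusters, sd = 0.5), each = n_per)   # slope deviations u_1j
y <- (2 + 0.5 * W + u0) + (1 + 0.3 * W + u1) * X + rnorm(n_clusters * n_per)
dat <- data.frame(y, X, W, cluster)

# Random intercept and random slope for X; X * W expands to X + W + X:W
fit <- lmer(y ~ X * W + (1 + X | cluster), data = dat)

fixef(fit)                 # estimates of gamma_00, gamma_10, gamma_01, gamma_11
VarCorr(fit)               # intercept and slope variances, shared across clusters
head(ranef(fit)$cluster)   # Empirical Bayes predictions of u_0j and u_1j
head(coef(fit)$cluster)    # fixed effects plus the predicted cluster deviations
```

Note that `coef()` adds only the predicted deviations to the fixed effects, so its `(Intercept)` column is $\gamma_{00} + u_{0j}$ rather than the full $\beta_{0j}$, which also includes $\gamma_{01}W_j$.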

- The variables are binary, so it is natural to treat groups that have nothing in common (zeros) separately. How can I fit a model separately in each cluster in `lmer` with a common variance function? – Janusz Jan 12 '20 at 09:46
- Thanks for the edit. So basically, it has to be done via random effects? – Janusz Jan 12 '20 at 16:53
- Yes, in the random effects modeling framework, that is how you do it. If you want to switch to a "fixed effects" framework, also called "no pooling," then move to `lm` in R and enter dummy variables for groups and interact those dummies with each of the variables you want to vary by group. That will give you a ton of coefficients that will be difficult to interpret, but tends to be preferred by econometricians. – Erik Ruzek Jan 12 '20 at 21:06
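The "no pooling" approach from the last comment could look roughly like the sketch below, assuming simulated data with a hypothetical grouping factor `g` and a single predictor `x`. The formula `0 + g + g:x` is one equivalent way to write group dummies interacted with `x`: it returns one intercept and one slope per group directly, while the whole model still has a single residual variance estimated from all the data.

```r
set.seed(2)

# Hypothetical simulated data: 3 groups with 40 observations each
g <- factor(rep(c("A", "B", "C"), each = 40))
x <- rnorm(120)
intercepts <- c(A = 0, B = 1, C = -1)
slopes <- c(A = 1, B = -0.5, C = 2)
y <- intercepts[as.character(g)] + slopes[as.character(g)] * x + rnorm(120, sd = 0.7)
dat <- data.frame(y = unname(y), x = x, g = g)

# 0 + g gives one intercept per group; g:x gives one slope per group
fit <- lm(y ~ 0 + g + g:x, data = dat)

coef(fit)    # three group intercepts followed by three group slopes
sigma(fit)   # one residual standard deviation, pooled over all groups

# For comparison: a separate fit on group A alone
fit_A <- lm(y ~ x, data = subset(dat, g == "A"))
coef(fit_A)  # same point estimates as group A's intercept and slope in fit
```

The point estimates match the separate per-group fits, but the standard errors differ because `fit` pools the residual variance across groups.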