What can a Multilevel Model do that Linear Regression can't?

Question

In short: I wonder when I would ever want to use a multilevel model as opposed to a linear regression with appropriate structure.

In detail:

When I look at Wikipedia, I understand that multilevel models describe the following situation:

$$ y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + \epsilon_{ij} $$

with the usual meanings. In the following, I take it that $i$ denotes an individual (or the smallest unit), and $j$ is some kind of level. For simplicity, say it is a state of the country, or a school.

To me, the meaning of $\beta_{0j}$ is that the intercept may vary across levels. Similarly, to me $\beta_{1j}$ indicates that the effect of $X_{ij}$ varies over the levels (state, school, ...).

If you told me to regress some $y$ on some $X$ in a way that takes into account that the overall means may differ between groups $j$, and that the effect of $X$ on $y$ may differ between groups $j$, here is what I would naturally do: I would run the following regression:

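$$ y_{ij} = \beta_1 + \sum_{k=2}^K \alpha_k 1(j=k) + \beta_2 X_{ij} + \sum_{k=2}^K \gamma_k 1(j=k) X_{ij} + \epsilon_{ij} $$

where $1(j=k)$ is a dummy for membership in group $k$, $\alpha_k$ and $\gamma_k$ are the shifts in intercept and slope for group $k$ relative to group 1, and it is understood that there are $K$ groups.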

What have I done? I have included dummies or, as we call them in econometrics, fixed effects for each group. These take mean differences between different groups into account. Similarly, I have interacted the regressor of interest $X_{ij}$ with said dummies to see whether its effect differs by group.
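
To line the two specifications up, here is a small sketch of how I read the correspondence. I write $\alpha_k$ for the coefficient on the $k$-th group dummy and $\gamma_k$ for the coefficient on its interaction with $X_{ij}$ (my own labels, with group 1 as the reference group):

$$ \beta_{0j} = \beta_1 + \alpha_j, \qquad \beta_{1j} = \beta_2 + \gamma_j, \qquad \alpha_1 = \gamma_1 = 0 $$

Read this way, the dummy regression estimates each group's intercept and slope as free parameters, one pair per group.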

Of course, there is the question of inference. By using all these dummies and interactions, one may hope that all structural dependencies have been absorbed and the remaining error term is white noise. However, as econometricians, we worry intensely about incorrect inferential statements due to some form of heteroskedasticity or unobserved heterogeneity (not to mention endogeneity!). In particular, group-specific heteroskedasticity is the econometrician's incarnation of Freddy Krueger, and in this example it seems fair to say that individual members of group $j$ may have some elements of the variance-covariance matrix in common. I would thus compute a so-called cluster-robust variance-covariance matrix, which I can use to get correct or at least conservative inferential statements.
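
The sandwich computation behind a cluster-robust variance-covariance matrix can be sketched in a few lines. This is only a minimal illustration, assuming a pooled regression of $y$ on an intercept and a single $X$ (chosen just to keep the matrices 2×2), no finite-sample correction, and made-up data; the function name `ols_cluster_robust` is hypothetical.

```python
def ols_cluster_robust(x, y, cluster):
    """OLS of y on (1, x) plus a cluster-robust sandwich covariance matrix.

    Illustrative sketch: no small-sample correction is applied.
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))

    # Bread: (X'X)^{-1} for the design matrix with columns (1, x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    # OLS coefficients: (X'X)^{-1} X'y
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # Sum the scores x_i * e_i within each cluster
    scores = {}
    for xi, yi, g in zip(x, y, cluster):
        e = yi - b0 - b1 * xi
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += e
        s[1] += xi * e

    # Meat: sum over clusters of the outer product of the cluster score
    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(2)]
            for a in range(2)]

    # Sandwich: bread * meat * bread
    tmp = [[sum(inv[a][k] * meat[k][b] for k in range(2)) for b in range(2)]
           for a in range(2)]
    vcov = [[sum(tmp[a][k] * inv[k][b] for k in range(2)) for b in range(2)]
            for a in range(2)]
    return (b0, b1), vcov


# Made-up data: six individuals in three groups
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
cluster = ["a", "a", "b", "b", "c", "c"]

beta, vcov = ols_cluster_robust(x, y, cluster)
print("slope:", beta[1], "cluster-robust s.e.:", vcov[1][1] ** 0.5)
```

Residuals are allowed to be arbitrarily correlated within a cluster because only the within-cluster sums of the scores enter the middle matrix.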

Now let me compare both models to make sure you understand what I believe to be true (also to make it easier for you to point out flaws in my understanding):

Similarities: Both account for differences in group means. Both can account for differences in the coefficient of interest by level. And the multilevel model, if distributional assumptions hold, as well as the simple linear regression with fixed effects and interactions, computed with a cluster-robust variance-covariance matrix, provide me with correct inferential statements.

Differences: In the linear regression case, I don't need any distributional assumptions, and as far as I know even in the absence of clustering, the procedure provides valid inferential statements, though possibly conservative ones. In the multilevel model, if distributional assumptions do not hold, I am not sure what happens, but I would guess nothing good.

My question: In what kind of situation would I ever prefer to fit a multilevel model? Is there something the multilevel model can do that the linear regression with "level dummies" and group-interactions cannot do?

The smaller the schools and the more interested you are in the individual school levels and effects, the more helpful the mixed-effects model will be, because OLS will then provide very unstable estimates for those effects. — Michael M, May 26 '16 at 09:10
Funny. I came to this site today to ask the exact same question. In my view, the difference is not in what is being modeled but in how it is estimated. MLM partially pools its estimates so that those that are more variable or from smaller samples are pulled towards the average effect. However, the cost of this is the assumption that the random parameters (slopes, intercepts) are normally distributed (multivariate normal if taken together). These are the differences I am aware of, but I am curious to hear from those more knowledgeable in MLM. — ATJ, May 26 '16 at 18:30
See also responses #5-6 to this post here: http://stats.stackexchange.com/questions/1995/under-what-conditions-should-one-use-multilevel-hierarchical-analysis/1997#1997 — ATJ, May 26 '16 at 18:35
That is also my understanding so far @ATJ. I wonder if we should then use multilevel models at all. OLS in itself uncovers differences in means, i.e. where the weight of the data is. If MLM comes to a different conclusion, this difference must be driven by distributional assumptions. Which answer to prefer then? I can't think of a situation in which I would prefer MLM to OLS, but so many people are using MLM, hence I believe I have overlooked something... — coffeinjunky, May 26 '16 at 18:36
Ha. I was actually thinking the exact opposite. A great book is the recent "Statistical Rethinking" by McElreath in which he writes, "multilevel regression deserves to be the default form of regression...Perhaps the most important reason is that even well-controlled treatments interact with unmeasured aspects of the individuals, groups, or populations studied. This leads to variation in treatment effects, in which individuals or groups vary in how they respond to the same circumstance." Differences in results can also be attributed to unstable OLS estimates, which MLM handles much better. — ATJ, May 26 '16 at 18:43
The way I reason is this: For me, a stable OLS estimate is data waving a flag saying `look over here, an elephant! an elephant!` whereas an unstable estimate is more akin to me spotting what might be a mosquito, but I am not sure, as it is so small it could have been white noise. If I need more structure and distributional assumptions to turn this mosquito into an elephant, would I bet my own money on this effect being real? I am not sure I would. For that reason I would prefer OLS to MLM. I do understand your perspective though, and I definitely think heterogeneity is under-emphasized! — coffeinjunky, May 26 '16 at 18:57
In any case, it is an interesting discussion, and I hope someone who is more grounded in MLM will weigh in at some point. — coffeinjunky, May 26 '16 at 18:59
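
The partial pooling mentioned in the comments can be sketched for the simplest case, a random-intercept model with no covariate. This is a minimal sketch, assuming the variance components are known (`sigma2` within groups, `tau2` between groups), whereas a multilevel fit would estimate them from the data. The function name `partial_pool`, the school labels, and the scores are all hypothetical.

```python
from statistics import mean


def partial_pool(groups, sigma2, tau2):
    """Shrink each group mean toward the overall mean.

    Random-intercept sketch with variance components assumed known:
    sigma2 = within-group variance, tau2 = between-group variance.
    """
    stats = {g: (len(y), mean(y)) for g, y in groups.items()}

    # Overall mean: precision-weighted average of the group means,
    # using Var(group mean) = tau2 + sigma2 / n_j
    weights = {g: 1.0 / (tau2 + sigma2 / n) for g, (n, _) in stats.items()}
    mu = sum(weights[g] * stats[g][1] for g in stats) / sum(weights.values())

    pooled = {}
    for g, (n, ybar) in stats.items():
        # Shrinkage factor: close to 1 for large groups, close to 0 for tiny ones
        lam = tau2 / (tau2 + sigma2 / n)
        pooled[g] = mu + lam * (ybar - mu)
    return mu, pooled


# Hypothetical test scores: one school with 2 students, one with 20
schools = {"small": [90.0, 70.0], "large": [55.0, 65.0] * 10}

raw = {g: mean(y) for g, y in schools.items()}
mu, pooled = partial_pool(schools, sigma2=100.0, tau2=25.0)

for g in schools:
    print(g, "raw mean:", raw[g], "partially pooled:", round(pooled[g], 2))
```

With these made-up numbers, the two-student school is pulled much further toward the overall mean than the twenty-student school, whereas a regression with school dummies would report the raw group means unchanged.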
