
This is pretty general, but what are the pros and cons of including additional levels in a multilevel model (linear mixed model)?

I have data containing information on the multilevel administrative divisions of a country, and most of the levels are, to a greater or lesser extent, of interest to me. Sample size is not a problem here. On the one hand, simpler models are better in most cases; on the other, including additional levels would let me compare the variances at different levels. I have found examples of 4-level models in the literature, but I haven't seen any practical advice on this. Could you provide any arguments and/or literature on this?

Tim
  • Some general advice: http://andrewgelman.com/2007/08/16/no_you_dont_nee/ http://andrewgelman.com/2012/04/17/hierarchicalmultilevel-modeling-with-big-data/ Try to use all the levels that you have. Does it work? If not, why? – kjetil b halvorsen Dec 04 '14 at 12:55
  • For now I don't know: I killed the computation after it had been running for over an hour and was consuming >40 GB of RAM (and counting). – Tim Dec 04 '14 at 12:57
  • Then, start with some simpler models, see what you can get from them, and build up gradually, maybe one level at a time. And leave the computer overnight! – kjetil b halvorsen Dec 04 '14 at 13:01
  • @Tim, Have you considered "compressed sensing"? If you have bottomless samples, why not pull a uniformly random sample from them and fit your model to that sample? If you did it a few times, you could get an idea of the stability of models of varying complexity vs. sample size, and likely get a decent estimator for your overall system. Once you know the form and have a good starting point, it is much easier to adjust the fit. – EngrStudent Jan 12 '15 at 17:47
  • @EngrStudent thanks, but that is not the point. With a given sample size it is possible to estimate a model with some number of levels; that much is clear, because beyond some level of complexity the model simply cannot be estimated. The question is rather whether there is a point where the estimates become untrustworthy even if "something" got estimated. In most cases you use this kind of model with a finite number of cases and a finite number of possible levels. – Tim Jan 12 '15 at 17:57
  • How do you feel about using "glmulti" to numerically explore the question? – EngrStudent Jan 13 '15 at 13:49
  • @EngrStudent thanks again, and thanks for telling me about this package; I didn't know it. However, it still doesn't address my case. I have a large multilevel sample, so estimating the models I use takes a *very* long time. This is also *not* a model selection problem for me: if I could, I would include *all* the possible levels in the model. The question is: does including all of them (e.g. seven levels of nesting) make sense (i.e. would the estimates be stable and trustworthy)? – Tim Jan 13 '15 at 14:13
  • Can you clarify what you mean by levels? I was thinking you meant it in DOE terms, so a 7-level model would have 7th-order interactions. Is that right? Can you give more detail about your data? – EngrStudent Jan 13 '15 at 14:40
  • @EngrStudent I mean levels in multilevel modeling. Example: students nested in classes nested in schools nested in cities nested in regions, etc. – Tim Jan 13 '15 at 14:47
  • @Tim, From what Wikipedia says here (http://en.wikipedia.org/wiki/Multilevel_model), it looks like an nth-order model can be described exactly by an (n+1)-term multivariate Taylor series. Is that correct? – EngrStudent Jan 14 '15 at 12:57
  • @EngrStudent I would say it is a little more complicated, since you have both fixed and random effects. On the other hand, some minimal number of observations and groups is needed for each random effect, so the number of random effects is certainly << n. The fact that fixed and random effects are different entities also complicates it. – Tim Jan 14 '15 at 13:10
  • @Tim, so an nth-order model is on both the mean and the variance, plus an η term for the error? – EngrStudent Jan 14 '15 at 15:57
  • @EngrStudent sorry, but I don't understand... Are you still asking about levels? If so, levels are defined as in multilevel models. – Tim Jan 15 '15 at 10:25
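EngrStudent's subsampling suggestion can be sketched in a toy setting to see how stable a variance-component estimate is as the amount of data shrinks. Assumptions, all mine rather than from the thread: a balanced two-level design with hypothetical variances, the classical ANOVA (method-of-moments) estimator as a cheap stand-in for a full REML fit, and subsamples drawn as whole groups rather than individual rows, so that the nesting is preserved.

```python
import random
import statistics


def simulate_two_level(n_groups, n_per_group, sd_group, sd_resid, rng):
    """Balanced two-level data: a list of groups, each a list of observations."""
    data = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, sd_group)  # shared random intercept of this group
        data.append([u + rng.gauss(0.0, sd_resid) for _ in range(n_per_group)])
    return data


def anova_group_variance(groups):
    """ANOVA (method-of-moments) estimate of the between-group variance in a
    balanced one-way random-effects design: (MSB - MSW) / n.
    The estimate can be negative; it is often truncated at 0 in practice."""
    n_groups, n = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)  # equals the overall mean when balanced
    msb = n * sum((m - grand) ** 2 for m in means) / (n_groups - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n_groups * (n - 1))
    return (msb - msw) / n


def subsample_spread(groups, n_sub, reps, rng):
    """Mean and SD of the estimate over `reps` random subsets of whole groups
    (clusters are kept intact, so the nesting structure is preserved)."""
    estimates = [anova_group_variance(rng.sample(groups, n_sub)) for _ in range(reps)]
    return statistics.fmean(estimates), statistics.stdev(estimates)


rng = random.Random(2015)
data = simulate_two_level(n_groups=200, n_per_group=20, sd_group=1.0, sd_resid=2.0, rng=rng)
full_estimate = anova_group_variance(data)  # true between-group variance is 1.0
spread = {n_sub: subsample_spread(data, n_sub, reps=200, rng=rng) for n_sub in (10, 50, 150)}
for n_sub, (mean, sd) in spread.items():
    print(f"{n_sub:>3} groups: mean {mean:.2f}, SD {sd:.2f}")
```

As Tim points out, this measures the stability of the estimates rather than choosing a model; a wide spread at the group counts actually available is exactly the "something got estimated, but it is untrustworthy" situation.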

1 Answer


This is hard to answer without more context. But in general, the parameters of additional levels will be harder to estimate. For each additional level you will need much more data, especially for the variance-covariance parameters of the higher levels, whose precision depends mainly on the number of groups at that level. See here for a related discussion.
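Why the higher levels are the hard part can be made concrete in the simplest balanced two-level case with normal data. With J groups of n observations, the ANOVA estimator of the between-group variance, σ̂²_u = (MSB − MSW)/n, has exact sampling variance [2(σ²_e + nσ²_u)²/(J − 1) + 2σ⁴_e/(J(n − 1))]/n², because MSB and MSW are independent scaled chi-square variables. As n grows with J fixed, this tends to 2σ⁴_u/(J − 1): adding units inside existing groups cannot make that level's variance precise; only more groups at that level can. The sketch below assumes hypothetical values σ²_u = 1 and σ²_e = 4, and uses the ANOVA estimator as a simple stand-in for the REML fit a mixed-model package would use.

```python
import math


def var_anova_between(sigma2_u, sigma2_e, n_groups, n_per_group):
    """Exact sampling variance of the ANOVA estimator (MSB - MSW) / n of the
    between-group variance in a balanced one-way random-effects model with
    normal errors (MSB and MSW are independent scaled chi-square variables)."""
    J, n = n_groups, n_per_group
    var_msb = 2 * (sigma2_e + n * sigma2_u) ** 2 / (J - 1)
    var_msw = 2 * sigma2_e ** 2 / (J * (n - 1))
    return (var_msb + var_msw) / n ** 2


# The same 1,000 observations, split differently between groups and units per group.
for J, n in [(5, 200), (10, 100), (100, 10), (500, 2)]:
    se = math.sqrt(var_anova_between(1.0, 4.0, J, n))
    print(f"{J:>4} groups x {n:>3} obs: SE of the group variance ~ {se:.2f}")
```

This prints standard errors of about 0.72, 0.49, 0.20 and 0.23: with only 5 or 10 top-level units (say, regions), the variance at that level stays poorly determined however many observations sit below it, while very small groups also cost some precision.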

Manoel Galdino
  • But if the number of observations is not a problem, then every estimable model is fine...? – Tim Dec 04 '14 at 21:09
  • @Tim, As far as I understand, the answer is yes. However, there will be computational costs that may be prohibitive for practical purposes. And it will be harder to interpret and check the fit of more complex models as well. – Manoel Galdino Jan 13 '15 at 14:13
  • What would you have to look for when checking the fit in this case? Do you know any "warning signs" that could be helpful in the case of multilevel models? Thanks! – Tim Jan 13 '15 at 14:16
  • I've never seen more than 3 levels. I've only worked with 2 levels, so I'm extrapolating from what I know about two- or three-level models. I'd be concerned with convergence diagnostics, and also with whether the variance parameters are too big. It may be hard to put a good prior on these higher-level variance parameters. Run some prior sensitivity checks. – Manoel Galdino Jan 13 '15 at 14:25
  • Thanks again! This topic certainly deserves a Monte Carlo study. – Tim Jan 13 '15 at 14:27
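The prior sensitivity check suggested above can be sketched for the simplest case: a handful of top-level units (e.g. regions) whose estimated means ȳ_j and standard errors s_j are treated as known, in the standard normal-normal hierarchical model ȳ_j ~ N(μ, s_j² + τ²) with a flat prior on μ. Integrating μ out gives a closed-form marginal posterior for the top-level SD τ, which can be evaluated on a grid and compared across priors. The data, the grid range and the prior scales are all hypothetical; this is a sketch of the idea, not a recipe for a full 4+-level model.

```python
import math


def log_marginal_tau(tau, ybar, se):
    """log p(y | tau) up to a constant, with mu integrated out under a flat prior."""
    w = [1.0 / (s ** 2 + tau ** 2) for s in se]  # precision of each unit's mean
    mu_hat = sum(wj * yj for wj, yj in zip(w, ybar)) / sum(w)
    v_mu = 1.0 / sum(w)
    return 0.5 * math.log(v_mu) + sum(
        0.5 * math.log(wj) - 0.5 * wj * (yj - mu_hat) ** 2 for wj, yj in zip(w, ybar)
    )


def posterior_tau(grid, log_prior, ybar, se):
    """Normalized grid posterior for tau (log-sum-exp style for stability)."""
    logp = [log_prior(t) + log_marginal_tau(t, ybar, se) for t in grid]
    top = max(logp)
    p = [math.exp(lp - top) for lp in logp]
    z = sum(p)
    return [pi / z for pi in p]


def summary(grid, post):
    """Posterior mean and (grid) median of tau."""
    mean = sum(t * p for t, p in zip(grid, post))
    cdf, median = 0.0, grid[-1]
    for t, p in zip(grid, post):
        cdf += p
        if cdf >= 0.5:
            median = t
            break
    return mean, median


# Hypothetical data: 5 top-level units with estimated means and standard errors.
ybar = [0.3, -0.5, 1.1, 0.2, -0.4]
se = [0.4, 0.3, 0.5, 0.3, 0.4]
grid = [i * 0.01 for i in range(1001)]  # tau in [0, 10]

priors = {  # log densities up to a constant
    "uniform(0, 10)": lambda t: 0.0,
    "half-Cauchy(0, 1)": lambda t: -math.log1p(t ** 2),
    "half-Cauchy(0, 0.1)": lambda t: -math.log1p((t / 0.1) ** 2),
}
results = {}
for name, log_prior in priors.items():
    results[name] = summary(grid, posterior_tau(grid, log_prior, ybar, se))
    print(f"{name:>20}: mean {results[name][0]:.2f}, median {results[name][1]:.2f}")
```

If the posterior summaries move a lot when only the prior changes, the data carry little information about the variance at that level, which is the situation to expect at the top levels of an administrative hierarchy with few units.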