
I have noticed that in some (rare) situations, training subject-, group-, or subpopulation-specific models is preferred to training one general predictive model on all the data (presumably for accuracy?).

For example, with medical data, I have seen a predictive model trained for each patient separately (instead of one model for all patients). In other fields, I have seen separate predictive models trained for each geographic region.

  • Under what circumstances are the subject/group specific models often preferred to one general model?
  • Why is one general model with a group variable as a feature not enough?
  • What are the advantages of subject/group specific models over one single model?
sitems

1 Answer


My guess is that researchers who are unfamiliar or uncomfortable with hierarchical/mixed-effects models might break their data up and create separate models. I don't see any advantage to broken-up models unless your separate groups have very different covariances and your modeling technique adjusts to this for you.

Hierarchical/mixed-effects models allow for sharing of strength among groups, which means that smaller groups will be pulled more towards the overall mean, while larger groups -- those with more information -- are more independent. The amount of pooling is determined by the data.
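To make the pooling idea concrete, here is a minimal sketch of the shrinkage for group means only. It is not a fitted mixed-effects model: the function name `partial_pooling_means` is hypothetical, and the within-group variance `sigma2` and between-group variance `tau2` are simply assumed known, whereas a real hierarchical model would estimate them from the data. It also shrinks toward the raw grand mean, which is a simplification.

```python
import numpy as np

def partial_pooling_means(groups, sigma2, tau2):
    """Shrink each group's sample mean toward the grand mean.

    groups : list of 1-D arrays, one per group
    sigma2 : within-group variance (assumed known for this sketch)
    tau2   : between-group variance (assumed known for this sketch)
    """
    grand_mean = np.concatenate(groups).mean()
    estimates, weights = [], []
    for y in groups:
        n = len(y)
        # Weight on the group's own mean; approaches 1 as n grows.
        w = n / (n + sigma2 / tau2)
        estimates.append(w * y.mean() + (1 - w) * grand_mean)
        weights.append(w)
    return np.array(estimates), np.array(weights), grand_mean

# Simulated data: one small group and one large group (illustrative only).
rng = np.random.default_rng(0)
small = rng.normal(loc=5.0, scale=2.0, size=3)
large = rng.normal(loc=5.0, scale=2.0, size=300)

est, w, gm = partial_pooling_means([small, large], sigma2=4.0, tau2=1.0)
print("weights on own mean:", w)
print("shrunken estimates:", est, "grand mean:", gm)
```

With these assumed variances, the 3-observation group keeps a weight of 3/7 on its own mean, while the 300-observation group keeps 300/304, so the small group is pulled much harder toward the overall mean.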

You do need a fair number of groups to get reasonably sharp distributions of the group coefficients: one rule of thumb is at least 5 groups, though I've also seen at least 30 suggested. But I believe I've read Andrew Gelman saying that a hierarchical model won't do worse than separate models.

Wayne