I have seen the suggestion that when a factor has a large number of levels, one ought to treat it as a random effect. I think it has come up in several places, but most recently I read it in The R Book (2013). Crawley writes on p. 531: "... if you have factors with large numbers of levels you might consider using mixed-effects models rather than ANOVA (i.e. treating the factors as random effects rather than fixed effects; ...)".

  1. What justifies the use of random effects in this particular case?
  2. I'd be thankful for any good references for this approach, whether it is explained further or applied. References contrasting a random-effects approach with a fixed-effects one are especially welcome. (A minimal sketch of the two model specifications follows below.)
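
For concreteness, here is a minimal sketch of the two specifications being contrasted, using lme4 on made-up data (the names y, x, and the many-level factor g are placeholders of my own, not from any particular source):

```r
library(lme4)

## Made-up data: a response y, a covariate x, and a grouping factor g
## with many levels (50 here), each observed 10 times.
set.seed(1)
n_levels <- 50
n_per    <- 10
d <- data.frame(
  g = factor(rep(seq_len(n_levels), each = n_per)),
  x = rnorm(n_levels * n_per)
)
level_effect <- rnorm(n_levels)  # true per-level effects
d$y <- 2 + 0.5 * d$x + level_effect[d$g] + rnorm(nrow(d))

## Fixed-effects approach: one dummy coefficient per level of g,
## i.e. 49 extra parameters.
m_fixed <- lm(y ~ x + g, data = d)

## Random-effects approach: the levels of g are modelled as draws from
## a common normal distribution, so only one variance component is
## estimated and the per-level predictions are shrunk toward the mean.
m_random <- lmer(y ~ x + (1 | g), data = d)

## Unshrunk per-level estimates vs. shrunken conditional modes (BLUPs):
head(coef(m_fixed))
head(ranef(m_random)$g)
```

The point of the contrast: the fixed-effects fit spends one parameter per level, while the mixed model replaces those 49 coefficients with a single variance component and partially pools the level estimates.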
kjetil b halvorsen
  • You may find this interesting: https://stats.stackexchange.com/questions/120964/fixed-effect-vs-random-effect-when-all-possibilities-are-included-in-a-mixed-eff/137837#137837 – Tim Oct 20 '21 at 13:24
  • Related (but not a duplicate) [Principled way of collapsing categorical variables with many levels?](https://stats.stackexchange.com/questions/146907/principled-way-of-collapsing-categorical-variables-with-many-levels) – kjetil b halvorsen Oct 20 '21 at 13:32
  • 1
    More or less the same idea for why you want to use neural networks (with embedding layers) when you have high-dimensional categorical inputs in prediction problems. In some sense, random effects are even better, because - if used correctly - you can reflect how sure or unsure your are about the value of a latent random effect. In some sense, they are worse (often only 1-dimensional, often we don't really try to have interactions between high-dimensional random effects and other model inputs and certainly rarely with any non-linear splines, as we can do with neural networks) – Björn Oct 20 '21 at 13:34

0 Answers