
Can someone explain the problems associated with using more than 20 levels in a single dummy variable?

I am aware of the negative implications of using several dummy variables in a model. However, I recently came across a reading that said using many levels within a single dummy (categorical) variable also has negative implications, i.e. it can affect the estimates of the other predictors' coefficients.

When I tried this experimentally with a data set, I found that the coefficients of the other predictors do indeed change considerably if I cut the number of levels from 20 to 15. (Data set: I had 3 predictors and one dummy variable with 20 levels.)

Can someone explain the shortcomings of multilevel dummies?

kjetil b halvorsen
Sanju

1 Answer


This question is really too broad, and you should have given a reference for the claim that using many levels in a single dummy variable also has negative implications, i.e. that it can affect the estimates of the other predictors' coefficients. Assuming a regression model you want to use for prediction, a dummy with 20 levels eats up $20-1 = 19$ df (degrees of freedom), so this is just the problem of having to estimate many parameters. Without more context, this is no different from having too many continuous predictors.
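To make the degrees-of-freedom count concrete, here is a minimal sketch in plain Python. It assumes treatment (reference-level) coding and a model with an intercept. The helper names `dummy_code` and `residual_df` and the sample size of 100 are made up for illustration; only the 3 predictors and the 20 versus 15 levels come from the question.

```python
def dummy_code(values, reference=None):
    """Treatment-code a categorical variable: one 0/1 column per non-reference level."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumption: first level in sort order is the baseline
    kept = [lv for lv in levels if lv != reference]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows


def residual_df(n_obs, n_continuous, n_levels):
    """Residual df for intercept + continuous slopes + (n_levels - 1) dummy columns."""
    n_params = 1 + n_continuous + (n_levels - 1)
    return n_obs - n_params


# Hypothetical data: 100 observations spread evenly over 20 levels.
values = [f"g{i % 20}" for i in range(100)]
kept, rows = dummy_code(values)

print(len(kept))                 # number of dummy columns for 20 levels
print(residual_df(100, 3, 20))   # residual df with the 20-level factor
print(residual_df(100, 3, 15))   # residual df after collapsing to 15 levels
```

With these assumed numbers, the 20-level factor contributes 19 columns, and collapsing to 15 levels gives back 5 residual degrees of freedom. The same accounting would apply to 19 extra continuous predictors.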

If this turns out to be a problem, you can look at the question "Principled way of collapsing categorical variables with many levels?" for ideas for solutions.
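As one very simple illustration of collapsing levels, the sketch below lumps infrequent levels into a single catch-all category. This is only an assumed example, not necessarily what that thread recommends; the function name `lump_rare_levels`, the threshold `min_count`, and the toy data are all hypothetical.

```python
from collections import Counter


def lump_rare_levels(values, min_count, other_label="other"):
    """Replace every level seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Toy data: two common levels and two rare ones.
values = ["a"] * 10 + ["b"] * 8 + ["c"] * 2 + ["d"] * 1
lumped = lump_rare_levels(values, min_count=5)

print(sorted(set(values)))   # levels before lumping
print(sorted(set(lumped)))   # levels after lumping
```

Merging levels this way reduces the number of dummy columns, but it also changes what the remaining coefficients mean, so it is worth checking whether the merged levels plausibly behave alike.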

kjetil b halvorsen