
I am currently training a random forest. After transforming a categorical feature into dichotomous columns, should I drop the first level?

For example, I have three unique values in a feature named sex:

  1. m for male
  2. f for female
  3. na for not available

Thus, I encoded sex into three columns:

sex  sex_m  sex_f  sex_na
  m      1      0       0
  f      0      1       0
 na      0      0       1

I dropped sex (obviously), but should I also drop one of the three encoded columns?
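For illustration, a minimal pandas sketch could produce either version (toy data, not my actual dataset):

    import pandas as pd

    # Toy data mirroring the example above (hypothetical)
    df = pd.DataFrame({"sex": ["m", "f", "na", "m"]})

    # Full one-hot encoding: one column per level
    full = pd.get_dummies(df, columns=["sex"])

    # Encoding with the first level dropped, as one would for a regression
    dropped = pd.get_dummies(df, columns=["sex"], drop_first=True)

    print(full.columns.tolist())     # ['sex_f', 'sex_m', 'sex_na']
    print(dropped.columns.tolist())  # ['sex_m', 'sex_na']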

Dropping the base level is necessary when running a regression to avoid multicollinearity, but this is not a problem when running a random forest. So what is the most common approach?

For reference, each tree is being trained with a randomly selected set of 8 out of 63 features.
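As a rough sketch, the setup might look like this in scikit-learn (synthetic data stands in for my real features; note that scikit-learn draws the 8 candidate features per split rather than per tree):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the real data: 63 encoded features, binary target
    X, y = make_classification(n_samples=1000, n_features=63, random_state=0)

    # max_features=8: each split considers a random sample of 8 of the 63 features
    rf = RandomForestClassifier(n_estimators=500, max_features=8, random_state=0)
    rf.fit(X, y)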

Arturo Sbr
  • Decision trees can usually cope directly with categorical variables – Firebug May 29 '20 at 15:24
    This answer is provided in the context of gradient boosting, but the logic also applies to random forest. https://stats.stackexchange.com/questions/438875/one-hot-encoding-of-a-binary-feature-when-using-xgboost/439191#439191 – Sycorax May 29 '20 at 15:37
    Related: https://stats.stackexchange.com/questions/410939/label-encoding-vs-dummy-variable-one-hot-encoding-correctness/414729#414729, https://stats.stackexchange.com/questions/231285/dropping-one-of-the-columns-when-using-one-hot-encoding/329281#329281 – kjetil b halvorsen May 30 '20 at 18:22

1 Answer


Technically, both will work.

However, creating dummies is rarely a good idea in a random forest: each dummy counts as a separate candidate feature when the random subset is drawn, which reduces the chance that other variables are picked for splitting.

Integer coding often does the job pretty well. The more levels there are, the more it helps to use a meaningful order. Some implementations (e.g. ranger in R) do smart ordering internally. Avoiding dummy coding also makes the models much easier to interpret with the usual tools (variable importance, partial dependence plots).
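As a minimal sketch (Python/pandas, toy data), integer coding could look like this; the level order below is arbitrary and is exactly what you would want to choose meaningfully, e.g. by the mean of the target per level:

    import pandas as pd

    # Toy data with the same three levels as in the question (hypothetical)
    df = pd.DataFrame({"sex": ["m", "f", "na", "m"]})

    # One integer column instead of three dummy columns.
    # The order is arbitrary here; with many levels, ordering the levels by
    # the mean of the target (what ranger does internally) works better.
    order = ["f", "m", "na"]
    df["sex_int"] = pd.Categorical(df["sex"], categories=order, ordered=True).codes

    print(df)
    #   sex  sex_int
    # 0   m        1
    # 1   f        0
    # 2  na        2
    # 3   m        1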

Michael M