I searched for this before but couldn't find an exact match, so let me know if it's a duplicate.
My question is: should categorical variables be one-hot encoded to run random forests, or is it fine to just transform them into nominal (integer-coded) values? My model also includes some continuous variables, in case that makes a difference. I'm using Python's sklearn.
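For concreteness, here is a minimal sketch of the two options in sklearn. It assumes a pandas DataFrame with one categorical and one continuous column; the column names `color` and `size` and the toy data are hypothetical. Only the categorical column is encoded, and the continuous column is passed through unchanged via `ColumnTransformer(remainder="passthrough")`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: one categorical column and one continuous column.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=n),
    "size": rng.normal(size=n),
})
y = (X["color"] == "red").astype(int)


def make_model(encoder):
    # Encode only the categorical column; "size" is passed through as-is.
    pre = ColumnTransformer(
        [("cat", encoder, ["color"])],
        remainder="passthrough",
    )
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    return Pipeline([("pre", pre), ("rf", rf)])


# OrdinalEncoder: one integer column per categorical (sorted: blue=0, green=1, red=2).
ordinal_model = make_model(OrdinalEncoder()).fit(X, y)
# OneHotEncoder: one 0/1 column per category level.
onehot_model = make_model(OneHotEncoder(handle_unknown="ignore")).fit(X, y)

# Number of columns the forest actually sees in each case.
print(ordinal_model["pre"].transform(X).shape[1],
      onehot_model["pre"].transform(X).shape[1])  # prints: 2 4
```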
I did search for this, but everywhere I looked gave a different answer. I also read this article, which is quite informative, but I wanted to know whether there's any kind of consensus, or whether a trustworthy reference exists.
Edit: I decided to quickly try the one-hot encoding implementation, and the first thing I noticed is that it takes considerably longer to run.
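One plausible contributor to the slowdown is that one-hot encoding turns each categorical variable into one column per level, so the forest has more candidate features to evaluate at each split. A minimal sketch, assuming a hypothetical column `city` with 50 distinct levels:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical categorical column with exactly 50 distinct levels.
levels = [f"c{i}" for i in range(50)]
X = pd.DataFrame({"city": levels * 20})

# Ordinal encoding keeps a single column; one-hot creates one column per level.
n_ordinal = OrdinalEncoder().fit_transform(X).shape[1]
n_onehot = OneHotEncoder().fit_transform(X).shape[1]
print(n_ordinal, n_onehot)  # prints: 1 50
```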