I have read in many places that decision trees and random forests, if deep enough, can handle categorical variables without one-hot encoding.
1) What is special about these algorithms that they can handle categorical variables without one-hot encoding?
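To make question 1 concrete, here is a minimal pure-Python sketch (with hypothetical data and column names) of the kind of split a tree can, in principle, make directly on a raw categorical value — the split test is just an equality check on the category, with no numeric encoding involved:

```python
# Hypothetical toy dataset: a categorical feature and a binary label.
rows = [
    {"color": "red",   "label": 1},
    {"color": "blue",  "label": 0},
    {"color": "green", "label": 0},
    {"color": "red",   "label": 1},
]

def split_on_category(rows, column, category):
    """Partition rows by whether row[column] equals the given category.

    This mimics a single tree node whose test is 'color == red',
    applied directly to the string values.
    """
    left = [r for r in rows if r[column] == category]
    right = [r for r in rows if r[column] != category]
    return left, right

left, right = split_on_category(rows, "color", "red")
print([r["label"] for r in left])
print([r["label"] for r in right])
```

Whether a given library actually performs such splits natively varies by implementation; this is only a sketch of the idea behind the claim.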
2) Are there specific algorithms for which we use dummy encoding (n-1 columns created for a categorical variable) versus one-hot encoding (n columns created)? Since one of the n columns in one-hot encoding carries information that can be recovered from the other columns, why would we ever prefer one-hot encoding, and why does the concept even exist?
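For reference, the column-count difference between the two encodings can be seen with pandas (using a hypothetical toy column; `pd.get_dummies` with `drop_first=True` produces the n-1 dummy coding):

```python
import pandas as pd

# Hypothetical categorical column with n = 3 distinct values.
colors = pd.Series(["red", "blue", "green", "red"], name="color")

one_hot = pd.get_dummies(colors)                   # n columns
dummy = pd.get_dummies(colors, drop_first=True)    # n - 1 columns

print(list(one_hot.columns))
print(list(dummy.columns))  # the dropped category becomes the baseline
```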
3) Why does it say "if deep enough"?
Any helpful resources, links, videos, or your own explanations are welcome. I want to clear up this doubt once and for all.
I found a similar question, but it doesn't answer everything I want to know: https://datascience.stackexchange.com/questions/5226/strings-as-features-in-decision-tree-random-forest/19829#19829
AN6U5's answer there says that Random Forest does not require one-hot encoding, and I have read many other answers saying the same.