I think it may be a problem to directly use dummy variables for a categorical predictor that has hundreds of levels.
I found one solution in the book 'Elements of Statistical Learning' (p.329), where it is mentioned in the section on classification trees. Specifically, the solution orders the levels of the categorical predictor by the number of occurrences of each level in one class, and then treats the predictor as an ordered predictor.
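To make the ordering idea concrete, here is a minimal sketch in plain Python for a binary outcome. The function name `order_levels_by_class_rate` and the toy data are made up for illustration. It also ranks levels by the proportion of their rows in the chosen class rather than by the raw count, which is an assumption on my part (a normalized variant of the count described above), so treat it as a sketch and not as the book's exact procedure.

```python
from collections import defaultdict

def order_levels_by_class_rate(levels, labels, positive=1):
    """Map each level to an integer rank, ordered by its share of positive-class rows.

    Hypothetical helper for illustration; assumes a binary outcome.
    """
    totals = defaultdict(int)     # rows per level
    positives = defaultdict(int)  # rows per level that fall in the positive class
    for level, label in zip(levels, labels):
        totals[level] += 1
        if label == positive:
            positives[level] += 1
    # Sort by positive-class proportion; ties are broken by level name
    # so the result is deterministic.
    ranked = sorted(totals, key=lambda lv: (positives[lv] / totals[lv], lv))
    return {level: rank for rank, level in enumerate(ranked)}

# Toy data (made up): one categorical predictor and a 0/1 outcome.
city = ["a", "a", "b", "b", "b", "c", "c", "d"]
y = [1, 0, 1, 1, 1, 0, 0, 1]

rank = order_levels_by_class_rate(city, y)
encoded = [rank[c] for c in city]  # the predictor, now treated as ordered
print(rank)
print(encoded)
```

The encoded column can then be split on as if it were an ordinary ordered predictor. Because the ranks are computed from the outcome, they should be derived from the training data only.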
For models other than classification trees, such as linear regression, what would be proper ways of handling categorical predictors with too many levels?
I found an old post asking similar questions, but no answers have been posted: