Consider a simple multiclass problem in which one of the features is a categorical variable with many levels (>1000). The nature of the problem is such that we cannot reduce the number of levels of this variable.
The classic way to solve this problem would be to one-hot encode the categorical variable and train a single multiclass model using gradient boosting or a random forest.
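For concreteness, here is a minimal sketch of that single-model baseline. It assumes scikit-learn and uses synthetic data: the sizes, the two numeric features, and the random labels are made up for illustration and are not from my actual problem.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n_samples, n_levels, n_classes = 5000, 1200, 3

# Synthetic data (assumption): column 0 holds the category level as an
# integer code, columns 1-2 are ordinary numeric features.
cat = rng.integers(0, n_levels, size=n_samples)
num = rng.normal(size=(n_samples, 2))
X = np.column_stack([cat, num])
y = rng.integers(0, n_classes, size=n_samples)

# One-hot encode only the categorical column; pass the numeric ones through.
# handle_unknown="ignore" encodes levels unseen during fit as all zeros.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), [0])],
    remainder="passthrough",
)

# A single multiclass model trained on the encoded features.
model = make_pipeline(
    preprocess,
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(X, y)
pred = model.predict(X[:10])
```

The encoder returns a sparse matrix by default, so the >1000 indicator columns are not stored densely.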
My question is whether one should expect an improvement in accuracy if the categorical variable is instead removed from the features and used to split the data into groups, each containing the rows whose levels are similar. One could then train a separate multiclass classifier on each group. To predict, one would look at the value of the categorical variable and apply the classifier trained on the corresponding group.
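A rough sketch of what I mean by this alternative, again assuming scikit-learn and synthetic data. The function `level_to_group` is a hypothetical placeholder for whatever clustering of the levels would actually be used; here it simply takes the level code modulo the number of groups.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_levels, n_groups = 3000, 1200, 10

# Synthetic data (assumption): integer level codes, two numeric features.
cat = rng.integers(0, n_levels, size=n_samples)
X_num = rng.normal(size=(n_samples, 2))
y = rng.integers(0, 3, size=n_samples)

def level_to_group(levels):
    """Hypothetical stand-in for a real clustering of the levels."""
    return levels % n_groups

# Train one multiclass classifier per group, without the categorical column.
groups = level_to_group(cat)
models = {}
for g in np.unique(groups):
    mask = groups == g
    clf = RandomForestClassifier(n_estimators=20, random_state=0)
    models[g] = clf.fit(X_num[mask], y[mask])

def predict(cat_new, X_new):
    """Route each row to the classifier trained on its group."""
    g_new = level_to_group(cat_new)
    out = np.empty(len(cat_new), dtype=y.dtype)
    for g in np.unique(g_new):
        mask = g_new == g
        # Raises KeyError for a group with no training rows; a real
        # implementation would need a fallback model for that case.
        out[mask] = models[g].predict(X_new[mask])
    return out

pred = predict(cat[:20], X_num[:20])
```

Note that each sub-model is fitted only on the rows of its own group, so it sees roughly `n_samples / n_groups` examples rather than the full data set.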
What is the difference between a single multiclass classifier trained with the one-hot encoded categorical variable and the multi-classifier approach described above?