Let's say that you have a classification problem where the dependent variable has MANY levels (say 20) and you CANNOT transform the target (i.e. no clustering, combining of levels, etc.). Is it a good idea to fit a binary model for each level (i.e. Y = 1 if specific level versus Y= 0 otherwise) as opposed to a model where you are trying to predict all of the level of Ys simultaneously?
I know that when you are using logistic regression, the multinomial approach is more efficient, but will this hold when employing machine learning algorithms? So far the best model was a boosted tree. Any thoughts on this?
For the choice of the predicted value, I was thinking that I would just predict whichever category corresponded to the highest probability.