I have a data set with a small number of textual features (fewer than 10) and a response variable with many classes (~100).
Are there any recommended models that work well for this kind of problem?
The number of features does not matter as long as the features have enough predictive power. For example, if the data set is a collection of news articles where one feature contains tags and the response variable is the category, one can expect reasonable model quality (assuming the tags are descriptive, of course).
On the other hand, if the predictive power is poor, no model will help (or the difference between models is negligible).
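One rough way to check "predictive power" before comparing models is to estimate the mutual information between a feature and the class. The sketch below uses only the standard library and assumes each feature value is a single discrete symbol (for example, one tag); free text would need tokenizing first. The function name `mutual_information` and the toy data are hypothetical. Keep in mind that with ~100 classes and few samples, this plug-in estimate is biased upward, so treat it as a relative comparison between features rather than an absolute score.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Plug-in estimate of I(X; Y) in bits between a discrete feature X and class Y."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))  # counts of (x, y) pairs
    px = Counter(feature_values)                  # marginal counts of x
    py = Counter(labels)                          # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical toy data: three equally frequent categories.
labels = ["politics", "sports", "business"] * 2
informative_tag = ["vote", "goal", "stocks"] * 2  # determines the class exactly
useless_tag = ["news"] * 6                        # carries no information

print(mutual_information(informative_tag, labels))  # log2(3), about 1.585 bits
print(mutual_information(useless_tag, labels))      # 0.0 bits
```

A feature whose mutual information is close to zero corresponds to the "poor predictive power" case above, where the choice of model makes little difference.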
I have had good results with stochastic gradient boosted trees in the past (although on data sets with many samples). Some more ideas can be found here: Off the shelf tool for multi-label classification
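As a starting point, here is a minimal sketch of stochastic gradient boosted trees on several text columns, assuming scikit-learn is available. In scikit-learn, setting `subsample` below 1.0 in `GradientBoostingClassifier` fits each tree on a random fraction of the rows, which is what makes the boosting "stochastic". The toy data, the two-column layout (tags, title) and the hyperparameter values are illustrative assumptions, not tuned settings.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy data: each row has two textual features (tags, title).
# A real problem would have ~100 classes and far more rows.
X = np.array([
    ["election vote senate", "Senate passes budget bill"],
    ["vote campaign poll", "Poll shows tight race"],
    ["senate law congress", "Congress debates new law"],
    ["football goal league", "Late goal wins the derby"],
    ["tennis match final", "Final goes to five sets"],
    ["league transfer club", "Club signs new striker"],
    ["stocks market earnings", "Markets rally on earnings"],
    ["bank rates inflation", "Bank holds interest rates"],
    ["market trade shares", "Shares slide after report"],
], dtype=object)
y = np.array(["politics"] * 3 + ["sports"] * 3 + ["business"] * 3)

# One TF-IDF vectorizer per text column. A scalar column index makes
# ColumnTransformer pass a 1-D array of strings, which TfidfVectorizer expects.
features = ColumnTransformer([
    ("tags", TfidfVectorizer(), 0),
    ("title", TfidfVectorizer(), 1),
])

# subsample < 1.0: each boosting stage is fit on a random 80% of the rows.
model = Pipeline([
    ("features", features),
    ("gbt", GradientBoostingClassifier(
        n_estimators=50, learning_rate=0.1, subsample=0.8,
        max_depth=3, random_state=0)),
])

model.fit(X, y)
new_article = np.array([["goal club match", "Striker scores twice"]], dtype=object)
print(model.predict(new_article))
```

With few samples per class, boosting can overfit easily, so it is worth comparing it against a simple linear baseline on the same TF-IDF features using cross-validation before settling on a model.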