What is a good model for heavy multiclass problem and small number of (textual) features?

Question

I have a data set with a small number of textual features (less than 10) and a response variable with a lot of classes (~100).

Is there any recommended model that performs well on this kind of problem?

score 1 · Answer 1 · answered Oct 15 '20 at 09:13


The number of features does not matter as long as the features have enough predictive power. For example, if the data set is a collection of news articles, one feature contains tags and the response variable is the category, one can expect reasonable model quality (assuming descriptive tags, of course).

On the other hand, if the predictive power is poor, no model will help (or the difference between models is negligible).
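
One quick way to gauge whether the features carry any predictive power is to compare a simple text model against a majority-class baseline under cross-validation. This is a minimal sketch, assuming scikit-learn is available; the toy "tags" data and the choice of TF-IDF plus logistic regression are illustrative assumptions, not part of the original answer. If the model barely beats the baseline, swapping in a more powerful model is unlikely to help much.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: one textual feature (tags) per article and its category.
tags = ["finance markets", "finance stocks", "finance banking",
        "sport football", "sport tennis", "sport coaching",
        "tech mobile", "tech hardware", "tech software"]
y = ["business"] * 3 + ["sport"] * 3 + ["tech"] * 3

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Baseline: always predicts the most frequent class seen in training.
baseline = cross_val_score(
    make_pipeline(TfidfVectorizer(), DummyClassifier(strategy="most_frequent")),
    tags, y, cv=cv)

# Simple text model: TF-IDF features fed to a linear classifier.
model_scores = cross_val_score(
    make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    tags, y, cv=cv)

print(f"baseline accuracy: {baseline.mean():.2f}")
print(f"model accuracy:    {model_scores.mean():.2f}")
```

With ~100 classes, a large gap between the two numbers is the signal that the features are informative; a small gap suggests the effort is better spent on the features than on the model.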

I have had good results with stochastic gradient boosted trees in the past (although with many more samples). More ideas can be found here: Off the shelf tool for multi-label classification
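
A minimal sketch of stochastic gradient boosted trees on a few textual columns, assuming scikit-learn and pandas; the column names `title` and `tags`, the toy data, and the hyperparameters are hypothetical. Each text column gets its own TF-IDF vectorizer, and setting `subsample` below 1.0 in `GradientBoostingClassifier` fits each tree on a random fraction of the rows, which is what makes the boosting "stochastic".

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy data: two textual feature columns and a categorical target.
df = pd.DataFrame({
    "title": ["stocks fall", "market rally", "bank profits",
              "team wins cup", "striker scores", "coach resigns",
              "new phone launch", "chip shortage", "app update"],
    "tags": ["finance markets", "finance stocks", "finance banking",
             "sport football", "sport football", "sport coaching",
             "tech mobile", "tech hardware", "tech software"],
})
y = ["business"] * 3 + ["sport"] * 3 + ["tech"] * 3

# One TF-IDF vectorizer per text column; passing the column name as a
# string (not a list) hands each vectorizer a 1-D column of documents.
features = ColumnTransformer([
    ("title", TfidfVectorizer(), "title"),
    ("tags", TfidfVectorizer(), "tags"),
])

# subsample < 1.0 trains each tree on a random 80% of the rows
# (stochastic gradient boosting); the sparse TF-IDF output is accepted as-is.
model = Pipeline([
    ("features", features),
    ("gbt", GradientBoostingClassifier(n_estimators=50, subsample=0.8,
                                       random_state=0)),
])
model.fit(df, y)

proba = model.predict_proba(df)  # one column per class, in model.classes_ order
pred = model.predict(df)
```

Note that this estimator fits one regression tree per class at every boosting stage, so with ~100 classes training cost grows roughly linearly with the number of classes.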

mlwida

Thanks for the reply. I also have a question: some of the class labels might not occur in the data set at all, but they might occur in new data. How could a model include this kind of information? – Beherit Oct 15 '20 at 16:55
A model cannot predict classes it has not seen before. One can work around the problem by combining classes at a higher level (hierarchical classification), or by treating a prediction as "I don't know" if the confidence in the predicted class is too small, so that a human can deal with it appropriately. – mlwida Oct 15 '20 at 17:55
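
Both workarounds from the comment can be sketched in plain Python. This is a minimal illustration under stated assumptions: the `PARENT` mapping, its labels, and the 0.6 threshold are hypothetical, and `probas` is assumed to be one row of class probabilities (for example a row of a scikit-learn `predict_proba` output) aligned with `classes`.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical mapping from fine-grained labels to higher-level groups.
PARENT: Dict[str, str] = {
    "stocks": "business", "banking": "business",
    "football": "sport", "tennis": "sport",
}

def to_parent(labels: Sequence[str]) -> List[str]:
    """Relabel training targets at the coarser level. A model trained on
    these labels can still assign the known parent to an article whose
    fine-grained class never appeared in training."""
    return [PARENT[label] for label in labels]

def predict_or_abstain(probas: Sequence[float], classes: Sequence[str],
                       threshold: float = 0.6) -> Optional[str]:
    """Return the most probable class, or None ("I don't know", hand the
    item to a human) when the top probability is below `threshold`."""
    best = max(range(len(classes)), key=lambda i: probas[i])
    return classes[best] if probas[best] >= threshold else None

classes = ["business", "sport", "tech"]
print(to_parent(["stocks", "tennis"]))                # ['business', 'sport']
print(predict_or_abstain([0.7, 0.2, 0.1], classes))   # business
print(predict_or_abstain([0.4, 0.35, 0.25], classes)) # None
```

The threshold trades coverage for precision; in practice it would be tuned on held-out data rather than fixed in advance.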
