Using probabilities for multi-label classification

Question

After working on this text classification problem for a while, I realized that some documents belong to more than one class. I am using multinomial logistic regression, which provides a probability distribution over the classes (labels). I wonder whether it is a good idea to use this distribution for multi-label classification. For example, when the probabilities are [0.3, 0.6, 0.1] for classes A, B, and C respectively, I could label the document with every class whose probability exceeds a predefined threshold (say 0.25).

Is this a good idea? I searched Google but couldn't find any document mentioning a similar method. How reliable is this method? What do you think?

To be clearer about my problem space: there are about 20 classes, and a document usually belongs to either one or two of them.
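
For concreteness, the thresholding rule described above could be sketched as follows. This is only an illustration of the idea in the question, not a recommended method; the function name `labels_above_threshold` and the default threshold are assumptions made for the example.

```python
def labels_above_threshold(probs, classes, threshold=0.25):
    """Return every class whose predicted probability exceeds the threshold.

    `probs` is assumed to be the multinomial (softmax) output for one
    document, aligned with `classes`. Hypothetical helper for illustration.
    """
    if len(probs) != len(classes):
        raise ValueError("probs and classes must have the same length")
    return [c for c, p in zip(classes, probs) if p > threshold]


# The example from the question: probabilities for classes A, B, C.
predicted = labels_above_threshold([0.3, 0.6, 0.1], ["A", "B", "C"])
print(predicted)  # ['A', 'B']
```

Note that because the multinomial probabilities sum to 1, at most three classes can exceed a threshold of 0.25, so the threshold also caps how many labels a document can receive.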

score 1 · Answer 1 · answered Nov 10 '16 at 19:02


Multinomial logistic regression assumes that each outcome belongs to exactly one class, but you can redefine the classes. For example, if there are two original classes A and B, you can label the documents as belonging to one of three mutually exclusive classes:

I - document is A only

II - document is B only

III - document is both A and B.
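
A minimal sketch of this relabeling, assuming each document's labels are available as a set of strings. This transformation is often called a "label powerset"; the helper names and the "+" separator are hypothetical choices for the example, and it assumes no original class name contains "+".

```python
def to_combined_class(label_set):
    """Map a set of original labels to a single mutually exclusive class name.

    Sorting makes {"A", "B"} and {"B", "A"} map to the same name.
    """
    return "+".join(sorted(label_set))


def from_combined_class(name):
    """Recover the original label set from a combined class name."""
    return set(name.split("+"))


# Hypothetical training labels for four documents.
docs_labels = [{"A"}, {"B"}, {"A", "B"}, {"B", "A"}]
combined = [to_combined_class(s) for s in docs_labels]
print(combined)  # ['A', 'B', 'A+B', 'A+B']
```

The combined names can then be used as the targets of an ordinary multinomial classifier, and each predicted name mapped back to its label set with `from_combined_class`. Only combinations that actually occur in the training data become classes.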


Nik Tuzov

Yes, that is one of the approaches. But I have nearly 20 classes, which could lead to many combinations. – hrzafer Nov 10 '16 at 19:22
If you have no use for that many classes, then collapse a few classes into one. E.g. I, II and III can be collapsed into a single class, "A or B". – Nik Tuzov Nov 14 '16 at 20:31
