I realize that a similar question has been asked here, but my concern is with the last section of this article from Stanford. It says the choice depends on whether the classes are mutually exclusive: if they are, prefer softmax; otherwise, use k binary classifiers.

Can anyone give a rigorous explanation for that statement, one that directly relates the mutual exclusivity of the classes to the performance of the algorithm? The article gives only a one-line explanation: "This way, for each new musical piece (each class), your algorithm can separately decide whether it falls into each of the four categories."

Lerner Zhang
Siddharth Shakya
  • Softmax produces a probability distribution vector: its elements sum to 1, so it can confidently predict only a single category. This makes it a poor choice when a single example should be assigned multiple labels. – Alex Kreimer Nov 11 '19 at 15:05

1 Answer

I think what you are asking about is the difference between multi-class classification and multi-label classification. When an example can have multiple labels, the labels are not mutually exclusive and hence should not be modeled with a softmax classifier, whose outputs are forced to sum to 1. Softmax is the multi-class generalization of a single binary (logistic) classifier [see here].
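To make this concrete, here is a minimal sketch in plain Python. The logits are made-up numbers standing in for the scores a model might assign to four genre labels; they are an assumption for illustration and do not come from the Stanford article. The sketch compares softmax with k independent sigmoids when two labels apply at once, and then checks that two-class softmax reduces to a single sigmoid.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits for four labels: the first two both apply strongly.
logits = [4.0, 4.0, -3.0, -3.0]

# Softmax: the probabilities compete because they must sum to 1,
# so two equally strong labels can each get at most about 0.5.
softmax_probs = softmax(logits)

# k binary classifiers: each label is scored on its own,
# so both strong labels can be close to 1 at the same time.
sigmoid_probs = [sigmoid(z) for z in logits]

# Two-class softmax is the same as one sigmoid on the logit difference.
z0, z1 = 0.3, 1.7
two_class = softmax([z0, z1])[1]
binary = sigmoid(z1 - z0)

print("softmax :", [round(p, 3) for p in softmax_probs])
print("sigmoids:", [round(p, 3) for p in sigmoid_probs])
print("2-class softmax vs sigmoid:", two_class, binary)
```

With mutually exclusive classes the sum-to-1 constraint is correct prior knowledge, so softmax is the natural choice. When labels can co-occur, that same constraint makes it impossible to be confident about more than one label, which is why independent sigmoids are preferred there.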

Lerner Zhang