I realize that a similar question has been asked here, but my concern is with the last section of this article from Stanford. It says the choice depends on whether the classes are mutually exclusive: if they are, prefer a softmax classifier; otherwise, use k separate binary classifiers.
Can anyone give a rigorous explanation of that statement, one that directly relates the mutual exclusivity of the classes to the performance of the algorithm? The article gives only a one-line explanation: "This way, for each new musical piece (each class), your algorithm can separately decide whether it falls into each of the four categories."
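To make concrete what I understand the two options to be, here is a minimal sketch in plain Python. The four scores are made-up values (not from the article), and the function names are my own. Softmax forces the four probabilities to sum to 1, so the categories compete with each other, while k independent sigmoids score each category separately and can accept several at once.

```python
import math

def softmax(scores):
    """Normalize scores into probabilities that sum to 1 (classes compete)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(score):
    """Probability from one independent binary classifier."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical raw scores for four categories of one musical piece.
scores = [2.0, 1.5, -1.0, -3.0]

softmax_probs = softmax(scores)               # one distribution over 4 classes
sigmoid_probs = [sigmoid(s) for s in scores]  # 4 separate yes/no probabilities

print("softmax :", [round(p, 3) for p in softmax_probs])
print("sigmoids:", [round(p, 3) for p in sigmoid_probs])
```

With these scores, both of the first two categories pass 0.5 under the independent sigmoids, but under softmax only the first one does, because the second has to share probability mass with it. What I am missing is a rigorous argument for why this difference affects performance.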