
I would like to know more about the theoretical implications of such a statement.

Karolis Koncevičius

1 Answer


A multiclass neural network differs from multiple binary classifiers only in the activation function of the output layer. You may approach the two problems differently, but technically that is the only change needed: for multiclass classification you would use a softmax activation, while for multiple binary classifications you would use sigmoid activations. Softmax is an extension of the sigmoid to more than two classes; the two are equivalent when there are only two classes, but not otherwise.
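
As a quick sanity check of the two-class equivalence, here is a minimal NumPy sketch (the sigmoid and softmax helpers below are just illustrative implementations, not from any particular library):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - np.max(z))  # shift logits for numerical stability
        return e / e.sum()

    z = 1.7
    # a two-class softmax over the logits (z, 0) gives the same probability
    # for the first class as a single sigmoid applied to z
    print(softmax(np.array([z, 0.0]))[0])  # ~0.8455
    print(sigmoid(z))                      # ~0.8455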

When you use softmax, the probabilities it returns sum to one, so you are assuming that the classes are mutually exclusive. With multiple sigmoid activations, nothing forces the per-class probabilities to sum to one, so you are not assuming that the classes are mutually exclusive. That is a big difference.
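
A small self-contained sketch of that difference on some toy logits (the numbers are arbitrary):

    import numpy as np

    logits = np.array([2.0, -1.0, 0.5])

    # softmax: exponentiate and normalise, so the outputs must sum to 1
    e = np.exp(logits)
    p_softmax = e / e.sum()

    # independent sigmoids: each logit is squashed on its own
    p_sigmoid = 1.0 / (1.0 + np.exp(-logits))

    print(p_softmax.sum())  # 1.0
    print(p_sigmoid.sum())  # ~1.77 here; not constrained to sum to 1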

That said, people sometimes use the activations interchangeably, for example using softmax but picking the several classes with the highest probabilities as a multi-label prediction, or using multiple sigmoid activations and picking the single class with the highest probability for the final classification. The reason for doing so is not that it is theoretically justified, but that all models are wrong and sometimes a wrong model works just fine.
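
For illustration, a sketch of those two mixed-up decision rules on made-up probability vectors (the numbers and the choice of top-2 are arbitrary):

    import numpy as np

    p_softmax = np.array([0.78, 0.04, 0.18])  # e.g. output of a softmax layer
    p_sigmoid = np.array([0.88, 0.27, 0.62])  # e.g. outputs of three sigmoids

    # softmax output used in a multi-label fashion: report the two most probable classes
    multilabel_pred = np.argsort(p_softmax)[-2:]  # array([2, 0])

    # sigmoid outputs used for a single-label decision: report only the argmax
    multiclass_pred = np.argmax(p_sigmoid)        # 0

    print(multilabel_pred, multiclass_pred)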

Tim