
Suppose you have a classification problem in which you want to classify inputs into one of two mutually exclusive classes (y1 and y2) using an artificial neural network that models P(y|x).

Among the two following architectures for the output layer, which one is better to model P(y|x)?

  1. use two output neurons, one for each class, with a softmax activation function. If a1 and a2 are the outputs of the two output neurons, P(y=y1|x)=a1 and P(y=y2|x)=a2 with a1+a2=1.
  2. use a single output neuron with a sigmoid activation function. If a is the output of the neuron, we can set P(y=y1|x)=a and P(y=y2|x)=1-a.
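
For concreteness, here is a minimal sketch of the two output heads in PyTorch (the framework choice and the hidden width hidden_dim are my own placeholders, not part of the question):

    import torch
    import torch.nn as nn

    hidden_dim = 32  # placeholder width of the last hidden layer

    # Option 1: two output neurons + softmax; P(y=y1|x)=a1, P(y=y2|x)=a2, a1+a2=1
    two_output_head = nn.Sequential(
        nn.Linear(hidden_dim, 2),
        nn.Softmax(dim=-1),
    )

    # Option 2: one output neuron + sigmoid; P(y=y1|x)=a, P(y=y2|x)=1-a
    one_output_head = nn.Sequential(
        nn.Linear(hidden_dim, 1),
        nn.Sigmoid(),
    )

    h = torch.randn(4, hidden_dim)             # a batch of hidden activations
    probs_two = two_output_head(h)             # shape (4, 2), each row sums to 1
    a = one_output_head(h)                     # shape (4, 1)
    probs_one = torch.cat([a, 1 - a], dim=-1)  # the same two class probabilities

Both heads produce a valid distribution over the two classes; they differ only in how that distribution is parameterized.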

I can see the following two advantages, which suggest that the choice depends on the specific problem:

  • In 1., the last layer has twice as many parameters as in 2., and thus has more flexibility and can potentially model more complicated relationships.
  • In 2., the last layer has half as many parameters as in 1., and thus is less prone to overfitting (a quick parameter count is sketched below).
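
A back-of-the-envelope count of the output-layer parameters, assuming a last hidden layer of width n (n = 32 is only an illustrative value):

    # Output-layer parameters only, for a hidden layer of width n
    n = 32
    params_option_1 = 2 * n + 2   # two neurons: 2*n weights + 2 biases
    params_option_2 = n + 1       # one neuron: n weights + 1 bias
    print(params_option_1, params_option_2)  # 66 33

So the difference is confined to the output layer; the rest of the network is identical in both cases.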
AdeB
  • @MarcClaesen If you have two neurons (or more) with a softmax activation, the sum has to be one. – AdeB Nov 10 '14 at 19:22
  • oops, read over the softmax activation. – Marc Claesen Nov 10 '14 at 20:26
  • Check this out: http://stats.stackexchange.com/questions/207049/neural-network-for-binary-classification-use-1-or-2-output-neurons?noredirect=1&lq=1 Isn't it the same? And so the fewer the outputs the better, as it will update faster? – Peter Teoh Jan 30 '17 at 15:03
  • How ridiculous! The original question is blocked as a duplicate of the other one. Actually, this question was asked before the other one. – hafiz031 Apr 23 '20 at 14:45

0 Answers