
I can't seem to understand why, when approaching a K-class classification problem with a neural network, we take a different approach than with any other classification model.

I understand that in the logistic regression case, we split the problem into K separate binary classification problems, using K different logistic regressions, one for each class. The output of the classification is then the class whose logistic regression outputs the highest probability.
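To make concrete what I mean, here is a minimal sketch of that prediction rule (often called one-vs-rest). The weights `W`, biases `b` and inputs `X` are made-up values for illustration, not fitted to any data, and the function name `one_vs_rest_predict` is my own.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def one_vs_rest_predict(X, W, b):
    """Score each sample with K independent binary logistic models.

    X: (n, d) inputs, W: (K, d) one weight vector per class, b: (K,) biases.
    Returns the predicted class indices and the (n, K) per-class probabilities.
    """
    probs = sigmoid(X @ W.T + b)  # column k is the output of binary model k
    return probs.argmax(axis=1), probs

# Hypothetical parameters for K = 3 classes and d = 2 features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
W = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]])
b = np.zeros(3)

labels, probs = one_vs_rest_predict(X, W, b)
print(labels)
print(probs.sum(axis=1))  # the K probabilities are not constrained to sum to 1
```

Each of the K models is scored independently, so the per-class probabilities for a sample need not add up to one.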

It seems obvious to me that we would take the same approach for a neural network; however, it appears that the established approach is instead to increase the number of output neurons from 1 (in the binary classification case) to K, where each neuron performs a binary classification for its class.

It seems to me that using K separate neural networks, each with a single output neuron, could only increase the predictive power, as they would no longer have to share the same parameters in all the layers before the output layer (as happens in a single neural network with K neurons in the output layer). The only downside I can see in this approach is that it is more computationally expensive. Are there other reasons why this might not be a good idea?
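As a rough illustration of the cost difference, here is a sketch that counts trainable parameters for the two setups, assuming plain fully connected layers with biases. The sizes `d`, `h` and `K` are arbitrary values I picked for the example, and `mlp_param_count` is a helper of my own.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    Each layer contributes (n_in + 1) * n_out parameters (the +1 is the bias).
    """
    return sum(
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )

# Hypothetical sizes: 20 input features, two hidden layers of 64 units, 10 classes.
d, h, K = 20, 64, 10

shared = mlp_param_count([d, h, h, K])        # one network with K output neurons
separate = K * mlp_param_count([d, h, h, 1])  # K networks with one output neuron each

print(shared, separate)
```

With hidden layers of the same width, the K separate networks need roughly K times as many parameters, because the hidden layers are duplicated for every class instead of being shared.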

Extra question: if this approach is much more computationally expensive, do you think it would be worth considering K separate, simpler neural networks instead of a single neural network with K output neurons?

  • I think this is a case of reasoning from a faulty premise. If you have a $K$-class classification problem (that is, each observation belongs to exactly 1 of $K$ possible classes), a standard approach is multinomial logistic regression. Multinomial logistic regression is a single model that has $K$ (or $K-1$ because of the non-negative and sum-to-unity requirements for probabilities) outputs, and is a special case of a multinomial classification network. – Sycorax Jan 09 '20 at 16:54
  • For a comparison, see https://stats.stackexchange.com/questions/52104/multinomial-logistic-regression-vs-one-vs-rest-binary-logistic-regression – Sycorax Jan 09 '20 at 17:12
  • @SycoraxsaysReinstateMonica, the question perhaps should be changed to reference generic one-vs-rest approach then? (Although in that case your linked question gives a partial answer.) – Ben Reiniger Jan 09 '20 at 17:22
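For reference, the single-model alternative mentioned in the comments produces its $K$ outputs jointly through a softmax, so they form one probability distribution over the classes. The following is a minimal sketch under that assumption; the logits are made-up numbers, not the output of any trained model.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical logits for two samples and K = 3 classes.
logits = np.array([[2.0, 0.0, -2.0],
                   [0.5, 0.5, 0.5]])

probs = softmax(logits)
print(probs.sum(axis=1))     # each row sums to 1 by construction
print(probs.argmax(axis=1))  # predicted class per sample
```

Because the outputs are normalized together, raising one class's probability necessarily lowers the others, which is what distinguishes this from $K$ independent binary outputs.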
