I have trained a neural network on DNA sequence data, and my training set has exactly the same number of examples in each of the two classes. When I use a softmax activation on the output layer, the accuracy stays at 47% and the loss for both training and validation stays at around 7.6, regardless of how many batches and epochs I choose.

But once I change the softmax to a sigmoid, the validation accuracy starts at 50% in the first epoch and reaches above 98% by the end. This seems odd, because I think my network should at best achieve around 80% accuracy, since I know some of my data is misclassified. Why is this happening?
-
What exactly are you trying to predict? Softmax vs sigmoid are completely different models. – Tim Apr 19 '21 at 19:55
-
Do you have 2 output neurons in both cases? Are your class labels mutually exclusive? – Sycorax Apr 19 '21 at 20:15
-
@Tim I am trying to predict whether they belong to one class or the other. – A4747 Apr 21 '21 at 20:04
-
@Sycorax I have two output neurons for softmax but one for sigmoid. – A4747 Apr 21 '21 at 20:04
2 Answers
2
Using a sigmoid with a dummy-encoded output (one binary column) vs using a softmax with two one-hot encoded columns (one column equal to one, the other zero) is mathematically equivalent and should give the same results. You likely have a bug in the code.
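For the two-class case this equivalence follows directly from the definition of softmax. Writing the two logits as $z_1$ and $z_2$,

$$p(y=1\mid x)=\frac{e^{z_1}}{e^{z_1}+e^{z_2}}=\frac{1}{1+e^{-(z_1-z_2)}}=\sigma(z_1-z_2),$$

so a two-output softmax is just a sigmoid applied to the difference of the two logits, and the other class automatically gets probability $1-p$.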

Tim
-
Are you sure about this? I thought it was the opposite way. Sigmoid was for when the outcomes are not mutually exclusive and softmax was for when the outcomes are mutually exclusive but both of them can be used for binary classification. – A4747 Apr 21 '21 at 21:04
-
@A4747 no, sorry, what I wrote was incorrect. I misunderstood what you were saying, and the answer itself was misworded. Another example of why you should not trust what you read on the Internet. You are correct: those two implementations should be the same, so something else is wrong. – Tim Apr 22 '21 at 04:51
-1
Softmax should work better for classification than sigmoid (with 2 output features in both cases). I'm guessing you have a bug in your code somewhere.
If you're getting better accuracy than you believe is reasonably possible, then you are probably overfitting and either have a model which is too complicated or (perhaps more likely) have too little data.
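For reference, here is a minimal Keras sketch of the two output heads being compared; the input dimension and hidden layer size are placeholders, not values from the question:

```python
# Minimal sketch of the two binary-classification heads (placeholder sizes).
from tensorflow import keras
from tensorflow.keras import layers

n_features = 100  # placeholder input dimension

# Head A: one sigmoid unit, 0/1 labels, binary cross-entropy.
model_sigmoid = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model_sigmoid.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])

# Head B: two softmax units, one-hot labels, categorical cross-entropy.
model_softmax = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model_softmax.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])

# For head B the labels must be one-hot, e.g.
# y_onehot = keras.utils.to_categorical(y, num_classes=2)
```

Wired up like this, with the matching loss and label encoding for each head, the two models should train to essentially the same solution; a large gap between them usually points to a mismatch between the output layer, the loss, and the label encoding.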

JacKeown
-
Hi, your conclusion (bug in code) is reasonable but your premise (softmax should work better) is incorrect. For binary classification, 2-class softmax is equivalent to sigmoid (because the softmax is constrained to a simplex). If the sigmoid output is $p$, then the probability for the other class is necessarily $1-p$, which you'd also get out of softmax. I'm not the downvote, but I suspect this is why. – Arya McCarthy Apr 19 '21 at 20:23
-
I know the model is not overfitting since the loss is decreasing as the epochs increase. – A4747 Apr 19 '21 at 20:36
-
@AryaMcCarthy Yes, I see your point. I was thinking that they were using 2 outputs and not one. In that case, softmax would add the constraint that the two outputs sum to one, as opposed to the more relaxed constraint imposed by sigmoid that each output lies between 0 and 1. Softmax with 2 outputs should be equivalent to sigmoid with 1 output. Softmax with 1 output would always output 1, which could lead to a 50% accuracy bug (though not 47% if the classes are exactly balanced...). – JacKeown Apr 20 '21 at 21:42
-
@A4747 Do you mean that your testing loss is decreasing or simply your training loss? – JacKeown Apr 20 '21 at 21:44
-
@JacKeown both were decreasing with sigmoid. I have noticed that when I increase the number of batches, after a certain point the accuracy and loss remain stable and do not change. What is really strange to me is that when I use "validation_split=0.3" for training with sigmoid I get above 98% accuracy, but if I pass "validation_data=(x_val, y_val)" to model.fit the accuracy only gets as high as 65% (the two ways of passing validation data are sketched after this thread). I have checked my code and I honestly don't think there is any problem with the labeling of the validation set. I have also shuffled both my training and validation datasets. – A4747 Apr 21 '21 at 15:18
-
An update to what I just said: I changed the test set again, and this time when I test the model on the new test set the accuracy is quite high. I get 99% accuracy on the training set and 97% on the test set. – A4747 Apr 21 '21 at 21:35
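A side note on the validation_split vs validation_data discrepancy discussed above (hedged, since exact behaviour can differ across Keras versions): validation_split in model.fit takes the last fraction of the arrays you pass in, before any shuffling, so if the data happen to be ordered (for example by class or by source file) the automatic split is not evaluated on the same kind of samples as a hand-built validation set. A minimal sketch of the two calls, with x, y, x_val, y_val as placeholder arrays:

```python
# Sketch only: x, y, x_val, y_val are assumed to be NumPy arrays you already have,
# and `model` is a compiled Keras model.

# (1) Automatic split: Keras holds out the LAST 30% of (x, y), before shuffling,
#     so an ordered dataset can give a misleading validation score.
model.fit(x, y, epochs=20, batch_size=32, validation_split=0.3)

# (2) Explicit split: you control exactly which examples are held out.
model.fit(x, y, epochs=20, batch_size=32, validation_data=(x_val, y_val))

# If you rely on validation_split, shuffle x and y jointly beforehand, e.g.:
# import numpy as np
# idx = np.random.permutation(len(x))
# x, y = x[idx], y[idx]
```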