
We have trained two different neural networks on the MNIST dataset. Here are the losses and accuracies these networks obtained on the training data:

net0: loss: 20780.8291187
net1: loss: 209.928699374
net0: TRAIN ACCURACY     0.985890040888
net1: TRAIN ACCURACY     0.835298627336

The loss function used is cross-entropy. We expect higher accuracy for a lower loss, but here the loss for net1 is about 100 times lower than that of net0, yet its accuracy is also lower. What is the reason?
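For concreteness, here is a minimal sketch (with invented numbers, not the outputs of net0/net1) of how the two metrics are computed from the same labels, and how a model that is confidently wrong on a few samples can end up with better accuracy but worse cross-entropy:

```python
import numpy as np

# Invented toy data, not the networks above: four samples, binary labels.
y = np.array([1, 1, 1, 0])                    # true classes

# "net_a": 3/4 correct, but its single mistake is made with extreme confidence,
# so that one sample contributes a huge cross-entropy term.
p_a = np.array([0.90, 0.90, 0.90, 0.999])     # predicted P(class = 1)

# "net_b": only 2/4 correct, but never confidently wrong, so every sample
# contributes only a modest loss.
p_b = np.array([0.55, 0.45, 0.45, 0.45])

def cross_entropy(y, p):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)]."""
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def accuracy(y, p):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((p >= 0.5) == y))

for name, p in [("net_a", p_a), ("net_b", p_b)]:
    print(name, "loss:", round(cross_entropy(y, p), 3),
          "accuracy:", accuracy(y, p))
# net_a loss: 1.806 accuracy: 0.75   <- higher accuracy, higher loss
# net_b loss: 0.698 accuracy: 0.5    <- lower accuracy, lower loss
```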

Hossein
  • https://stats.stackexchange.com/questions/256551/why-does-the-accuracy-not-change-when-applying-different-alpha-values-in-l2-reg/256554#256554 – Sycorax Jun 19 '17 at 21:56
  • @Sycorax Thanks. So if cross-entropy is so fragile that it can be completely different from accuracy, why do we rely on it so much? – Hossein Jun 21 '17 at 07:49
  • "Fragile" is the wrong word, to my mind. Cross entropy measures how well-calibrated a model is. Accuracy is scarcely informative because many poorly or well calibrated models can have the same accuracy. – Sycorax Jun 21 '17 at 14:14
  • @Sycorax But here I have a model that has lower cross-entropy loss, but with **lower** accuracy. They don't have the same accuracy. – Hossein Jun 21 '17 at 16:32
  • It's not clear to me why that's a problem. Q: "Which model has better accuracy?" A: "net0" Q: "Which model has better cross-entropy?" A: "net1" Q: "Why?" A: "Accuracy and cross-entropy measure different things." – Sycorax Jun 21 '17 at 17:11
  • @Sycorax Here is what I think: Our ultimate goal, at least in a problem like digit recognition, is to obtain better accuracies. So, if a loss function disagrees with the accuracy, it is not so reliable. In other words, if a model with a lower accuracy can get the lower cross-entropy loss, this means that the cross-entropy can mislead us since we are optimizing this loss function to obtain better accuracies. – Hossein Jun 21 '17 at 21:17
  • If all you care about is accuracy, a better accuracy is better. – Sycorax Jun 21 '17 at 21:27
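The calibration point from the comments can be illustrated the same way (again with invented numbers): two models whose thresholded decisions are identical, and therefore have identical accuracy, can still have very different cross-entropies, because cross-entropy also scores how confident the predicted probabilities are.

```python
import numpy as np

# Invented example: both models classify every sample correctly (accuracy 1.0),
# but one predicts confident probabilities and the other hedges near 0.5.
y = np.array([1, 1, 0, 0])
p_confident = np.array([0.95, 0.95, 0.05, 0.05])
p_hesitant  = np.array([0.51, 0.51, 0.49, 0.49])

for name, p in [("confident", p_confident), ("hesitant", p_hesitant)]:
    loss = float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))
    acc = float(np.mean((p >= 0.5) == y))
    print(name, "loss:", round(loss, 3), "accuracy:", acc)
# confident loss: 0.051 accuracy: 1.0
# hesitant loss: 0.673 accuracy: 1.0   <- same accuracy, very different loss
```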

1 Answer


A lower cost-function error does not necessarily mean better accuracy. The cost-function error represents how well your model is learning (or is able to learn) with respect to your training examples.

Now the question is: is the model learning what I expect it to learn?

It can show a very low error on the learning curves, but when you actually test the results (accuracy) you get wrong detections; this is called high variance.

The best approach is to monitor both the learning error and the accuracy for each epoch/iteration. While the cost-function error goes down and the accuracy goes up, keep training; otherwise stop (:
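As a rough sketch of this monitoring, assuming a TensorFlow/Keras setup (the original training code is not shown here, and the architecture and epoch count below are placeholders): compiling with an accuracy metric and passing validation data makes Keras report both the cross-entropy loss and the accuracy after every epoch.

```python
import tensorflow as tf  # assumes TensorFlow/Keras is available

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A deliberately simple MNIST classifier; the architecture is just a placeholder.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# validation_data makes Keras report held-out loss/accuracy each epoch, so a
# growing gap between training and validation curves flags the high-variance
# case described above.
history = model.fit(x_train, y_train, epochs=5,
                    validation_data=(x_test, y_test))

print(history.history["loss"])          # training cross-entropy per epoch
print(history.history["val_accuracy"])  # held-out accuracy per epoch
```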

The accuracy is not good enough? Check:

  1. Do I have a high-variance problem? Add more training examples to generalize the learning (better yet, find more examples that resemble the cases your model fails on).
  2. Do I have a high-bias problem? My model is poor or too complex; it needs to be fixed, or try something else. (A learning-curve sketch for this check follows the link below.)

http://www.holehouse.org/mlclass/10_Advice_for_applying_machine_learning.html
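A rough sketch of the learning-curve check described in those notes (my own illustration: it uses scikit-learn's small digits dataset as a stand-in for MNIST and a logistic-regression classifier rather than a neural network):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)

# Cross-validated training/validation accuracy at increasing training-set sizes.
train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(train_sizes,
                     train_scores.mean(axis=1),
                     valid_scores.mean(axis=1)):
    print(f"{n:5d} samples  train acc {tr:.3f}  validation acc {va:.3f}")

# A large, persistent gap between the two curves suggests high variance (more
# data or regularization may help); both curves low and close together
# suggests high bias (the model is too simple for the task).
```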

Good luck (:

Stav Bodik
  • You are missing the point here. Bias is measured on the train set and variance is measured on the validation set. Instead, the question is: why, in my training set, do I have a small cross-entropy loss but bad accuracy? – Seymour Mar 29 '20 at 10:08