
I am training a 6-layer deep neural network with:

```
Model2(
  (layer1): Linear(in_features=4800, out_features=8000, bias=True)
  (layer2): Linear(in_features=8000, out_features=5000, bias=True)
  (layer3): Linear(in_features=5000, out_features=2000, bias=True)
  (layer4): Linear(in_features=2000, out_features=200, bias=True)
  (layer5): Linear(in_features=200, out_features=20, bias=True)
  (layer6): Linear(in_features=20, out_features=52, bias=True)
)
```

Inputs are images of size 60 × 80 (flattened to 4800 features). I am using the ReLU activation function, cross-entropy as the loss function, and stochastic gradient descent with the weight-decay parameter (i.e., L2 regularization). I don't face overfitting, but the accuracy was 61% without weight decay and is now 26%!
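For reference, here is a minimal sketch of the setup described above (the ReLU placement between layers and the learning rate are fill-ins, not copied from my actual script; `weight_decay=0.01` is the value mentioned below):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model2(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4800, 8000)   # 60 x 80 images, flattened
        self.layer2 = nn.Linear(8000, 5000)
        self.layer3 = nn.Linear(5000, 2000)
        self.layer4 = nn.Linear(2000, 200)
        self.layer5 = nn.Linear(200, 20)
        self.layer6 = nn.Linear(20, 52)       # 52 output classes

    def forward(self, x):
        x = x.view(x.size(0), -1)             # flatten each image to 4800 features
        for layer in (self.layer1, self.layer2, self.layer3,
                      self.layer4, self.layer5):
            x = F.relu(layer(x))
        return self.layer6(x)                 # raw logits; CrossEntropyLoss applies log-softmax

model = Model2()
criterion = nn.CrossEntropyLoss()
# weight_decay adds the L2 penalty; lr=0.01 is a placeholder, not my actual value
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)
```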

Can anyone explain the reason?

(I set the regularization hyperparameter to 0.01. I changed it, but there was no improvement in accuracy!)

Maryam
  • 1) How does your loss function perform with and without regularization? Accuracy has some issues as a performance metric. 2) In-sample or out-of-sample accuracy? – Dave Jul 06 '20 at 11:29
  • @Dave I didn't get it. What do you mean by loss function performance? I am just comparing the accuracy of the model on the train and test data (to check overfitting). Also, I checked the parameters of my network; the weights decreased, but they were not near zero. – Maryam Jul 06 '20 at 11:33
  • @Dave It is out-of-sample accuracy that I mentioned in the post. – Maryam Jul 06 '20 at 11:35
  • Evaluate the cross-entropy loss on your out-of-sample data. It is possible for loss to decrease while accuracy also decreases. Accuracy turns out to be a surprisingly bad performance metric, despite how common it is. – Dave Jul 06 '20 at 11:37
  • @Dave For 10 epochs my losses are: epoch 1: 3.109, epoch 2: 2.585, epoch 3: 2.469, epoch 4: 2.306, epoch 5: 2.267, epoch 6: 2.161, epoch 7: 2.103, epoch 8: 2.043, epoch 9: 2.047, epoch 10: 1.996. – Maryam Jul 06 '20 at 11:49
  • Losses on what data? – Dave Jul 06 '20 at 12:16
  • @Dave On training data. For out-of-sample data, I have to wait for the run to finish. – Maryam Jul 06 '20 at 12:19
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/110268/discussion-between-dave-and-maryam). – Dave Jul 06 '20 at 12:21
  • 0.01 is generally too high for weight decay – I usually use 5e-4. – shimao Jul 06 '20 at 14:17
  • @shimao I get acceptable accuracy and loss for 0.0001, but the training accuracy is 3 percentage points higher. – Maryam Jul 06 '20 at 14:44
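Following Dave's suggestion in the comments above, here is a minimal sketch of scoring the model on unseen data with cross-entropy loss alongside accuracy (`test_loader` is a hypothetical `DataLoader` over the held-out set):

```python
import torch
import torch.nn as nn

def evaluate(model, loader, device="cpu"):
    """Return mean cross-entropy loss and accuracy over a held-out set."""
    criterion = nn.CrossEntropyLoss(reduction="sum")  # sum per batch, divide by n at the end
    model.eval()
    total_loss, correct, n = 0.0, 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)
            total_loss += criterion(logits, targets).item()
            correct += (logits.argmax(dim=1) == targets).sum().item()
            n += targets.size(0)
    return total_loss / n, correct / n

# test_loss, test_acc = evaluate(model, test_loader)  # test_loader: your unseen data
```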

1 Answer


I see two issues.

  1. Accuracy is a surprisingly bad performance metric. If you evaluate your model using a so-called proper scoring rule like cross-entropy loss or Brier score, you may find that the out-of-sample performance improves even though accuracy decreases. This is because accuracy relies on a threshold, and the threshold that gives the best accuracy might not even be $0.50$ (for binary classification).

I suggest reading a post of mine where I give an example of a model with a great Brier score but worse accuracy than a model with a mediocre Brier score: Proper scoring rule when there is a decision to make (e.g. spam vs ham email)

  2. You have to tune your regularization hyperparameter. Not every value will result in better out-of-sample performance. A typical way of tuning is cross-validation; a simple grid-search sketch follows this list.
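As an illustration (one simple alternative to full cross-validation), here is a sketch of a grid search over the weight-decay value using a single validation split. `train_model` and the data loaders are hypothetical stand-ins for your own training loop and data; `evaluate` is as in the sketch in the comments above:

```python
# Hypothetical helpers:
#   train_model(model, loader, weight_decay) trains the model in place;
#   evaluate(model, loader) returns (mean cross-entropy, accuracy).
candidate_decays = [0.0, 1e-5, 1e-4, 5e-4, 1e-3, 1e-2]

best_decay, best_val_loss = None, float("inf")
for wd in candidate_decays:
    model = Model2()                            # fresh weights for each candidate
    train_model(model, train_loader, weight_decay=wd)
    val_loss, _ = evaluate(model, val_loader)   # select on a proper scoring rule,
                                                # not on accuracy
    if val_loss < best_val_loss:
        best_decay, best_val_loss = wd, val_loss

print(f"best weight decay: {best_decay} (validation loss {best_val_loss:.3f})")
```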
Dave
  • I printed the cross-entropy loss for unseen data; it was worse than for the model without weight decay. – Maryam Jul 06 '20 at 13:17
  • How have you tuned your hyperparameter? – Dave Jul 06 '20 at 13:45
  • I tried weight decay values of 0.0001, 0.001, 0.01, 0.1, and 0.9. For small values, accuracy and cross-entropy improved, but there was still overfitting, because the accuracy on the training data was higher than on the validation data. For 0.1 and 0.001, I see no improvement in loss or accuracy. – Maryam Jul 06 '20 at 14:18