I have a CNN with 3 convolutional layers, 1 max-pooling layer, and 2 fully-connected layers before the softmax classification. The CNN is trained with Adagrad and achieves quite good performance. However, I'm curious why my loss is so stochastic (see below). Over the 30,000 iterations it will actually jump above the initial loss. Despite that, the accuracy of the CNN is fairly consistent throughout training. Could this be due to the use of dropout on the convolutional and fully-connected layers? If so, where is this discussed in scientific articles, lecture notes, or tutorials?
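For reference, here is a minimal sketch of the architecture as described. The framework (PyTorch), input size (32x32 RGB), channel counts, and kernel sizes are my own assumptions, not part of the actual model:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10, p_drop=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Dropout2d(p_drop),                       # dropout on conv layers
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Dropout2d(p_drop),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                            # the single max-pooling layer
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),    # assumes 32x32 input -> 16x16 after pooling
            nn.Dropout(p_drop),                         # dropout on the fully-connected layer
            nn.Linear(256, num_classes),                # logits; softmax is applied inside the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```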
Edit: Added the parameters below. The learning rate and weight decay were found using grid search, and different values don't change the loss much. It might be that I haven't tuned them completely correctly, though. Actually, I'm surprised how sensitive Adagrad is to these variables (increasing the LR/WD by a factor of 10 from these values actually causes my training to diverge). I'll look into adding a validation/test loss when I have time.
- Learning Rate: 0.003
- Weight Decay: 0.0005
- Dropout: 0.5
- Minibatch-size: 10
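For completeness, a sketch of how I wire these parameters into the training loop. The dataset, the `SmallCNN` class from the earlier sketch, and the logging cadence are assumptions; the relevant point is that with a minibatch of 10 and dropout active, each iteration's loss is computed on a tiny sample through a randomly thinned network, so the raw per-iteration loss is much noisier than a running average of it (or than the accuracy):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

model = SmallCNN(num_classes=10, p_drop=0.5)
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.003, weight_decay=0.0005)
criterion = nn.CrossEntropyLoss()
loader = DataLoader(train_dataset, batch_size=10, shuffle=True)  # train_dataset is assumed to exist

running = None
for step, (x, y) in enumerate(loader):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    # The raw minibatch loss jumps around; an exponential moving average is far smoother.
    running = loss.item() if running is None else 0.99 * running + 0.01 * loss.item()
    if step % 100 == 0:
        print(f"step {step}: minibatch loss {loss.item():.3f}, smoothed {running:.3f}")
```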