
So, while building a simple neural net (an MLP) for recognizing digits, I had my function print both the mean cost over the train dataset and the percentage of correct answers, also over the train dataset (even though the function doing this is called val()). The cost kept going down for an hour and a half, but the percentage of correct answers stayed the same as at the beginning. The weird thing is that while the cost is constantly going down, the train accuracy does not change AT ALL from the very beginning.

Any ideas why this is happening?

Here is the notebook, if needed.

Moran Reznik
  • https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models – Sycorax Dec 01 '17 at 20:12

1 Answer


It's hard to say without more detail. But most likely, if you're selecting the predicted class by picking the maximal class in your softmax, then the loss can keep going down because you're increasing the correct class's probability, but if another class always dominates the correct class, the prediction remains the same. For example, if you have two classes with probabilities (0.2, 0.8) and the first one is supposed to be correct, then moving to (0.3, 0.7) would reduce the loss but would not change the percent correct.
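The two-class example above can be sketched numerically: the cross-entropy loss drops when the correct class's probability rises from 0.2 to 0.3, yet the argmax prediction is still the wrong class both times.

```python
import math

def cross_entropy(probs, true_class):
    """Negative log-probability of the true class."""
    return -math.log(probs[true_class])

before = (0.2, 0.8)  # class 0 is the correct one
after = (0.3, 0.7)

# Loss goes down as the correct-class probability increases...
print(cross_entropy(before, 0))  # ~1.609
print(cross_entropy(after, 0))   # ~1.204

# ...but argmax still picks class 1, so accuracy is unchanged.
print(max(range(2), key=lambda i: before[i]))  # 1
print(max(range(2), key=lambda i: after[i]))   # 1
```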

Looking at your code some more, it looks like you have three layers with 10 hidden units each. You'll most likely need at least 100 hidden units within each layer, so try making them much bigger. Check out Yann LeCun's page on MNIST accuracy (http://yann.lecun.com/exdb/mnist) to compare your neural network with the baselines provided.
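The widening suggested above can be sketched like this (a minimal numpy forward pass, not the asker's actual notebook code; the layer sizes 784 → 100 → 100 → 10 are the hypothetical widened architecture, not the original one):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# MNIST input (28*28 = 784) -> two hidden layers widened to 100 units
# each (up from 10) -> 10 output classes.
sizes = [784, 100, 100, 10]
weights = [rng.normal(0, 0.1, size=(m, n)) for m, n in zip(sizes, sizes[1:])]

def forward(x):
    for W in weights:
        x = sigmoid(x @ W)
    return x

batch = rng.uniform(0, 1, size=(32, 784))  # 32 normalized images
out = forward(batch)
print(out.shape)                 # (32, 10)
predictions = out.argmax(axis=1)  # predicted digit for each image
```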

Alex R.
  • This is what I thought as well. Any ideas on how to fix this? This is my first neural net, so I'm kind of lost. – Moran Reznik Dec 01 '17 at 19:23
  • @מורןרזניק: I don't see anything obviously wrong with your code, so I suggest you take a look at Yann Lecun's MNIST accuracy page to get a sense of how good or bad you're doing: http://yann.lecun.com/exdb/mnist/ Yours looks to be a (non-convolutional) neural net, so I would see if increasing the width of your layers, or adding on more layers improves the accuracy. If you want more accuracy, you'll likely have to start using convolutional neural nets. – Alex R. Dec 01 '17 at 19:56
  • @מורןרזניק: See edit – Alex R. Dec 01 '17 at 20:00
  • I followed the structure from here: https://www.youtube.com/watch?v=aircAruvnKk&t=480s. He got pretty great accuracy with 2 hidden layers of 16 units each. – Moran Reznik Dec 01 '17 at 20:05
  • @מורןרזניק: Could you provide a reference for the accuracy (and whether the same MNIST dataset was used)? I didn't see a mention of it. – Alex R. Dec 01 '17 at 20:10
  • https://www.youtube.com/watch?v=IHZwWFHWa-w 13:55. I used a dataset from Kaggle, not sure if it's the same one. I tried running my NN with 100 units in each hidden layer and my PC crashed XD – Moran Reznik Dec 01 '17 at 20:15
  • @מורןרזניק: You probably don't need 100. In the video it looks like they have 16 units in the two layers. But looking at your code more: I'm not seeing any normalization of your inputs anywhere. You're using sigmoid activations in your layers, so I'd be worried that the gradients they are giving are tiny. Sanity check: run a few images through the network and check the values of your neurons. If they are really close to 0 or 1, then you have a problem. Alternatively, play around with the random initializations of your weights by changing the mean and variance (this is more ad hoc). – Alex R. Dec 01 '17 at 21:03
  • Made 3 changes based on your advice: normalized the input so it ranges between 0 and 1, increased the number of neurons in the 2 hidden layers to 100, and made the gradient descent use 100 random examples each time. Didn't help at all, as you can see: https://github.com/MoranReznik/Machine-learning-practice/blob/master/DigitRecognitionNN.ipynb – Moran Reznik Dec 01 '17 at 23:38
  • @מורןרזניק: Can you try subsetting your training set to just 10 images, to check if you get perfect convergence in that case? This will mostly confirm if your network has been coded correctly. – Alex R. Dec 01 '17 at 23:40
  • Sure, 1 min and I'll report back – Moran Reznik Dec 01 '17 at 23:41
  • Make sure you include one image from each class, to also check the loss. – Alex R. Dec 01 '17 at 23:43
  • It did not pass the starting 0.3 accuracy rate. Maybe there is a problem with the basic code after all... but it's weird, since the cost IS going down; it just doesn't affect the validation at all. – Moran Reznik Dec 01 '17 at 23:44
  • Wanted to let you know that I solved the problem using the advice from this post: https://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network Looks like the initial weights were the problem. Thank you for all of your effort to help me! – Moran Reznik Dec 02 '17 at 08:53
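The fix the asker found (better weight initialization, from the linked question) can be sketched as below. This is a minimal numpy illustration of Glorot/Xavier-style initialization, not the asker's actual notebook code; the specific sizes (784 → 100) are just an example. With a naive large-variance init, the pre-activations of a sigmoid layer land far from zero and the sigmoids saturate, which is consistent with the flat accuracy seen here.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    """Glorot/Xavier uniform init: scales weights so activation variance
    stays roughly constant across layers, keeping sigmoids unsaturated."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Naive unit-variance init vs. Xavier, for one 784 -> 100 sigmoid layer.
naive = rng.normal(0.0, 1.0, size=(784, 100))
xavier = xavier_init(784, 100)

x = rng.uniform(0, 1, size=(1, 784))  # one normalized input image

# Naive init drives pre-activations to large magnitudes (sigmoid outputs
# pinned near 0 or 1, vanishing gradients); Xavier keeps them small.
print(np.abs(x @ naive).mean())   # large (roughly tens)
print(np.abs(x @ xavier).mean())  # small (well below 1)
```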