
I let my neural network train overnight on a subset of the MNIST training data (2000 distinct digits), and the cost over that same subset quickly converged to 1.0 after about 10 gradient-descent iterations. Oddly enough, the training algorithm does seem to handle very simple inputs. With a training set of a single example consisting of a completely blank image, the cost on that example converges to below 10^-10; however, with more than two hidden layers, the network assigns almost total confidence to a few different digits, with the rest near zero. But as soon as I feed in actual digit images during training, I have never gotten it to work, for any network of any size, with any training set of any size.
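In case it helps diagnose this, here is a sketch of the kind of numerical gradient check I understand is standard for catching backprop bugs. The `cost_fn` callable and the flat parameter layout are hypothetical stand-ins, not my actual code; the idea is just to compare backprop gradients against central finite differences:

```python
import numpy as np

def numerical_grad_check(cost_fn, params, analytic_grad, eps=1e-5):
    """Compare backprop gradients against central finite differences.

    cost_fn: callable taking a flat parameter vector, returning scalar cost
    params: flat numpy array of current weights and biases
    analytic_grad: flat numpy array of gradients from backprop
    """
    num_grad = np.zeros_like(params)
    for i in range(len(params)):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus[i] += eps
        p_minus[i] -= eps
        num_grad[i] = (cost_fn(p_plus) - cost_fn(p_minus)) / (2 * eps)
    # Relative error around 1e-7 or smaller suggests backprop is correct;
    # around 1e-2 or larger usually means a bug in the gradient code.
    return (np.linalg.norm(num_grad - analytic_grad)
            / (np.linalg.norm(num_grad) + np.linalg.norm(analytic_grad)))
```

If my backprop passes a check like this on a tiny network, I assume the bug would instead be in the training loop or the data handling.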

I feel like the error in my code is a needle in a haystack, but perhaps this is a common issue that trips up a lot of newcomers, and you could point me in the right direction. I'm willing to provide any details you'd like, and any help is appreciated. Thanks in advance.
