
I wrote a program to classify MNIST with a vanilla neural net using sigmoid activations and back-propagation training. I tried to work through the math myself (because I want to understand things), and the formula I ended up with was

$$\frac{\partial E}{\partial W_{ab}} = \begin{cases} 2\,O_a\,O_b\,(1-O_b)\,(O_b-\text{exp}_b), & \text{if node } b \text{ is an output node} \\ O_a\,O_b\,(1-O_b)\,\sum_L \frac{\partial E}{\partial O_L}\,\frac{\partial O_L}{\partial x_L}\,W_{bL}, & \text{if node } b \text{ is a hidden node} \end{cases}$$

where $L$ ranges over the next layer, $\sum_L$ is the sum across that layer, and $\text{exp}_b$ is the expected output of node $b$. This looked similar to what I saw elsewhere, so I assumed it was correct. After implementing the neural net and trying to train it on MNIST, I found that it wasn't working at all (effectively random results). To narrow things down, I unit-tested pieces individually, and I found that if I only adjust the weights in the final layer, I get a successful classification rate of 88% after just one epoch.
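For reference, the output-layer case comes from the chain rule, assuming squared error $E=\sum_k (O_k - \text{exp}_k)^2$ and sigmoid activation (so $\partial O_b/\partial x_b = O_b(1-O_b)$ and $\partial x_b/\partial W_{ab} = O_a$):

$$\frac{\partial E}{\partial W_{ab}} = \frac{\partial E}{\partial O_b}\cdot\frac{\partial O_b}{\partial x_b}\cdot\frac{\partial x_b}{\partial W_{ab}} = 2\,(O_b - \text{exp}_b)\cdot O_b\,(1-O_b)\cdot O_a$$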

So clearly there is something wrong with the way I calculate the weight adjustments for the non-output layers. The only explanations I could think of are that (1) my formula is wrong, or (2) since the expected outputs are vectors of nine 0's and a single 1, the algorithm is minimizing error by driving every output toward zero and ignoring the 1 (although I don't think either of these is very likely).
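As a quick sanity check on hypothesis (2): a network that outputs the same constant $c$ on every node has per-sample squared error $E = 9c^2 + (1-c)^2$, which is minimized at $c = 0.1$ with $E = 0.9$. So a collapse toward zero should show up as near-constant outputs rather than the effectively random ones I'm seeing.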

Here is the Java code of the training algorithm. I think the variable names make enough sense that this segment is understandable without the rest of the program, but if you need to see anything else, just ask.

     // Output layer: update the weights into each output node and
     // accumulate the back-propagated error term on the hidden node at
     // the other end of each connection.
     for(int ii = 0; ii < outputLayer.size(); ii++)
     {
        Node n = outputLayer.get(ii);
        for(Connection c : n.connections)
        {
           // Contribution to dE/dO_hidden: delta of the output node times
           // the (pre-update) connection weight
           c.origin.adjustSum += c.destination.value * (1-c.destination.value) * (c.destination.value - expected[ii])*c.weight;
           // Gradient step on the output-layer weight
           c.weight -= learningRate * c.origin.value * c.destination.value * (1-c.destination.value) * (c.destination.value - expected[ii]);
        }
     }
     // Hidden layer: update incoming weights using the accumulated sum
     for(Node n : hiddenLayer)
     {
        for(Connection c : n.connections)
        {
           c.weight -= learningRate * c.origin.value * c.destination.value * (1-c.destination.value) * n.adjustSum;
        }
     }
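One way to test hypothesis (1) directly would be a numerical gradient check: compare the analytic gradient against a finite-difference estimate. Below is a minimal, self-contained sketch for a single sigmoid unit with squared error (the names here, like `GradientCheck`, are illustrative and not from my program):

    public class GradientCheck
    {
       static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

       // E = (sigmoid(w * input) - target)^2
       static double error(double w, double input, double target)
       {
          double o = sigmoid(w * input);
          return (o - target) * (o - target);
       }

       public static void main(String[] args)
       {
          double w = 0.5, input = 0.8, target = 1.0;

          // Analytic gradient from the formula above:
          // dE/dw = 2 * (o - target) * o * (1 - o) * input
          double o = sigmoid(w * input);
          double analytic = 2 * (o - target) * o * (1 - o) * input;

          // Numerical gradient via central difference
          double eps = 1e-6;
          double numeric = (error(w + eps, input, target) - error(w - eps, input, target)) / (2 * eps);

          System.out.printf("analytic=%.8f numeric=%.8f%n", analytic, numeric);
       }
    }

If the analytic and numeric values also agree for the hidden-layer weights of the full network, the formula is probably fine and the bug is somewhere else.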

I am a high-school student new to Stack Exchange (and computer science), so if I have done something wrong with this question, just let me know in the comments and I'll try to fix it.
