I've written a simple MLP in TensorFlow that models an XOR gate.
So for:
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
it should produce the following:
output_data = [[0.], [1.], [1.], [0.]]
The network has an input layer, a hidden layer and an output layer with 2, 5 and 1 neurons respectively.
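For reference, this is roughly the graph I'm building (a minimal TF 1.x-style sketch; the exact initialisation and the sigmoid activations are assumptions on my part):

    import tensorflow as tf

    # Inputs and targets for the XOR problem
    n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
    n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")

    # Hidden layer: 2 -> 5 neurons
    w_hidden = tf.Variable(tf.random_uniform([2, 5], -1.0, 1.0))
    b_hidden = tf.Variable(tf.zeros([5]))
    hidden = tf.sigmoid(tf.matmul(n_input, w_hidden) + b_hidden)

    # Output layer: 5 -> 1 neuron
    w_out = tf.Variable(tf.random_uniform([5, 1], -1.0, 1.0))
    b_out = tf.Variable(tf.zeros([1]))
    output = tf.sigmoid(tf.matmul(hidden, w_out) + b_out)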
Currently I have the following cross entropy:
cross_entropy = -(n_output * tf.log(output) + (1 - n_output) * tf.log(1 - output))
I've also tried this simpler alternative (which is really a squared error rather than a cross entropy):
cross_entropy = tf.square(n_output - output)
along with a few other variants.
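In both cases I reduce the per-example losses to a scalar before passing them to the optimizer, roughly like this (the clipping is an assumption I added to guard against log(0)):

    # Scalar cross-entropy loss, with clipping for numerical stability
    eps = 1e-7
    clipped = tf.clip_by_value(output, eps, 1.0 - eps)
    cross_entropy = tf.reduce_mean(
        -(n_output * tf.log(clipped) + (1 - n_output) * tf.log(1 - clipped)))

    # Squared-error alternative
    squared_error = tf.reduce_mean(tf.square(n_output - output))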
However, no matter what my setup was, the error decreased much more slowly with a GradientDescentOptimizer than with an AdamOptimizer.
In fact, tf.train.AdamOptimizer(0.01) produced really good results after 400-800 learning steps (depending on the learning rate, with 0.01 giving the best results), while tf.train.GradientDescentOptimizer always needed over 2000 learning steps, no matter which cross-entropy calculation or learning rate was used.
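My training loop for comparing the two looks roughly like this (a sketch assuming the graph and losses above; only the train_op line changes between runs):

    loss = cross_entropy  # or squared_error

    # train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    train_op = tf.train.AdamOptimizer(0.01).minimize(loss)

    input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
    output_data = [[0.], [1.], [1.], [0.]]

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(2000):
            _, err = sess.run(
                [train_op, loss],
                feed_dict={n_input: input_data, n_output: output_data})
            if step % 100 == 0:
                print(step, err)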
Why is this? Is the AdamOptimizer simply always the better choice?