I am using a Keras RNN cell to perform part-of-speech tagging. The architecture is as follows (I cannot share the code for privacy reasons):
- An embedding layer of 40 units, output shape (batch_size, max_sentence_length, 40)
- tf.keras.layers.SimpleRNNCell(state_size=number_of_tags_in_dataset+20, dropout=0.2, recurrent_dropout=0.0, activation='tanh')
- tf.contrib.layers.fully_connected(units=state_size)
- tf.contrib.layers.dropout(keep_prob=0.6)
- tf.contrib.layers.fully_connected(units=number_of_tags)
- tf.keras.layers.BatchNormalization()
- tf.keras.activations.relu()
- Using tf.contrib.seq2seq.sequence_loss() with AdamOptimizer and gradient clipping of 0.5
- batch_size=32, learn_rate=0.01
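For reference, the stack described above could be sketched in modern tf.keras roughly like this. The vocabulary size, tag count, and maximum sentence length are placeholder assumptions (not from my actual setup), the deprecated tf.contrib layers are swapped for their Keras equivalents, and clipnorm stands in for the gradient clipping:

```python
import numpy as np
import tensorflow as tf

vocab_size = 10000        # assumed vocabulary size (placeholder)
num_tags = 45             # assumed number of POS tags (placeholder)
max_len = 50              # assumed max sentence length (placeholder)
state_size = num_tags + 20

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 40),
    # SimpleRNN wraps SimpleRNNCell; return_sequences gives one output per timestep
    tf.keras.layers.SimpleRNN(state_size, activation="tanh",
                              dropout=0.2, recurrent_dropout=0.0,
                              return_sequences=True),
    tf.keras.layers.Dense(state_size),    # replaces tf.contrib.layers.fully_connected
    tf.keras.layers.Dropout(0.4),         # keep_prob=0.6 corresponds to rate=0.4
    tf.keras.layers.Dense(num_tags),      # per-timestep tag scores
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
])

# sequence_loss over padded batches behaves like per-timestep sparse
# cross-entropy; clipnorm=0.5 is one interpretation of "clipping of 0.5"
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01, clipnorm=0.5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

x = np.random.randint(0, vocab_size, size=(32, max_len))
print(tuple(model(x).shape))  # one tag-score vector per token
```

Note that ending with BatchNormalization and ReLU before the loss means the "logits" fed to the cross-entropy are non-negative, which is unusual; I kept it here only because it mirrors the layer order above.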
The results for ~14 epochs are as follows (due to resource constraints, that's the maximum number of epochs I can run):
accuracy 0.0172309
accuracy 0.800888
accuracy 0.866243
accuracy 0.893743
accuracy 0.896006
accuracy 0.901575
accuracy 0.899487
accuracy 0.898529
accuracy 0.900531
accuracy 0.902532
accuracy 0.899051
accuracy 0.903055
accuracy 0.901053
accuracy 0.898703
accuracy 0.898703
I have noticed that no matter what hyperparameter changes I make, the accuracy gets stuck around 89-90%. Can you provide some suggestions to boost it? I am fairly new to deep learning, so I have been struggling a lot to optimize my model. I have also tried bidirectional LSTMs for the same task, but they are too slow given my resource constraints, and the maximum accuracy I can achieve with them is around 92%.