
I am trying to learn a very simple sequence using an RNN (implemented in Keras).

The input sequence is 2000 randomly generated integers between 0 and 99:

x = np.random.randint(0, 100, size=2000)

while the expected output at time t is the (t-2)-th input term, i.e.

y_t = x_{t-2}

such that an example dataset looks like this:

+------+------+
|  X   |  Y   |
+------+------+
|    0 |   NA |
|   24 |   NA |
|   33 |    0 |
|    6 |   24 |
|   78 |   33 |
|   11 |    6 |
|    . |    . |
|    . |    . |
+------+------+

Note: I drop the NA rows before training.
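For reference, a minimal sketch of one way to build such a dataset (assuming pandas; the df['X']/df['Y'] columns match the training code below):

import numpy as np
import pandas as pd

x = np.random.randint(0, 100, size=2000)
df = pd.DataFrame({'X': x})
df['Y'] = df['X'].shift(2)                 # y_t = x_{t-2}; first two rows become NA
df = df.dropna().reset_index(drop=True)    # drop the NA rows before training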

I am trying to train a simple RNN to learn this sequence as below:

from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

# Reshape to (samples, timesteps, features) to match the input shape
# expected by the SimpleRNN layer.
xtrain = np.reshape(df['X'].values, (df.shape[0], 1, 1))

model = Sequential()
model.add(SimpleRNN(2, input_shape=(None, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(x=xtrain, y=df['Y'].values, epochs=200, batch_size=5)

However, I find that this implementation gets stuck in a local minimum, predicting a roughly constant value (~50) for all test observations.

Could anyone help me with the right way of implementing a basic RNN in Keras to learn this sequence?

Aditya
  • What is the point of using a recurrent neural net if your desired output is independent of the previous observations? Your desired network would be one that always gives zero weight to the previous state. – Jan Kukacka Feb 20 '18 at 12:59
  • The point is to learn that dependency using an RNN. Let's say I feed in another input sequence that is dependent on its last 3 terms, then the RNN should learn that dependency. Similarly, if the observations are independent (as in this case), the RNN should learn that and assign 0 weight to the previous state. Or isn't that how an RNN is supposed to work? – Aditya Feb 20 '18 at 20:26
  • @JanKukacka: I updated the question with a sequence that has a temporal dependency, so that it makes more sense to learn with an RNN now. – Aditya Feb 22 '18 at 05:54
  • Please read http://karpathy.github.io/2015/05/21/rnn-effectiveness/ – Germán Alfaro Feb 28 '18 at 05:09

1 Answer


Try first with a Dense layer, change the batch size to 1, make your data long enough, and try different activation functions. A good rule of thumb is to make your first layer three times the number of neurons as you have features: 1F -> 3N.
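For instance, a minimal sketch along these lines (the window length of 3 timesteps, the unit count, and the input scaling are illustrative assumptions; the key point is that each sample's window actually contains x_{t-2}):

import numpy as np
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

# Illustrative data, scaled to [0, 1) so the MSE loss is well-behaved.
x = np.random.randint(0, 100, size=2000).astype('float32') / 100.0

# Overlapping windows of 3 timesteps: each sample is [x_{t-2}, x_{t-1}, x_t]
# and the target is its first element, i.e. y_t = x_{t-2}.
window = 3
X = np.array([x[i:i + window] for i in range(len(x) - window + 1)])
y = X[:, 0]
X = X[..., np.newaxis]                     # (samples, timesteps=3, features=1)

model = Sequential()
model.add(SimpleRNN(8, input_shape=(window, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=10, batch_size=1)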

Germán Alfaro