I'm training an LSTM (using the Keras Python library) to generate sequences. My X training data is a list of sequences, and my Y training data is a list of the final values of those sequences.

The training X (padded_training_seqs) data looks something like this (only much larger):

[
[[43.103, 27.092, 19.078], [43.496, 26.746, 19.198], [43.487, 27.363, 19.092], [44.107, 27.779, 18.487], [44.529, 27.888, 17.768]], 
[[44.538, 27.901, 17.756], [44.663, 28.073, 17.524], [44.623, 27.83, 17.401], [44.68, 28.034, 17.601], [0,0,0]],
[[47.236, 31.43, 13.905], [47.378, 31.148, 13.562], [0,0,0], [0,0,0], [0,0,0]]
]

and the training Y (training_final_steps) data looks like this:

[
[44.652, 39.649], [37.362, 54.106], [37.115, 57.66501]
]
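The padding code itself isn't shown in the post; here is a minimal sketch of how such sequences could be right-padded with [0, 0, 0] rows up to max_sequence_length, assuming plain NumPy and a hypothetical raw_training_seqs list of variable-length sequences:

```
import numpy as np

# Hypothetical padding step (not shown in the original post): right-pad each
# variable-length sequence of [lat, long, temp] steps with [0, 0, 0] rows so
# that every sequence has exactly max_sequence_length steps.
def pad_with_zero_steps(sequences, max_sequence_length, n_features=3):
    padded = np.zeros((len(sequences), max_sequence_length, n_features))
    for i, seq in enumerate(sequences):
        padded[i, :len(seq), :] = seq  # copy the real steps; the rest stay zero
    return padded

padded_training_seqs = pad_with_zero_steps(raw_training_seqs, max_sequence_length)
```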

and here is where I build the model:

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Masking
from keras.layers.recurrent import LSTM
from keras.layers.normalization import BatchNormalization
import numpy as np

in_dimension = 3
hidden_neurons = 300
out_dimension = 2

model = Sequential()
model.add(BatchNormalization(input_shape=(max_sequence_length, in_dimension)))
model.add(Masking([0,0,0], input_shape=(max_sequence_length, in_dimension)))
model.add(LSTM(hidden_neurons, activation='softmax', return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(out_dimension, activation='linear'))

model.compile(loss="mse", optimizer="sgd")
model.fit(padded_training_seqs, training_final_steps, nb_epoch=5, batch_size=1)

The problem comes when I try to generate new sequences:
```
seed_lat = 42.966
seed_long = 39.869
seed_temp = 25.066

current_generated_sequence = np.array([[[seed_lat, seed_long, seed_temp]] + [[0,0,0]] * (max_sequence_length - 1)], dtype=np.dtype(float))

for i in range(0, max_sequence_length - 1):
    next_step = model.predict(current_generated_sequence, batch_size=1, verbose=1)[0]
    current_generated_sequence[0][i + 1] = loc_with_temp(next_step, i)
```
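(loc_with_temp isn't defined in the post; based on how it's used, it presumably turns the predicted [lat, long] pair back into a three-value [lat, long, temp] step. A hypothetical sketch, assuming the temperatures come from some known per-step list known_temps:)

```
# Hypothetical reconstruction of loc_with_temp -- the real helper lives in the
# linked repo. It takes the 2-value prediction [lat, long] and appends a
# temperature so the result fits the 3-feature input format again.
def loc_with_temp(predicted_step, i):
    lat, lon = predicted_step
    return [lat, lon, known_temps[i + 1]]  # known_temps is an assumed per-step list
```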

I build the new sequence step by step, each time using model.predict to get the next step of current_generated_sequence, and then turning that predicted [lat, long] pair back into a three-value step (via loc_with_temp, sketched above) so it can be fed in on the next iteration. The problem is that the loop converges on predicting a single value, so the whole generated sequence looks like this:

[[[ 42.966       39.869       25.066     ]
  [ 41.38308716  37.68268204  10.387     ]
  [ 41.38308716  37.68268204  10.387     ]
  [ 41.38308716  37.68268204  10.387     ]
  [ 41.38308716  37.68268204  10.387     ]
  [ 41.38308716  37.68268204  10.387     ]
  [ 41.38308716  37.68268204  10.387     ]
  [ 41.38308716  37.68268204  10.387     ]
  [ 41.38308716  37.68268204  10.387     ]
  [ 41.38308716  37.68268204  10.387     ]]]

Is there anything I'm doing clearly wrong? I've done some research and read that this might be a sign of a saturated hidden layer, or of overfitting; my sample size is not exactly enormous.

UPDATE: I changed the batch size to 10 and was able to get a sequence containing two unique values.

[[[ 41.36413574  37.32749557  10.607     ]
  [ 41.36413574  37.32749557  10.607     ]
  [ 41.36413574  37.32749557  10.607     ]
  [ 41.36413574  37.32749557  10.607     ]
  [ 41.36413574  37.32749557  10.607     ]
  [ 41.36413574  37.32749557  10.607     ]
  [ 41.36413574  37.32749557  10.607     ]
  [ 41.36413574  37.32749557  10.607     ]
  [ 41.39291382  37.15774536  10.644     ]
  [ 41.36413574  37.32749557  10.607     ]]]

P.S. The full code and data are at https://github.com/jeshaitan/migration-lstm/blob/master/main.py

  • What about your losses after each iteration? Have you recorded them? I think you did not train your model for enough epochs and the loss is still too high. – fluency03 Mar 30 '16 at 14:24
  • @ChangLiu Yeah, the loss values are very high; they start at 1700 and end at around 50 – jeshaitan Mar 30 '16 at 19:21
  • There may still be a lot of room for the loss to decrease. Try training your model further until the loss is stable. It will also help to enable validation and print out the validation loss (a sketch of that follows these comments). – fluency03 Mar 30 '16 at 19:24
  • @ChangLiu If my sample size is around 10, but each X value is a very long sequence (hundreds of values), should I split up each X into smaller sequences to increase the sample size? – jeshaitan Mar 30 '16 at 19:36
  • How many samples do you have in total? I am not sure what your data set looks like. – fluency03 Mar 30 '16 at 19:41
  • @ChangLiu I have 18 samples. – jeshaitan Mar 30 '16 at 19:49
  • The data set may be too small. On the other hand, you have an LSTM layer with 200 units. I guess your model will be overfitting. – fluency03 Mar 30 '16 at 19:53
  • This is just my understanding. You should do more testing, changing different parameters on different data sets, and observe the training loss and validation loss. – fluency03 Mar 30 '16 at 19:54
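Following up on the suggestion to record losses and enable validation, here is a minimal sketch of what that could look like with the Keras 1.x-era API used above (nb_epoch, validation_split); the epoch count and split fraction are placeholders, not values from the post:

```
# Train longer, hold out 20% of the (small) training set for validation,
# and print the per-epoch training and validation loss recorded by fit().
history = model.fit(padded_training_seqs, training_final_steps,
                    nb_epoch=100, batch_size=1,
                    validation_split=0.2, verbose=1)

for epoch, (loss, val_loss) in enumerate(zip(history.history['loss'],
                                             history.history['val_loss'])):
    print("epoch %d: loss=%.4f  val_loss=%.4f" % (epoch, loss, val_loss))
```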
