I'm training an LSTM (using the Keras Python library) to generate sequences. My X training data is a list of sequences, and the Y training data is a list of the final values of those sequences.
The training X data (padded_training_seqs) looks something like this (only much larger):
[
[[43.103, 27.092, 19.078], [43.496, 26.746, 19.198], [43.487, 27.363, 19.092], [44.107, 27.779, 18.487], [44.529, 27.888, 17.768]],
[[44.538, 27.901, 17.756], [44.663, 28.073, 17.524], [44.623, 27.83, 17.401], [44.68, 28.034, 17.601], [0,0,0]],
[[47.236, 31.43, 13.905], [47.378, 31.148, 13.562], [0,0,0], [0,0,0], [0,0,0]]
]
and the training Y data (training_final_steps) looks like this:
[
[44.652, 39.649], [37.362, 54.106], [37.115, 57.66501]
]
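To be concrete about the shapes: each input step is (lat, long, temp), each target is (lat, long), and shorter sequences are padded out with [0,0,0] to max_sequence_length (5 in this toy example). As numpy arrays, that works out to:

```
import numpy as np

X = np.asarray(padded_training_seqs)   # padded input sequences
y = np.asarray(training_final_steps)   # final (lat, long) of each sequence

print(X.shape)  # (num_sequences, max_sequence_length, 3) -> (3, 5, 3) for the toy data above
print(y.shape)  # (num_sequences, 2)                      -> (3, 2)
```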
and here is where I build the model:
in_dimension = 3
hidden_neurons = 300
out_dimension = 2
model = Sequential()
model.add(BatchNormalization(input_shape=(max_sequence_length, in_dimension)))
model.add(Masking([0,0,0], input_shape=(max_sequence_length, in_dimension)))
model.add(LSTM(hidden_neurons, activation='softmax', return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(out_dimension, activation='linear'))
model.compile(loss="mse", optimizer="sgd")
model.fit(padded_training_seqs, training_final_steps, nb_epoch=5, batch_size=1)
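A quick sanity check on the wiring (one padded sequence in, a single (lat, long) prediction out):

```
model.summary()
# One training sequence in -> one (lat, long) prediction out
print(model.predict(np.asarray(padded_training_seqs)[:1], batch_size=1).shape)  # (1, 2)
```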
The problem arises when I try to generate new sequences:
```
seed_lat = 42.966
seed_long = 39.869
seed_temp = 25.066
# seed the sequence with one real step, pad the rest with zeros
current_generated_sequence = np.array(
    [[[seed_lat, seed_long, seed_temp]] + [[0, 0, 0]] * (max_sequence_length - 1)],
    dtype=np.dtype(float))
for i in range(0, max_sequence_length - 1):
    next_step = model.predict(current_generated_sequence, batch_size=1, verbose=1)[0]
    current_generated_sequence[0][i + 1] = loc_with_temp(next_step, i)
```
I build the new sequence step by step, each time calling model.predict to get the next step of current_generated_sequence. (I then attach a 3rd feature to that 2-value prediction so it can be fed back in on the next iteration; a simplified sketch of that helper is further down.) The problem is that it converges on predicting a single value, so the total generated sequence looks like this:
[[[ 42.966 39.869 25.066 ]
[ 41.38308716 37.68268204 10.387 ]
[ 41.38308716 37.68268204 10.387 ]
[ 41.38308716 37.68268204 10.387 ]
[ 41.38308716 37.68268204 10.387 ]
[ 41.38308716 37.68268204 10.387 ]
[ 41.38308716 37.68268204 10.387 ]
[ 41.38308716 37.68268204 10.387 ]
[ 41.38308716 37.68268204 10.387 ]
[ 41.38308716 37.68268204 10.387 ]]]
Is there anything I'm doing clearly wrong? I've done some research and read that this might be a sign of a saturated hidden layer, or of overfitting; my sample size is not exactly enormous.
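For reference, loc_with_temp just turns the 2-value prediction back into a 3-value input step by attaching a temperature. A simplified sketch (the temperatures here are only placeholders; the real version is in the repo linked below):

```
# Simplified sketch of the helper used in the generation loop above:
# take the predicted (lat, long) and attach a temperature as the 3rd
# feature so it can be fed back in as the next input step.
temps = [25.066] * max_sequence_length  # placeholder temperatures

def loc_with_temp(step, i):
    return [step[0], step[1], temps[i]]
```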
UPDATE: I changed the batch size to 10 and was able to get a sequence containing two unique values (the changed call is shown after the output below).
[[[ 41.36413574 37.32749557 10.607 ]
[ 41.36413574 37.32749557 10.607 ]
[ 41.36413574 37.32749557 10.607 ]
[ 41.36413574 37.32749557 10.607 ]
[ 41.36413574 37.32749557 10.607 ]
[ 41.36413574 37.32749557 10.607 ]
[ 41.36413574 37.32749557 10.607 ]
[ 41.36413574 37.32749557 10.607 ]
[ 41.39291382 37.15774536 10.644 ]
[ 41.36413574 37.32749557 10.607 ]]]
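That is, the fit call from above becomes:

```
model.fit(padded_training_seqs, training_final_steps, nb_epoch=5, batch_size=10)
```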
P.S. The full code and data are at https://github.com/jeshaitan/migration-lstm/blob/master/main.py