I am trying my first LSTM with keras to classify time-dependent data sets.
I have created training and testing data sets, which I have normalized:
# Compute the mean and standard deviation for each feature of the training set
train_data <- my.matrix[1:end.train,]
mean_features <- apply(train_data, 2, mean, na.rm = TRUE)
std <- apply(train_data, 2, sd, na.rm = TRUE)
# Scaling the whole data set using the mean and sd of the training set
my.matrix <- scale(my.matrix, center = mean_features, scale = std)
The categories are one-hot encoded, and the final array has dimensions [84000, 10, 22], meaning 84,000 observations, a rolling window (data.window) of ten time steps, and 22 features. My batch size is 500, but I have tried multiple values.
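The windowing itself looks roughly like this (a minimal sketch; the integer class vector labels and the alignment of each window to the class of its last observation are illustrative, not my exact code):

data.window <- 10
n.windows <- nrow(my.matrix) - data.window + 1

# One-hot encode the class attached to the last observation of each window
onehot_labels <- to_categorical(labels[data.window:nrow(my.matrix)], num_classes = 3)

# Slide a window of data.window time steps over the scaled matrix,
# giving an array of shape [samples, timesteps, features]
windows <- array(NA_real_, dim = c(n.windows, data.window, ncol(my.matrix)))
for (i in seq_len(n.windows)) {
  windows[i, , ] <- my.matrix[i:(i + data.window - 1), ]
}
my.matrix <- windows  # now [84000, 10, 22]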
Then I create the following LSTM:
model <- keras_model_sequential()
model %>%
  layer_lstm(units = 50,
             input_shape = c(data.window, dim(my.matrix)[3]),
             batch_size = batch.size,
             return_sequences = TRUE,
             recurrent_dropout = 0.2,
             stateful = TRUE) %>%
  layer_dropout(rate = 0.2) %>%
  layer_lstm(units = 50,
             recurrent_dropout = 0.2,
             return_sequences = FALSE,
             stateful = TRUE) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 3, activation = 'sigmoid')
optimizer <- optimizer_sgd(lr = 0.1)
model %>%
  compile(
    loss = 'categorical_crossentropy',
    optimizer = optimizer,
    metrics = c('accuracy')
  )
model
Here is the model summary:
Model
Model: "sequential_4"
________________________________________________________________________________________
Layer (type)                        Output Shape                   Param #
========================================================================================
lstm_4 (LSTM)                       (500, 10, 50)                  14600
________________________________________________________________________________________
dropout_4 (Dropout)                 (500, 10, 50)                  0
________________________________________________________________________________________
lstm_5 (LSTM)                       (500, 50)                      20200
________________________________________________________________________________________
dropout_5 (Dropout)                 (500, 50)                      0
________________________________________________________________________________________
dense_4 (Dense)                     (500, 3)                       153
========================================================================================
Total params: 34,953
Trainable params: 34,953
Non-trainable params: 0
________________________________________________________________________________________
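As a sanity check, the parameter counts match the usual LSTM formula of 4 × ((inputs + units) × units + units) parameters per layer: 4 × ((22 + 50) × 50 + 50) = 14,600 for the first LSTM, 4 × ((50 + 50) × 50 + 50) = 20,200 for the second, and 50 × 3 + 3 = 153 for the dense layer.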
Launching the training is disappointing: the loss drops very quickly, and then both the loss and the accuracy stay constant.
for (i in 1:2000) {
  print(paste("Training epoch:", i))
  model %>% fit(x = train_data,
                y = train_labels,
                batch_size = batch.size,
                epochs = 1,
                verbose = 1,
                shuffle = FALSE)
  model %>% reset_states()
}
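For what it's worth, the manual loop can also be written by letting fit() run all the epochs and resetting the LSTM states from a callback; a minimal equivalent sketch:

# Reset the LSTM states at the end of every epoch instead of looping manually
reset_cb <- callback_lambda(
  on_epoch_end = function(epoch, logs) {
    model %>% reset_states()
  }
)

model %>% fit(x = train_data,
              y = train_labels,
              batch_size = batch.size,
              epochs = 2000,
              verbose = 1,
              shuffle = FALSE,
              callbacks = list(reset_cb))

Either way, the states are cleared between epochs so that one epoch's final state does not leak into the next.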
This gives:
[1] "Training epoch: 1"
168/168 [==============================] - 4s 25ms/step - loss: 0.2838 - accuracy: 0.2484
[1] "Training epoch: 2"
168/168 [==============================] - 4s 27ms/step - loss: 1.1921e-07 - accuracy: 0.2020
[1] "Training epoch: 3"
168/168 [==============================] - 4s 23ms/step - loss: 1.1921e-07 - accuracy: 0.2020
[1] "Training epoch: 4"
168/168 [==============================] - 4s 25ms/step - loss: 1.1921e-07 - accuracy: 0.2020
[1] "Training epoch: 5"
168/168 [==============================] - 4s 23ms/step - loss: 1.1921e-07 - accuracy: 0.2020
[1] "Training epoch: 6"
168/168 [==============================] - 4s 25ms/step - loss: 1.1921e-07 - accuracy: 0.2020
[1] "Training epoch: 7"
168/168 [==============================] - 4s 27ms/step - loss: 1.1921e-07 - accuracy: 0.2020
[1] "Training epoch: 8"
168/168 [==============================] - 4s 25ms/step - loss: 1.1921e-07 - accuracy: 0.2020
This goes on with no notable change.
I tried playing with the optimizer, the learning rate, the dropout, the recurrent dropout, etc.
So far, nothing has changed the fact that from epoch 2 onward the loss sits at about 10⁻⁷ and the accuracy does not move significantly.
This really is a pet project I'm doing to learn deep learning, but I end up trying random things without understanding them, which is frustrating.
I'm not asking you to solve the issue for me, just to give me pointers to where I could better understand this behaviour.
Edit: As suggested in the comments, I'll clarify. My problem is not actually that the loss is small; it is that the network does not seem to learn, and I assumed the tiny loss was the cause. If the loss is that small, I have presumably reached a minimum (at least a local one), which I thought would explain why the model stops learning.
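To check whether the network has simply collapsed to a constant output, I suppose I could inspect the raw predictions; a minimal sketch (preds is just an illustrative name):

# Inspect what the trained network actually outputs on the training windows
preds <- model %>% predict(train_data, batch_size = batch.size)
head(preds)           # per-class scores for the first few windows
table(max.col(preds)) # how often each class is predicted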
The problem I'm trying to "solve" is to predict whether the price of a stock will increase, decrease, or stay range bound. I know this is a difficult problem with no real solution, but there are plenty of blog posts experimenting on the subject, so I used them as an introduction to deep learning. Once more, I'm not trying to solve this; I'm trying to learn by example.