Same accuracy per epoch with Keras package model for binary outcome

Question

I'm new to R and have taken only one course in data science. I am trying to predict the next-day direction of a stock market from technical indicators. The dataset (preprocessed in Excel, with 684 predictors and many NA values) is centered and scaled (first question: should it be scaled to between 0 and 1 instead?). I split the data into separate y and x datasets (second question: should it be a single dataset?) and converted both to matrices. I am trying to fit a Keras model with ReLU and sigmoid activations (see code), but the accuracy is identical in every epoch. How can I improve this? I have left out dropout so far because it did not help, and I have tried learning rates between 0.1 and 0.001.

Here is the code:

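library(readxl)  # read_xlsx
library(dplyr)   # %>%, select
library(caret)   # preProcess
library(keras)   # keras_model_sequential, layer_dense, compile, fit

data = read_xlsx("C:....xlsx", sheet = 1, guess_max = 21474836, na = "")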

y = data %>% select('PL not lagged')
y <- y[26:7154,]
y <- as.matrix(y)

data2 = data %>% dplyr::select(-'PL not lagged')

preproc1 <- preProcess(data2, method=c("center", "scale"))
norm1 <- predict(preproc1, data2)

x = norm1[25:7153,]
x <- as.matrix(x)

set.seed(6)
n_train = 5961

trainx = sample(1:dim(x)[1], n_train, replace=FALSE)
x_train = x[trainx,]
y_train = y[trainx,]

x_test = x[-trainx,]
y_test = y[-trainx,]


n_val = 1000
val = sample(1:dim(x_train)[1], n_val, replace=FALSE)

x_val = x_train[val,]
y_val = y_train[val]

x_train = x_train[-val,]
y_train = y_train[-val]


model <- keras_model_sequential() %>% 
  layer_dense(units = 64, activation = "relu", input_shape = (684)) %>%
  #layer_dropout(rate = .1) %>%
  layer_dense(units = 20, activation = "relu") %>%
  #layer_dropout(rate = .1) %>%
  layer_dense(units = 1, activation = "sigmoid")

optimizer = optimizer_adam(lr=0.001)

model %>% compile(
  optimizer = optimizer,
  loss = "binary_crossentropy",
  metrics = c('accuracy')
)

history1 <- model %>% fit(
  x_train, 
  y_train, 
  epochs = 50, 
  batch_size = 32,
  verbose = 1,
  validation_data = list(x_val,y_val)
)

And part of the output:

156/156 [==============================] - 0s 3ms/step - loss: nan - accuracy: 0.4652 - val_loss: nan - val_accuracy: 0.4960
Epoch 49/50
156/156 [==============================] - 1s 4ms/step - loss: nan - accuracy: 0.4652 - val_loss: nan - val_accuracy: 0.4960
Epoch 50/50
156/156 [==============================] - 0s 3ms/step - loss: nan - accuracy: 0.4652 - val_loss: nan - val_accuracy: 0.4960
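
A `nan` loss that never changes usually points to non-finite values reaching the network. The sketch below, in base R, shows one way to check a feature matrix for missing values and why a single NA is enough to make the loss undefined. The matrix `x_demo`, the labels `y_demo`, and the weights `w` are made up for illustration and stand in for the `x` and `y` in the code above; the single sigmoid unit is a simplification, not the model from the question.

```r
# Hypothetical stand-in for the feature matrix: 5 rows, 4 features, 3 NAs.
set.seed(1)
x_demo <- matrix(rnorm(20), nrow = 5, ncol = 4,
                 dimnames = list(NULL, paste0("f", 1:4)))
x_demo[2, 1] <- NA
x_demo[4, 3] <- NA
x_demo[5, 3] <- NA
y_demo <- c(1, 0, 1, 0, 1)  # made-up binary labels

# Is there any missing value at all, and in which columns?
has_na     <- anyNA(x_demo)
na_per_col <- colSums(is.na(x_demo))

# One dense unit with a sigmoid: weighted sum of the features, then squashing.
w <- rep(0.1, ncol(x_demo))            # arbitrary weights
p <- 1 / (1 + exp(-(x_demo %*% w)))    # NA in a row -> NA prediction for that row

# Binary cross-entropy averaged over rows: one NA prediction makes the mean NA.
loss <- -mean(y_demo * log(p) + (1 - y_demo) * log(1 - p))

print(na_per_col)
print(sum(is.na(p)))  # number of rows whose prediction is NA
print(loss)
```

Running the same `anyNA()` and `colSums(is.na())` checks on `x_train` and `x_val` before calling `fit()` would show whether this is what is happening here.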

Your loss is NAN, so at some point it stopped being able to compute the loss. You need to debug the model. — Sycorax, Jul 08 '21 at 14:37
Yeah, a lot. Many stocks only have partial data for the time frame, so many of them have missing values. The target data has no NA values. — TimeofNow, Jul 08 '21 at 14:59
NA values among the features cause NA values in the output, which cause NA values in the loss. — Sycorax, Jul 08 '21 at 15:05
Thanks! I get a result when I set all NA values to 0. I guess I will have to find out how to handle so many 0s. — TimeofNow, Jul 08 '21 at 15:26
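
On the zero-filling mentioned in the last comment: if the features were centered with NAs skipped when the means were computed (as base R's `scale()` does), then replacing NA with 0 after centering is the same as imputing each column's mean. One common way to keep the network from confusing "missing" with "average" is to add a 0/1 indicator column per feature that had missing values. The following is a minimal sketch of that idea in base R, not a recommendation specific to this dataset; `x_raw` and the `_missing` column names are invented for the example.

```r
# Hypothetical raw features: 4 rows, 3 columns, 2 missing values.
set.seed(1)
x_raw <- matrix(rnorm(12, mean = 50, sd = 10), nrow = 4, ncol = 3,
                dimnames = list(NULL, c("a", "b", "c")))
x_raw[1, 2] <- NA
x_raw[3, 3] <- NA

# scale() centers and scales each column, ignoring NAs in the statistics.
x_scaled <- scale(x_raw)

# Remember where the missing values were before overwriting them.
na_mask     <- is.na(x_scaled)
has_missing <- colSums(na_mask) > 0

# One 0/1 indicator column for every feature that had any NA.
indicators <- na_mask[, has_missing, drop = FALSE] * 1
colnames(indicators) <- paste0(colnames(x_scaled)[has_missing], "_missing")

# After centering, 0 is the column mean, so this is mean imputation.
x_filled <- x_scaled
x_filled[na_mask] <- 0

# Final matrix: imputed features plus missingness indicators.
x_model <- cbind(x_filled, indicators)
print(colnames(x_model))
print(anyNA(x_model))
```

With this layout the input width grows by the number of indicator columns, so `input_shape` in the first layer would need to be `ncol(x_model)` rather than the original feature count.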
