Forum,
I have a multivariate time series problem. For my master's thesis I am investigating whether it is possible to forecast the direction of stock price movements with machine learning. My model looks as follows:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def sentdex_model(X_train):
    model = Sequential()
    # First LSTM returns the full sequence so it can feed the second LSTM
    model.add(LSTM(33, input_shape=X_train.shape[1:], return_sequences=True))
    # Second LSTM returns only its last hidden state (input_shape is inferred here)
    model.add(LSTM(33))
    model.add(Dense(90, activation='relu'))
    # Single sigmoid output for the binary up/down prediction
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    # model.summary()
    return model
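For context, I train it roughly like this (a simplified sketch; the batch size and the X_val/y_val validation split are just placeholders for my actual setup):

model = sentdex_model(X_train)
# y_train is a 0/1 array: 1 = price went up, 0 = price went down
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_val, y_val))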
The input data has the shape [#samples, timesteps, features]. The features are OHLCV (Open-High-Low-Close-Volume) data of 6 different telecom companies. I'm trying to predict whether a stock will rise (1) or fall (0), so it is basically a time series classification problem. I've always learned that when features are on very different scales, it is good practice to MinMaxScale the input data before feeding it into the neural network.
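The scaling step looks roughly like this (a simplified sketch of my pipeline: the scaler is fit on the training portion only and applied per feature, with the 3D arrays flattened to 2D for scikit-learn):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

def scale_sequences(X_train, X_test):
    n_train, timesteps, n_features = X_train.shape
    scaler = MinMaxScaler()
    # Flatten to 2D so each feature (O, H, L, C, V) gets its own min/max,
    # computed from the training set only to avoid look-ahead leakage
    scaler.fit(X_train.reshape(-1, n_features))
    X_train_scaled = scaler.transform(X_train.reshape(-1, n_features)).reshape(X_train.shape)
    X_test_scaled = scaler.transform(X_test.reshape(-1, n_features)).reshape(X_test.shape)
    return X_train_scaled, X_test_scaled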
However, when I do so, the training accuracy of the model keeps hovering around the baseline of 0.50 (there is an equal number of 1s (price rise) and 0s (price fall)), so the model is not really learning. When I don't MinMaxScale, the accuracy slowly climbs to around 75% over 50 epochs.
Can anyone explain why the model without MinMaxScaling seems to learn better than the model with MinMaxScaling?