I am doing multivariate time series classification on the TUH seizure corpus dataset.
I have built this model with Keras, using LSTM layers:
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(look_back, trainX.shape[2])))
model.add(LSTM(50))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(trainX, trainY, validation_split=0.3, epochs=50, batch_size=1000, verbose=1)
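One detail worth noting about the fit call above: Keras's validation_split takes the last fraction of the arrays, before any shuffling, so with time-ordered windows the validation set is a contiguous block from the end. A minimal sketch of the equivalent manual split (with toy stand-ins for trainX/trainY, not the real EEG arrays):

```python
import numpy as np

# Toy stand-ins for trainX / trainY; labels deliberately ordered to show the issue.
trainX = np.arange(20, dtype=float).reshape(10, 2)
trainY = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

val_frac = 0.3
split = int(len(trainX) * (1 - val_frac))  # Keras cuts at this index, no shuffle
X_tr, X_val = trainX[:split], trainX[split:]
y_tr, y_val = trainY[:split], trainY[split:]

print(y_val)  # [1 1 1] -- the tail of the data; badly imbalanced if labels are ordered
```

If the positive windows cluster toward the end of the training arrays, the validation metrics during fit may not reflect the balanced class distribution at all.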
and the results are surprising... When I compute the confusion matrix like this:
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
print(confusion_matrix(trainY, trainPredict.round()))
print(confusion_matrix(testY, testPredict.round()))
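To see the per-class breakdown more directly than with the raw confusion matrix, sklearn's classification_report can be printed alongside it. A sketch with hypothetical stand-in labels (mimicking a model that predicts everything as one class, as in the test matrix above):

```python
import numpy as np
from sklearn.metrics import classification_report

# Hypothetical stand-ins for testY and testPredict.round().
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([1, 1, 1, 1, 1, 1])  # degenerate model: everything in one class

# Recall for the ignored class drops to 0, which the report makes obvious.
print(classification_report(y_true, y_pred, digits=3, zero_division=0))
```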
I respectively get:
[[129261      0]
 [   172 129138]]
and
[[10822     0]
 [10871     0]]
In other words, my training confusion matrix is quite good, while my testing confusion matrix puts every sample in the same class. What is surprising is that the classes are almost perfectly balanced, in both the training and the testing set...
Why do I get this?
EDIT:
My preprocessing code, based on Jason Brownlee's tutorial, looks like this: I "reshape" the data so that each sample consists of the look_back measurements preceding the target to be predicted, each measurement consisting of 22 signals corresponding to the EEG channels:
def create_dataset(feat, targ, look_back=1):
    dataX, dataY = [], []
    print(len(targ) - look_back - 1)
    for i in range(len(targ) - look_back - 1):
        a = feat[i:(i + look_back), :]
        dataX.append(a)
        dataY.append(targ.iloc[i + look_back])
    return np.array(dataX), np.array(dataY)
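As a sanity check, the windowing can be exercised on toy data (made-up numbers, not the real EEG arrays) to confirm the output shapes and to see that consecutive windows overlap in all but one row:

```python
import numpy as np
import pandas as pd

def create_dataset(feat, targ, look_back=1):
    # Same sliding-window logic as above.
    dataX, dataY = [], []
    for i in range(len(targ) - look_back - 1):
        dataX.append(feat[i:(i + look_back), :])
        dataY.append(targ.iloc[i + look_back])
    return np.array(dataX), np.array(dataY)

feat = np.arange(30).reshape(10, 3)   # 10 time steps, 3 fake channels
targ = pd.Series(np.arange(10))
dataX, dataY = create_dataset(feat, targ, look_back=4)

print(dataX.shape)  # (5, 4, 3): samples, look_back, channels
# Consecutive windows share look_back - 1 rows, so samples are heavily correlated:
print(np.array_equal(dataX[0][1:], dataX[1][:-1]))  # True
```

This overlap is why a random or in-sequence train/validation split can leak information: nearly identical windows end up on both sides of the split.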
and then
look_back = 50
trainX, trainY = create_dataset(X_train_resampled, Y_train_resampled, look_back)
print("loopback1 done")
testX, testY = create_dataset(X_test_resampled, Y_test_resampled, look_back)
trainX and testX then have dimensions (#recordings, 50 (look_back), #features (22)).
I am not sure whether this way of working is adequate; maybe it is the cause of the error.
Thanks
EDIT: even when I properly split the data into training, validation and test sets before fitting the model, like this:
validX, validY = create_dataset(X_valid_resampled, Y_valid_resampled, look_back)
I still get poor results: a confusion matrix with far too many false positives and very few true negatives.
My doubts: should I increase my time-steps window (i.e. the look_back parameter)? By the way, is there a sound method to tune this parameter for a given context?
Maybe the usage of create_dataset is not appropriate (although it comes from a well-known tutorial)? Indeed, it seems to me that it introduces redundancy, and thus correlation, between sequences...
I hope someone can help.