Here is the config of my model:

"model": {
        "loss": "categorical_crossentropy",
        "optimizer": "adam",
        "layers": [
            {
                "type": "lstm",
                "neurons": 180,
                "input_timesteps": 15,
                "input_dim": 103,
                "return_seq": true,
                "activation": "relu"
            },
            {
                "type": "dropout",
                "rate": 0.1
            },
            {
                "type": "lstm",
                "neurons": 100,
                "activation": "relu",
                "return_seq": false
            },
            {
                "type": "dropout",
                "rate": 0.1
            },
            {
                "type": "dense",
                "neurons": 30,
                "activation": "relu"
            },
            {
                "type": "dense",
                "neurons": 3,
                "activation": "softmax"
            }
        ]
    }

Once I finished training the model, I decided to compare what the confusion matrix looks like depending on whether or not I shuffle the dataset and the labels.

I shuffled with the line:

X, label = shuffle(X, label, random_state=0)
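For context, `shuffle` here is presumably `sklearn.utils.shuffle`, which applies the same random permutation to both arrays so each sample stays paired with its label. A minimal sketch with toy data:

```python
import numpy as np
from sklearn.utils import shuffle

# Toy data: 5 samples with 3 features each, and matching labels
X = np.arange(15).reshape(5, 3)
label = np.array([0, 1, 2, 3, 4])

# shuffle() permutes both arrays with the SAME permutation,
# so row i of X_s still corresponds to label_s[i]
X_s, label_s = shuffle(X, label, random_state=0)

# Pairing is preserved: row that starts with 3*k still carries label k
assert all(X_s[i, 0] == 3 * label_s[i] for i in range(5))
```

With a fixed `random_state`, the permutation is deterministic, so the shuffled split is reproducible across runs.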

Confusion matrix with a shuffling phase

Confusion Matrix
[[16062  1676  3594]
 [ 1760  4466  1482]
 [ 3120  1158 13456]]
Classification Report
             precision    recall  f1-score   support

   class -1       0.77      0.75      0.76     21332
    class 0       0.61      0.58      0.60      7708
    class 1       0.73      0.76      0.74     17734

avg / total       0.73      0.73      0.73     46774

Confusion matrix without a shuffling phase

Confusion Matrix
[[12357  2936  6039]
 [ 1479  4301  1927]
 [ 3316  1924 12495]]
Classification Report
             precision    recall  f1-score   support

   class -1       0.72      0.58      0.64     21332
    class 0       0.47      0.56      0.51      7707
    class 1       0.61      0.70      0.65     17735

avg / total       0.64      0.62      0.62     46774

As you can see, the precision figures in the two reports are significantly different. What can explain the gap between them?


1 Answer

I am not sure what kind of data you are working with, but since you are using an LSTM, I assume it is some sort of time-series/sequence data. If that is the case, shuffling can lead to very different results when training and evaluating models.

Here is an example of one potential cause. Let's assume you have 10 years of monthly customer demand data, so a total of 10x12 samples. You set up a model that uses 12 months of data to forecast the next month's demand. Now, let's say a new pattern emerged in year 8, for example because the customer was acquired by another company.

In this case, if you split your data chronologically into train and test sets, the training set may not contain any data exhibiting the new pattern, so the model performs poorly at test time. However, when you shuffle the data before splitting, some of the samples from year 8 onward are also seen during training, and the model therefore performs better at test time.
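The effect of the two splits can be sketched with synthetic data. This is an illustration under the assumptions above (a hypothetical regime change at "year 8"), not the asker's actual dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 120                      # 10 years of monthly samples
regime = np.zeros(n, dtype=int)
regime[96:] = 1              # hypothetical new pattern starting in year 8

# Chronological 80/20 split: the new regime falls entirely in the test set
split = int(0.8 * n)
train_chrono, test_chrono = regime[:split], regime[split:]

# Shuffled 80/20 split: new-regime samples land in BOTH train and test
idx = rng.permutation(n)
train_shuf, test_shuf = regime[idx[:split]], regime[idx[split:]]

print(train_chrono.sum())    # 0 -> model never sees the new pattern
print(train_shuf.sum())      # > 0 -> model trains on some new-pattern samples
```

Since the chronological training set contains zero new-regime samples, its evaluation metrics on the (all new-regime) test set are naturally worse, matching the gap seen in the two classification reports.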

As a side note, a similar issue arises when applying cross-validation to time-series data, where regular k-fold validation may yield exaggerated performance estimates. In that case, another form of cross-validation is performed; here is an example.
