
I'm training an LSTM that takes a time series as input and outputs one of the classes 'a', 'b', 'null', 'd', or 'e'. In the data, over 78% of the y-labels are 'null', so the LSTM gets quite good at picking 'null', with nearly perfect recall and precision. But it's essentially useless at picking out the other labels.

Thus, my LSTM, as-is, is no better than a robot that always chooses 'null', regardless of the input.

On the one hand, it's entirely possible that the data lacks discriminatory patterns that allow the LSTM to identify non-null outcomes. For the purposes of this question, please ignore that possibility. Setting that aside, how should one go about accounting for outputs that are not well-distributed? Do you force that 'spike' into a number of sub-classifications, to artificially smooth the distribution?

Could the loss function be changed so that it weights non-null results more heavily than null ones, or so that it penalizes null predictions?
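For concreteness, one standard remedy along these lines is inverse-frequency class weighting, which Keras accepts as a `class_weight` dict in `model.fit`. A minimal sketch of computing such weights (the label array here is toy data mimicking my imbalance, not my actual dataset):

```python
import numpy as np

def inverse_frequency_weights(y):
    """Per-class weights inversely proportional to class frequency,
    normalized so the count-weighted average weight is 1."""
    classes, counts = np.unique(y, return_counts=True)
    weights = counts.sum() / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy labels: class 2 plays the role of 'null' (78% of samples)
y_train = np.array([2] * 780 + [0] * 60 + [1] * 60 + [3] * 50 + [4] * 50)
weights = inverse_frequency_weights(y_train)
# Rare classes get large weights, the majority class a small one.
# Pass to Keras as: model.fit(X, y, class_weight=weights)
```

With these weights, each misclassified rare-class sample contributes more to the loss than a misclassified 'null' sample, so the gradient no longer favors predicting 'null' everywhere.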

Does it make any sense to remove all null results from the dataset, train an algo on that subset, then train a downstream network to pick out the nulls from the entire dataset?
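The two-stage idea above amounts to a data split: a binary null/non-null detector trained on everything, and a 4-way classifier trained only on the non-null subset. A hypothetical sketch of that split (array names and the null-class index are assumptions for illustration):

```python
import numpy as np

NULL_CLASS = 2  # index of the 'null' label in this toy example

def two_stage_split(X, y, null_class=NULL_CLASS):
    """Split data for a two-stage pipeline: stage 1 sees all samples with
    binary labels (null vs. non-null); stage 2 sees only non-null samples."""
    is_null = (y == null_class)
    y_binary = is_null.astype(int)   # stage-1 targets: 1 = null, 0 = non-null
    X_stage2 = X[~is_null]           # stage-2 inputs: non-null samples only
    y_stage2 = y[~is_null]           # stage-2 targets: the 4 remaining classes
    return (X, y_binary), (X_stage2, y_stage2)

# Toy data: 10 samples, 3 features each
X = np.arange(30).reshape(10, 3)
y = np.array([2, 0, 2, 1, 2, 3, 2, 4, 2, 2])
stage1, stage2 = two_stage_split(X, y)
```

At inference time, the stage-2 classifier would only be consulted on samples the stage-1 detector flags as non-null.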

In case it is helpful, I've included the LSTM model and some output.

Layer (type)       Output Shape     Param #
lstm_1 (LSTM)      (None, 1000)     4,052,000
dense_1 (Dense)    (None, 5)        5,005

Total params: 4,057,005
Trainable params: 4,057,005
Non-trainable params: 0

Loss function: categorical cross-entropy
Optimizer: Adam
Dense activation: softmax

         precision    recall  f1-score   support

      0       0.69      0.52      0.60      4356
      1       0.29      0.00      0.00      4472
      2       0.82      0.98      0.89     64371
      3       0.76      0.30      0.43      4666
      4       0.00      0.00      0.00      4152

avg / total       0.74      0.81      0.76     82017


I don't claim that my LSTM structure is the best. I've tried several, but I think I need to address this output distribution before settling on a structure.

K_foxer9
  • Have a look at other threads discussing class imbalance, such as: https://stats.stackexchange.com/questions/131255/class-imbalance-in-supervised-machine-learning – Jan Kukacka May 08 '18 at 15:47
  • What is your true loss function? Is it worse to predict null when something else occurs, or to predict something else when the result is null? – Björn May 08 '18 at 16:00
  • @Björn, I've added details about the loss function to my post (categorical cross-entropy). Regarding predictions, a false positive (a null detected as non-null) is worse than a missed non-null. Thus, generally, I prioritize precision over recall. – K_foxer9 May 08 '18 at 16:16
  • In that case, categorical cross-entropy is probably not what you want (at least not without tweaking the data, such as over- or undersampling). – Björn May 08 '18 at 16:17
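For the over- or undersampling Björn mentions, a minimal random-undersampling sketch in pure NumPy (in practice a library such as `imbalanced-learn` provides ready-made samplers; the arrays here are toy data):

```python
import numpy as np

def undersample_majority(X, y, majority_class, target_count, seed=0):
    """Randomly keep only `target_count` samples of the majority class;
    all other classes are kept in full."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_class)
    keep_maj = rng.choice(maj_idx, size=target_count, replace=False)
    other_idx = np.flatnonzero(y != majority_class)
    idx = np.sort(np.concatenate([keep_maj, other_idx]))
    return X[idx], y[idx]

# Toy example: class 2 ('null') dominates with 80 of 100 samples
y = np.array([2] * 80 + [0] * 5 + [1] * 5 + [3] * 5 + [4] * 5)
X = np.arange(len(y) * 2).reshape(len(y), 2)
X_bal, y_bal = undersample_majority(X, y, majority_class=2, target_count=10)
```

Undersampling discards information from the majority class, so it trades overall accuracy on 'null' for better sensitivity on the rare classes; oversampling the minority classes is the complementary option.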

0 Answers