
Using this question as background: https://stackoverflow.com/questions/71023822/lstm-multi-variate-multi-feature-in-pytorch

I was wondering how one processes the output of a PyTorch LSTM.

I was using this as a reference: https://pytorch.org/tutorials/beginner/introyt/trainingyt.html

Looking at the loss function, I realized perhaps I've missed something. I thought the process was:

train set: (input, label)
test set: input → response: label

Where the LSTM performs matrix multiplications to get as close to "1" as possible for the label I've presented it with, and as close to "0" for all other possible outputs, adjusts its internal weights as needed to make this true, and continues to adjust its weights as new training inputs are presented.

I then thought that when presented with a test input, the LSTM would return the predicted label for that observation. However, I have been told this is incorrect: what I'll get back is a vector of the same size and shape as what came in.

I also thought that the loss function was a measurement of how far we are by some distance metric from modeling the training set accurately.

Question: Given the dataset I have, how would one take the hidden layer's output and match it with a label (during training)?

(I've been advised this involves the loss function, so I suppose this question would involve the default one for PyTorch, which seems to be torch.nn.CrossEntropyLoss().)

Question 2: How do I get back a label from the trained LSTM when I present it with a new test input?

Thank you

  • Asking for code isn't an on-topic question here, but we have a number of questions about many-to-one LSTMs that can be found with a search. Here's one to get you started: https://stats.stackexchange.com/search?q=many-to-one+lstm+score%3A1+answers%3A1 – Sycorax Feb 07 '22 at 21:18
  • Alright, I'll edit my question then. It's the process I'm interested in. – corp Feb 07 '22 at 21:19

1 Answer


Your input has shape (channels, time). The output has shape (out, time), such that the output vectors are arranged sequentially in time. You haven't told us how your features relate to your label, but typically the prediction at the last time-step is the one that you use as the prediction of the label (e.g. predict the last word of a sentence given some, possibly all, previous words). This is the same for training and testing.
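To make that concrete, here is a minimal many-to-one sketch in PyTorch. The feature count, hidden size, class count, and batch size are placeholders, not taken from the question; the point is that only the final time-step's output is kept, mapped to class logits, and compared to the integer label with torch.nn.CrossEntropyLoss:

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not from the question):
# 4 features per time step, 4 time steps, 3 possible labels.
n_features, hidden_size, n_classes = 4, 32, 3

class LSTMClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # batch_first=True: inputs are (batch, time, features)
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        out, _ = self.lstm(x)   # out: (batch, time, hidden_size), one vector per time step
        last = out[:, -1, :]    # keep only the final time step
        return self.head(last)  # logits: (batch, n_classes)

model = LSTMClassifier()
x = torch.randn(8, 4, n_features)        # a batch of 8 sequences, 4 steps each
y = torch.randint(0, n_classes, (8,))    # integer class labels

logits = model(x)                        # (8, n_classes)
loss = nn.CrossEntropyLoss()(logits, y)  # distance between predictions and labels
loss.backward()                          # gradients used to adjust the weights
```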

A useful outline of different LSTM models can be found in Andrej Karpathy's blog post "The Unreasonable Effectiveness of Recurrent Neural Networks."

Presumably you're using something like a sigmoid or softmax activation in the final layer to give a vector of probabilities. A predicted probability is not a label, but it does tell you about the model's estimate of the probability of each label given the input. If you truly need to dichotomize your predictions, then you'll need to use some appropriate rule to relate the probabilities to the outcomes, and ideally this rule will be informed by the relative costs of the different kinds of error (false negatives, false positives).
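If a hard label is needed at test time, some decision rule must map the probability vector to a single class. A sketch of two such rules (the probability values and the threshold below are made-up placeholders):

```python
import torch

# Stand-in for the softmax output of a trained network:
# predicted probabilities for a batch of 8 inputs over 3 labels.
probs = torch.softmax(torch.randn(8, 3), dim=-1)

# Equal-cost rule: report the most probable label.
predicted_labels = probs.argmax(dim=-1)   # (8,) integer labels

# Cost-aware rule (binary case): threshold the positive-class probability.
# 0.7 is a placeholder; choose the threshold from the relative costs of
# false negatives and false positives rather than defaulting to 0.5.
threshold = 0.7
positive = probs[:, 1] > threshold        # (8,) boolean predictions
```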

  • My data is a set of observations consisting of: [(time1 feature1, feature2, feature3, feature4; time2 feature1, feature2, feature3, feature4; time3 feature1, feature2, feature3, feature4; time4 feature1, feature2, feature3, feature4), label]. The features are all floats of various ranges; the label is an integer. My question is purely, from the PyTorch perspective, how does one get a label back when I'm in the testing phase, in order to provide that label (correct or incorrect) to the loss function. – corp Feb 07 '22 at 22:08
  • This description doesn't tell us how the features and time steps are related to the label. In any event, you'll have to ask a code site to get answers to coding questions. – Sycorax Feb 07 '22 at 22:14
  • I'm not entirely sure what you're asking. The features have a non-linear relationship with the label. The features describe a series of steps taken by a driver of a vehicle, and the label classifies the resulting error (or lack thereof). I'm afraid I can't really give more detail than that. But I'm not interested in the code perspective; I'm interested in understanding what an LSTM's output is. In the output layer, do I just get the label back, then use a loss function to measure the distance between the true label and the label predicted by the output? – corp Feb 08 '22 at 19:12
  • Every loss function is a comparison between the model’s prediction and the label. What is unclear about the LSTM output in particular? – Sycorax Feb 08 '22 at 19:14
  • In the past, when I've done ML, one provides the label alongside the training data. In this case, however, we provide an input to the LSTM, which, after passing it through its layers, provides an output. But then we have to "grade" the output with a loss function, and back-propagate this loss function's grade in order to modify the weights. I guess, based on past experience, I was expecting the LSTM to directly provide a prediction after training, but it seems there's just an extra step in training, and otherwise the process is as I've described. Is that correct? – corp Feb 08 '22 at 19:24
  • LSTMs still provide a prediction, but for every time step (although some software has the option to only return the final time-step). If the label belongs to the last time-step, you just need to select that time-step and then compute the loss in the usual way. Pretty much every NN uses backprop to update the weights, and most ML models give predicted probabilities as results for classification problems, so it's hard to understand what contrast you're finding. – Sycorax Feb 08 '22 at 19:26
  • That's alright; my background is not in neural networks, but in other types of classic, statistical AI. I think you've answered my question, so I'm going to mark this as answered. Thank you for your time. – corp Feb 08 '22 at 19:45