I have a sequence-to-sequence LSTM (encoder/decoder model) that I built following this tutorial. I'm trying to output a series of human poses (as 3D coordinates) with shape (N, 17, 3). I'm training the model on dance choreography (where the pose changes constantly), but the issue is that the output of my model is essentially the same pose repeated N times.
During the evaluation phase, I save the model output (shape: (batch_size, seq_len, 17, 3)), and when I inspect it afterwards it's essentially the same pose repeated seq_len times (and it's the same across all the batches). When I run inference on a new sample, I again get that same pose repeated, just with noise (slight shifts in the coordinates).
What's confusing is that during training the loss becomes very small (the SmoothL1Loss drops below 0.016). For the loss I just compare the output sequence against a sequence of poses extracted from an example dance. It seems like the model is finding the "average" pose that minimizes the loss over the whole sequence, when what I want is a series of different poses.
Is this behavior a symptom of how I'm performing training (I'm not sure which details are relevant, so please let me know if I've left out something important), or is it because the loss function can't enforce that poses should change by some delta from frame to frame? If it's the latter, are there any tutorials/recommendations for writing a custom loss function? I'd greatly appreciate any insight!
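To make the second option concrete, here's a rough sketch of the kind of delta-based term I have in mind (this isn't code I've tried; the class name and weight are placeholders, and the idea is just to penalize the difference between frame-to-frame motion of the prediction and of the target so that a constant pose can't score well):

```python
import torch
import torch.nn as nn

class PoseWithVelocityLoss(nn.Module):
    """SmoothL1 on the poses plus SmoothL1 on the frame-to-frame deltas."""

    def __init__(self, velocity_weight=1.0):
        super().__init__()
        self.pose_loss = nn.SmoothL1Loss()
        self.velocity_weight = velocity_weight

    def forward(self, pred, target):
        # pred, target: (batch_size, seq_len, 17, 3)
        pose_term = self.pose_loss(pred, target)

        # Frame-to-frame deltas: (batch_size, seq_len - 1, 17, 3)
        pred_delta = pred[:, 1:] - pred[:, :-1]
        target_delta = target[:, 1:] - target[:, :-1]
        velocity_term = self.pose_loss(pred_delta, target_delta)

        return pose_term + self.velocity_weight * velocity_term
```

Is something along these lines a reasonable direction, or is the real problem elsewhere in how I'm training?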
For some additional context, here's my model architecture: the Encoder and Decoder are 2-layer LSTMs with dropout, wrapped in a seq2seq class that calls encoder(input) and then decodes that output one step at a time. For training I'm using an SGD optimizer and SmoothL1Loss as the loss function.
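A simplified sketch of roughly what that setup looks like (the hidden size, dropout, learning rate, and the choice of the last input pose as the first decoder input are placeholders, not my exact code):

```python
import torch
import torch.nn as nn

POSE_DIM = 17 * 3  # each pose flattened to 51 values

class Encoder(nn.Module):
    def __init__(self, hidden_size=256):
        super().__init__()
        self.lstm = nn.LSTM(POSE_DIM, hidden_size, num_layers=2,
                            dropout=0.2, batch_first=True)

    def forward(self, x):
        # x: (batch_size, src_len, POSE_DIM)
        _, (hidden, cell) = self.lstm(x)
        return hidden, cell

class Decoder(nn.Module):
    def __init__(self, hidden_size=256):
        super().__init__()
        self.lstm = nn.LSTM(POSE_DIM, hidden_size, num_layers=2,
                            dropout=0.2, batch_first=True)
        self.out = nn.Linear(hidden_size, POSE_DIM)

    def forward(self, x, hidden, cell):
        # x: (batch_size, 1, POSE_DIM) -- decode one step at a time
        output, (hidden, cell) = self.lstm(x, (hidden, cell))
        return self.out(output), hidden, cell

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, trg_len):
        # src: (batch_size, src_len, POSE_DIM)
        hidden, cell = self.encoder(src)
        step = src[:, -1:, :]  # seed the decoder with the last input pose
        outputs = []
        for _ in range(trg_len):
            step, hidden, cell = self.decoder(step, hidden, cell)
            outputs.append(step)
        return torch.cat(outputs, dim=1)  # (batch_size, trg_len, POSE_DIM)

model = Seq2Seq(Encoder(), Decoder())
criterion = nn.SmoothL1Loss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```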