
I am learning about RNNs, especially seq2seq models that use LSTMs, and I am wondering what exactly the encoder in such a model is doing. To check that I've understood the rest of the seq2seq model correctly, I'll enumerate what I think happens:

Encoder:

  • Gets the input sequence
  • Outputs a hidden state

Decoder:

  • Gets the hidden state from the Encoder
  • Uses the hidden state, cell state, and the previous prediction or previous label (prediction at test time, label at training time for teacher forcing), plus another input I'm unsure about (x(t) in the picture below), feeding them into an LSTM cell, which updates the hidden state and cell state and makes a prediction. This is repeated until the end of the sentence is predicted.

[Picture: diagram of an LSTM cell with input x(t), hidden state, and cell state]

If I understood this correctly, this LSTM cell exists only in the decoder of a seq2seq model that uses LSTM. So that is what happens inside the decoder: predicting.

But what happens to the input inside the encoder? Why does input go in and a hidden state come out? Does anything else come out, too? Are the inputs just somehow transformed and added to the hidden state? Are they even used inside it?
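To make this concrete, here is a minimal sketch in Keras of the kind of seq2seq model I mean (the layer and vocabulary sizes are made-up assumptions, not the exact model from the course):

```python
# Minimal LSTM seq2seq sketch (Keras). All sizes are illustrative assumptions.
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

latent_dim = 256          # size of h(t) and c(t), chosen arbitrarily
num_encoder_tokens = 70   # input-language vocabulary size (assumed)
num_decoder_tokens = 90   # output-language vocabulary size (assumed)

# Encoder: reads the input sequence. Its per-timestep outputs are discarded;
# only the final hidden state h and cell state c are kept.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]   # this pair is handed to the decoder

# Decoder: an LSTM initialized with the encoder's states. At each step it
# consumes the previous token (the label during training, i.e. teacher
# forcing) and predicts the next token.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
```

At inference time the decoder would instead be run one step at a time, feeding each predicted token back in as the next input, until an end-of-sentence token is produced.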

Tknoobs
  • An encoder is not a part of an LSTM. You can have an encoder as part of a neural network that uses LSTM. Could you clarify what exactly you mean? – Tim Mar 06 '21 at 12:24
  • @tknoobs I've reviewed the edit, but it's not clear which part of the network you're calling the "encoder." Can you enumerate the network architecture that you're thinking of, and then label the part that you're calling the "encoder"? – Sycorax Mar 06 '21 at 16:28
  • Is some part of the network an LSTM? Which part? Is some part of the network a dense layer? Which part? Is some part of the network an embedding layer? Which part? What is the sequence of layers applied to the data to create an output? Which one of these layers is the "encoder"? – Sycorax Mar 06 '21 at 18:14
  • You've made it clear to me that there is a lot more that I don't understand or am unsure about in these seq2seq models. Shall I include that in the question? – Tknoobs Mar 06 '21 at 18:48
  • "seq2seq" just describes a model that takes a sequence as an input, and yields another sequence as an output. There's a lot of ways to make a model that does this. The way the question is written suggests that you have a *specific model* in mind, but you haven't told us what that model is in specific terms. I'm trying to ask questions that will encourage you to describe the model in more detail. – Sycorax Mar 06 '21 at 19:43
  • I took a Codecademy course, where they said: "An encoder that accepts language (or audio or video) input. The output matrix of the encoder is discarded, but its state is preserved as a vector." It was for building a little Spanish-English translator. – Tknoobs Mar 06 '21 at 22:19
  • @Tknoobs h(t) is the hidden state, basically the "output" of an LSTM or any other RNN. In addition to that, the LSTM has c(t), a "long-memory-like" state. When they say they discard outputs, they probably mean h(t). Depending on the framework, however, that doesn't mean h(t) is not stored and reused, but simply that the time-distributed outputs of the encoder are not used. – Firebug Mar 06 '21 at 23:34
  • @Firebug Ok... so the encoder produces both c(t) and h(t), right? And if so, how? I mean, in the decoder the model above produces outputs that can be compared to the labels, but in the encoder there are no labels, so what is added to c(t) and h(t)? – Tknoobs Mar 07 '21 at 09:55
  • @Tknoobs c(t) and h(t) are activations, just like any other. They are obtained by matrix multiplications, elementwise operations, and activation functions; see the sketch after these comments. It is simply what it is. The encoder only contributes its last activations (h(t) and c(t)) to the decoder; those vectors must encode as much information about the sequence as possible, because the decoder will have only that to work from. In the encoder, h(t) and c(t) are not compared with anything. – Firebug Mar 07 '21 at 18:11
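For reference, here is a minimal NumPy sketch of one LSTM cell step in the standard formulation (the weight names and toy sizes are illustrative assumptions), showing how h(t) and c(t) come out of plain matrix multiplications, elementwise operations, and activations, with no labels involved:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the weights for the input,
    forget, and output gates and the candidate cell update."""
    z = W @ x_t + U @ h_prev + b                  # one big affine transform
    i, f, o, g = np.split(z, 4)                   # slice out the four parts
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates, each in (0, 1)
    g = np.tanh(g)                                # candidate cell contents
    c_t = f * c_prev + i * g                      # new cell state
    h_t = o * np.tanh(c_t)                        # new hidden state ("output")
    return h_t, c_t

# The encoder just runs this step over the whole input sequence and keeps
# only the final (h, c) for the decoder:
hidden, n_in = 3, 2                               # toy sizes (assumed)
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, n_in))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, n_in)):            # a 5-step input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)          # per-step h's are discarded
```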

0 Answers