Questions tagged [seq2seq]

40 questions
4
votes
2 answers

How to make a seq2seq model work with infinite vocabulary?

I have trained a translation seq2seq model. In my model, I have kept the vocabulary size at 100,000. This constraint prevents my model from generating any words that are not in those 100,000. So how do Google Translate or Bing Translate work for any…
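One common way production systems sidestep the fixed-vocabulary limit is subword tokenization (byte-pair encoding, WordPiece, SentencePiece): any surface word can be built from a closed set of pieces, so the decoder never faces a truly "infinite" vocabulary. A minimal BPE-style sketch over a hypothetical toy corpus (not any particular library's implementation):

```python
# Minimal byte-pair-encoding sketch: words are split into subword units drawn
# from a small, closed vocabulary learned by repeatedly merging the most
# frequent adjacent symbol pair. The corpus below is a toy example.
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs over a {tuple_of_symbols: freq} corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with its concatenation."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Words start out as character sequences with an end-of-word marker.
corpus = {tuple("lower") + ("</w>",): 5,
          tuple("lowest") + ("</w>",): 2,
          tuple("newer") + ("</w>",): 6}

for _ in range(10):                       # learn 10 merges
    pairs = get_pair_counts(corpus)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)      # most frequent adjacent pair
    corpus = merge_pair(corpus, best)

print(corpus)  # each word is now a sequence of learned subword units
```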
3
votes
0 answers

Canonical LSTM backpropagation equations

I'm trying to understand the underlying mechanisms of LSTM from a programming perspective. I am no math person, and a lot of articles and papers look like alphabet soup to me. But I thought that if I can translate the process to a programming…
Jim
  • 31
  • 3
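For reference, these are the standard forward equations of an LSTM cell; the backward pass follows from them by the chain rule. A compact sketch assuming the usual parameterization ($\sigma$ is the sigmoid, $\odot$ the elementwise product, and $dX$ denotes $\partial L / \partial X$):

```latex
% Forward pass of a standard LSTM cell
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}

% Two representative backward-pass steps (gradients flow through both h and c)
\begin{aligned}
d c_t &= d h_t \odot o_t \odot \left(1 - \tanh^2(c_t)\right) + d c_{t+1} \odot f_{t+1} \\
d o_t &= d h_t \odot \tanh(c_t) \odot o_t \odot (1 - o_t)
\end{aligned}
```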
3
votes
1 answer

"Attention is all you need" input scaling explanation

I would like to ask about the last sentence here, from the paper https://arxiv.org/abs/1706.03762, section 3.4 (Embeddings and Softmax): "Similarly to other sequence transduction models, we use learned embeddings to convert the input tokens and output tokens to…
Marek Židek
  • 131
  • 4
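For context, the operation the sentence describes is just a scalar multiplication of the embedding output by $\sqrt{d_{model}}$ before the positional encodings are added; the paper itself does not elaborate on the reason, which is what the question asks. A minimal PyTorch-style sketch (class and dimension names are illustrative, not from the paper's code):

```python
import math
import torch
import torch.nn as nn

class ScaledEmbedding(nn.Module):
    """Token embedding scaled by sqrt(d_model), as stated in
    'Attention Is All You Need', section 3.4."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # One commonly offered rationale (not stated in the paper): the scaling
        # keeps embedding magnitudes comparable to the sinusoidal positional
        # encodings that are added immediately after this step.
        return self.embed(token_ids) * math.sqrt(self.d_model)

emb = ScaledEmbedding(vocab_size=32000, d_model=512)
x = emb(torch.tensor([[1, 5, 42]]))   # shape: (1, 3, 512)
```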
3
votes
1 answer

In sequence-to-sequence learning, how can a large number of missing/special words in a sentence be compensated for?

I'm currently working on a Seq2Seq model for a chatbot, and I'm converting every sentence to numerical vectors with word embeddings (e.g. GloVe). My problem is that training doesn't progress; the model starts with around 0.0055 loss with mean-squared…
2
votes
0 answers

What is the Encoder in a seq2seq RNN doing?

I am learning about RNNs, especially seq2seq models that use LSTMs. I am wondering what exactly the encoder in such models is doing. To ensure that I've understood the rest of the LSTM-based seq2seq model correctly, I enumerate what I think it…
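In the plain, attention-free seq2seq setup, the encoder LSTM reads the whole source sequence and its final hidden and cell states become the fixed-size summary ("thought vector") that initializes the decoder. A minimal PyTorch sketch of that role, with placeholder dimensions:

```python
import torch
import torch.nn as nn

# Toy dimensions; real models use learned embeddings and larger sizes.
input_dim, hidden_dim, batch, src_len = 16, 32, 4, 7

encoder = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, batch_first=True)
decoder = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, batch_first=True)

src = torch.randn(batch, src_len, input_dim)      # embedded source sequence
enc_outputs, (h_n, c_n) = encoder(src)            # h_n, c_n: (1, batch, hidden_dim)

# The encoder's job ends here: (h_n, c_n) is the summary that conditions the
# decoder on the entire input sequence.
tgt_inputs = torch.randn(batch, 5, input_dim)     # embedded, shifted target
dec_outputs, _ = decoder(tgt_inputs, (h_n, c_n))  # decode from the encoder state
```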
2
votes
1 answer

Does it make sense to use an attention mechanism for a seq2seq autoencoder for anomaly detection?

So I want to train an LSTM sequence-to-sequence autoencoder for anomaly detection. The idea is to train it on normal samples; when an anomaly comes into the model, it will not be able to reconstruct it correctly and will have a high reconstruction…
pikachu
  • 731
  • 2
  • 10
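Whichever architecture is chosen, the anomaly decision itself usually reduces to thresholding the reconstruction error on held-out normal data. A hedged sketch of that scoring step only; the trained `autoencoder`, its `predict` interface, and the `mean + k*std` threshold rule are assumptions, not a prescribed recipe:

```python
import numpy as np

def anomaly_scores(autoencoder, sequences: np.ndarray) -> np.ndarray:
    """Per-sequence reconstruction error; `autoencoder.predict` is assumed to
    map (n, seq_len, n_features) to reconstructions of the same shape."""
    recon = autoencoder.predict(sequences)
    return np.mean((sequences - recon) ** 2, axis=(1, 2))

def flag_anomalies(scores: np.ndarray, normal_scores: np.ndarray, k: float = 3.0):
    # Threshold derived from errors on normal validation data; mean + k*std is
    # one simple convention, a high quantile is another.
    threshold = normal_scores.mean() + k * normal_scores.std()
    return scores > threshold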
2
votes
1 answer

How exactly does conv1d filter work when operating on a sequence of characters?

I understand convolution filters when applied to an image (e.g. a 224x224 image with 3 in-channels transformed by 56 5x5 conv filters into a 224x224 image with 56 out-channels). The key is that there are 56 different filters, each with 5x5x3…
Joe Black
  • 299
  • 1
  • 10
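For a character sequence the analogue is: characters are embedded, the embedding dimension plays the role of the input channels, and each 1-D filter slides along the sequence axis only. A small PyTorch sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, n_filters, kernel_size = 70, 16, 56, 5
seq_len, batch = 100, 8

embed = nn.Embedding(vocab_size, embed_dim)
# Conv1d expects (batch, channels, length): channels = embedding dimension.
conv = nn.Conv1d(in_channels=embed_dim, out_channels=n_filters,
                 kernel_size=kernel_size, padding=kernel_size // 2)

chars = torch.randint(0, vocab_size, (batch, seq_len))   # character ids
x = embed(chars).transpose(1, 2)                         # (8, 16, 100)
y = conv(x)                                              # (8, 56, 100)
# Each of the 56 filters has shape (16, 5): it mixes all embedding channels
# over a window of 5 consecutive characters, just as a 5x5x3 image filter
# mixes all colour channels over a 5x5 spatial patch.
```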
2
votes
1 answer

RNN Regression outputting Same(ish) values

I have a sequence to sequence LSTM (encoder/decoder model) that I made following this tutorial. I'm trying to output a series of human poses (in the form of 3D coordinates) with shape (N, 17, 3). I'm training my model on dance choreography (where…
ROODAY
  • 151
  • 1
  • 6
2
votes
0 answers

BERT for non-textual sequence data

I'm working on a deep learning solution for classifying sequence data that isn't raw text but rather entities (which have already been extracted from the text). I am currently using word2vec-style embeddings to feed the entities to a CNN, but I was…
2
votes
0 answers

Sequence to sequence with real number features?

This is probably related to this question: How to make a seq2seq model work with infinite vocabulary? I have read many seq2seq implementations, and they all seem to work only on a fixed, well-discretized vocabulary, which the main application of seq2seq…
xxbidiao
  • 187
  • 1
  • 1
  • 8
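One common way to adapt a seq2seq decoder to real-valued outputs is to drop the softmax over a vocabulary and put a linear regression head on the decoder state, trained with MSE (a mixture-density head is an alternative when outputs are multimodal). A hedged PyTorch sketch of the output side only, with illustrative shapes and a zero start token:

```python
import torch
import torch.nn as nn

hidden_dim, n_features = 64, 3   # e.g. 3 real-valued features per time step

decoder_rnn = nn.LSTM(input_size=n_features, hidden_size=hidden_dim, batch_first=True)
regression_head = nn.Linear(hidden_dim, n_features)   # replaces the vocab-size softmax
criterion = nn.MSELoss()

# Assume (h0, c0) would come from an encoder; here they are placeholders.
h0 = torch.zeros(1, 4, hidden_dim)
c0 = torch.zeros(1, 4, hidden_dim)
targets = torch.randn(4, 10, n_features)                       # real-valued sequences
decoder_inputs = torch.cat([torch.zeros(4, 1, n_features),     # "start" step
                            targets[:, :-1]], dim=1)           # teacher forcing

states, _ = decoder_rnn(decoder_inputs, (h0, c0))
preds = regression_head(states)           # (4, 10, 3) continuous predictions
loss = criterion(preds, targets)          # regression loss instead of cross-entropy
```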
2
votes
1 answer

In seq2seq, how is the attention vector combined with the hidden state of the decoder?

My understanding of attention is that a weighted combination of a set of vectors is somehow combined with the decoder's hidden state. How exactly is it combined? Is it added to the hidden state before it enters the cell at each time step? In…
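The two classic answers differ in where the combination happens: in Bahdanau et al. (2014) the context vector is concatenated with the embedded previous output before it enters the decoder cell, while in Luong et al. (2015) it is combined with the hidden state after the cell, through a learned layer. The Luong-style combination, in that paper's notation:

```latex
% Luong-style (post-cell) combination of context c_t and decoder state h_t
\begin{aligned}
c_t &= \sum_s \alpha_{ts}\,\bar{h}_s
      & &\text{(attention-weighted encoder states)} \\
\tilde{h}_t &= \tanh\!\left(W_c\,[c_t;\,h_t]\right)
      & &\text{(attentional hidden state)} \\
p(y_t \mid y_{<t}, x) &= \operatorname{softmax}\!\left(W_s\,\tilde{h}_t\right)
\end{aligned}
```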
1
vote
0 answers

How can one feed all of the input to an RNN, and then get all of the output from it?

When reading papers, a common concept is delaying the output of an RNN until after it has seen all of the input. E.g., the neural Turing machine paper uses this technique, together with a simple identity function on the input sequence, to gauge how long-term…
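The usual trick is to pad the time axis: present the input sequence, then a delimiter, then blank inputs for as many steps as the expected output, and score the model only on those trailing steps. A sketch of building such padded input/target tensors for a copy-style task (the extra delimiter channel and shapes are illustrative):

```python
import torch

def make_copy_task_batch(seq: torch.Tensor):
    """seq: (batch, T, D) binary patterns. Returns inputs of length 2T+1 with an
    extra delimiter channel, and targets defined only for the last T steps."""
    batch, T, D = seq.shape
    inputs = torch.zeros(batch, 2 * T + 1, D + 1)
    inputs[:, :T, :D] = seq               # phase 1: the model just reads
    inputs[:, T, D] = 1.0                 # delimiter flag: "input is over"
    # phase 2 (steps T+1 .. 2T): blank inputs, the model must now write
    targets = seq                         # expected outputs for the final T steps
    return inputs, targets

x, y = make_copy_task_batch(torch.randint(0, 2, (4, 6, 8)).float())
# Train by running the RNN over all 2T+1 steps and applying the loss only to
# its outputs on the final T steps (compare against `y`).
```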
1
vote
0 answers

How to use LSTM for time series with the output as a sequence and not the future values

Currently, I have battery data that is recorded every second on different dates. So I have data for 10 dates, and the outputs that I need to predict are Zimg and Zreal, which are the two features used to plot the Nyquist curve for the batteries…
1
vote
0 answers

Levenshtein/Edit Distance as a loss function for sequence transformer models?

Often, the loss function used for a sequence is the cross-entropy loss between $y_{true}$ and $y_{pred}$, where both are of size $SeqLength \times NumClasses$. When $y_{pred} = y_{true}$ we get the lowest loss; however, if we shift the values of $y_{pred}$…
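One practical wrinkle: edit distance is not differentiable, so it is usually applied as an evaluation metric or as a reward in reinforcement-style training rather than as a drop-in loss. For reference, the standard dynamic-programming computation:

```python
def levenshtein(a, b) -> int:
    """Classic DP edit distance between two sequences
    (insertions, deletions and substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
```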
1
vote
0 answers

Training a Discriminator to guide Beam Search for a seq2seq model?

The idea is to train a discriminator during training of the seq2seq model to differentiate between 'fake' decoder outputs and 'real' decoder targets, while not propagating discriminator loss to the seq2seq model. Then during inference the…
Avelina
  • 809
  • 1
  • 12
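One way to wire such a discriminator in at inference time is to re-score each beam hypothesis with a mix of the seq2seq log-probability and the discriminator's output. A hedged sketch of the re-ranking step only; the `discriminator` callable, its interface, and the interpolation weight are assumptions:

```python
import math

def rerank_beam(hypotheses, discriminator, weight: float = 0.5):
    """hypotheses: list of (token_ids, seq2seq_log_prob) pairs.
    `discriminator(token_ids)` is assumed to return P(real) in (0, 1)."""
    scored = []
    for tokens, log_prob in hypotheses:
        d_score = discriminator(tokens)                  # how "real" the output looks
        combined = (1 - weight) * log_prob + weight * math.log(d_score + 1e-9)
        scored.append((combined, tokens))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best hypothesis first
    return [tokens for _, tokens in scored]
```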