Questions tagged [seq2seq]

40 questions
4
votes
2 answers

How to make a seq2seq model work with infinite vocabulary?

I have trained a translation seq2seq model. In my model, I have kept the vocabulary size at 100,000. This constraint prevents my model from generating any words that are not in those 100,000. So how do Google Translate or Bing Translate work for any…
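One common way production systems sidestep the fixed-vocabulary limit is subword tokenization (byte-pair encoding, WordPiece, SentencePiece): any surface word can be built from a closed set of pieces, so the decoder never faces a truly "infinite" vocabulary. A minimal BPE-style sketch over a hypothetical toy corpus (not any particular library's implementation):

```python
# Minimal byte-pair-encoding sketch: words are split into subword units drawn
# from a small, closed vocabulary learned by repeatedly merging the most
# frequent adjacent symbol pair. The corpus below is a toy example.
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs over a {tuple_of_symbols: freq} corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with its concatenation."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Words start out as character sequences with an end-of-word marker.
corpus = {tuple("lower") + ("</w>",): 5,
          tuple("lowest") + ("</w>",): 2,
          tuple("newer") + ("</w>",): 6}

for _ in range(10):                       # learn 10 merges
    pairs = get_pair_counts(corpus)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)      # most frequent adjacent pair
    corpus = merge_pair(corpus, best)

print(corpus)  # each word is now a sequence of learned subword units
```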
3
votes
0 answers

Canonical LSTM backpropagation equations

I'm trying to understand the underlying mechanisms of LSTM from a programming perspective. I am no math person, and a lot of articles and papers look like alphabet soup to me. But I thought that if I can translate the process to a programming…
Jim
  • 31
  • 3
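For reference, these are the standard forward equations of an LSTM cell; the backward pass follows from them by the chain rule. A compact sketch assuming the usual parameterization ($\sigma$ is the sigmoid, $\odot$ the elementwise product, and $dX$ denotes $\partial L / \partial X$):

```latex
% Forward pass of a standard LSTM cell
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}

% Two representative backward-pass steps (gradients flow through both h and c)
\begin{aligned}
d c_t &= d h_t \odot o_t \odot \left(1 - \tanh^2(c_t)\right) + d c_{t+1} \odot f_{t+1} \\
d o_t &= d h_t \odot \tanh(c_t) \odot o_t \odot (1 - o_t)
\end{aligned}
```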
3
votes
1 answer

"Attention is all you need" input scaling explanation

I would like to ask about the last sentence here, from the paper https://arxiv.org/abs/1706.03762, section 3.4 (Embeddings and Softmax): "Similarly to other sequence transduction models, we use learned embeddings to convert the input tokens and output tokens to…
Marek Židek
  • 131
  • 4
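For context, the operation the sentence describes is just a scalar multiplication of the embedding output by $\sqrt{d_{model}}$ before the positional encodings are added; the paper itself does not elaborate on the reason, which is what the question asks. A minimal PyTorch-style sketch (class and dimension names are illustrative, not from the paper's code):

```python
import math
import torch
import torch.nn as nn

class ScaledEmbedding(nn.Module):
    """Token embedding scaled by sqrt(d_model), as stated in
    'Attention Is All You Need', section 3.4."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # One commonly offered rationale (not stated in the paper): the scaling
        # keeps embedding magnitudes comparable to the sinusoidal positional
        # encodings that are added immediately after this step.
        return self.embed(token_ids) * math.sqrt(self.d_model)

emb = ScaledEmbedding(vocab_size=32000, d_model=512)
x = emb(torch.tensor([[1, 5, 42]]))   # shape: (1, 3, 512)
```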
3
votes
1 answer

In sequence-to-sequence learning, how can a large number of missing/special words in a sentence be compensated for?

I'm currently working on a Seq2Seq model for a chatbot, and I'm converting every sentence to numerical vectors with word embeddings (e.g. GloVe). My problem is that training doesn't progress; the model starts with around 0.0055 loss with mean-squared…
2
votes
0 answers

What is the Encoder in a seq2seq RNN doing?

I am learning about RNNs, especially seq2seq models that use LSTMs. I am wondering what exactly the encoder in such models is doing. To ensure that I've understood the rest of the LSTM-based seq2seq model correctly, I enumerate what I think it…
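In the plain, attention-free seq2seq setup, the encoder LSTM reads the whole source sequence and its final hidden and cell states become the fixed-size summary ("thought vector") that initializes the decoder. A minimal PyTorch sketch of that role, with placeholder dimensions:

```python
import torch
import torch.nn as nn

# Toy dimensions; real models use learned embeddings and larger sizes.
input_dim, hidden_dim, batch, src_len = 16, 32, 4, 7

encoder = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, batch_first=True)
decoder = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, batch_first=True)

src = torch.randn(batch, src_len, input_dim)      # embedded source sequence
enc_outputs, (h_n, c_n) = encoder(src)            # h_n, c_n: (1, batch, hidden_dim)

# The encoder's job ends here: (h_n, c_n) is the summary that conditions the
# decoder on the entire input sequence.
tgt_inputs = torch.randn(batch, 5, input_dim)     # embedded, shifted target
dec_outputs, _ = decoder(tgt_inputs, (h_n, c_n))  # decode from the encoder state
```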
2
votes
1 answer

Does it make sense to use an attention mechanism for a seq2seq autoencoder for anomaly detection?

So I want to train an LSTM sequence-to-sequence autoencoder for anomaly detection. The idea is to train it on normal samples; when an anomaly comes into the model, it will not be able to reconstruct it correctly and will have a high reconstruction…
pikachu
  • 731
  • 2
  • 10
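Whichever architecture is chosen, the anomaly decision itself usually reduces to thresholding the reconstruction error on held-out normal data. A hedged sketch of that scoring step only; the trained `autoencoder`, its `predict` interface, and the `mean + k*std` threshold rule are assumptions, not a prescribed recipe:

```python
import numpy as np

def anomaly_scores(autoencoder, sequences: np.ndarray) -> np.ndarray:
    """Per-sequence reconstruction error; `autoencoder.predict` is assumed to
    map (n, seq_len, n_features) to reconstructions of the same shape."""
    recon = autoencoder.predict(sequences)
    return np.mean((sequences - recon) ** 2, axis=(1, 2))

def flag_anomalies(scores: np.ndarray, normal_scores: np.ndarray, k: float = 3.0):
    # Threshold derived from errors on normal validation data; mean + k*std is
    # one simple convention, a high quantile is another.
    threshold = normal_scores.mean() + k * normal_scores.std()
    return scores > threshold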
2
votes
1 answer

How exactly does conv1d filter work when operating on a sequence of characters?

I understand convolution filters when applied to an image (e.g. a 224x224 image with 3 in-channels transformed by 56 5x5 conv filters into a 224x224 image with 56 out-channels). The key is that there are 56 different filters, each with 5x5x3…
Joe Black
  • 299
  • 1
  • 10
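For a character sequence the analogue is: characters are embedded, the embedding dimension plays the role of the input channels, and each 1-D filter slides along the sequence axis only. A small PyTorch sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, n_filters, kernel_size = 70, 16, 56, 5
seq_len, batch = 100, 8

embed = nn.Embedding(vocab_size, embed_dim)
# Conv1d expects (batch, channels, length): channels = embedding dimension.
conv = nn.Conv1d(in_channels=embed_dim, out_channels=n_filters,
                 kernel_size=kernel_size, padding=kernel_size // 2)

chars = torch.randint(0, vocab_size, (batch, seq_len))   # character ids
x = embed(chars).transpose(1, 2)                         # (8, 16, 100)
y = conv(x)                                              # (8, 56, 100)
# Each of the 56 filters has shape (16, 5): it mixes all embedding channels
# over a window of 5 consecutive characters, just as a 5x5x3 image filter
# mixes all colour channels over a 5x5 spatial patch.
```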
2
votes
1 answer

RNN Regression outputting Same(ish) values

I have a sequence to sequence LSTM (encoder/decoder model) that I made following this tutorial. I'm trying to output a series of human poses (in the form of 3D coordinates) with shape (N, 17, 3). I'm training my model on dance choreography (where…
ROODAY
  • 151
  • 1
  • 6
2
votes
0 answers

BERT for non-textual sequence data

I'm working on a deep learning solution for classifying sequence data that isn't raw text but rather entities (which have already been extracted from the text). I am currently using word2vec-style embeddings to feed the entities to a CNN, but I was…
2
votes
0 answers

Sequence to sequence with real number features?

This is probably related to this question: How to make a seq2seq model work with infinite vocabulary? I have read many seq2seq implementations, and they all seem to work only on a fixed, well-discretized vocabulary, which the main application of seq2seq…
xxbidiao
  • 187
  • 1
  • 1
  • 8
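One common way to adapt a seq2seq decoder to real-valued outputs is to drop the softmax over a vocabulary and put a linear regression head on the decoder state, trained with MSE (a mixture-density head is an alternative when outputs are multimodal). A hedged PyTorch sketch of the output side only, with illustrative shapes and a zero start token:

```python
import torch
import torch.nn as nn

hidden_dim, n_features = 64, 3   # e.g. 3 real-valued features per time step

decoder_rnn = nn.LSTM(input_size=n_features, hidden_size=hidden_dim, batch_first=True)
regression_head = nn.Linear(hidden_dim, n_features)   # replaces the vocab-size softmax
criterion = nn.MSELoss()

# Assume (h0, c0) would come from an encoder; here they are placeholders.
h0 = torch.zeros(1, 4, hidden_dim)
c0 = torch.zeros(1, 4, hidden_dim)
targets = torch.randn(4, 10, n_features)                       # real-valued sequences
decoder_inputs = torch.cat([torch.zeros(4, 1, n_features),     # "start" step
                            targets[:, :-1]], dim=1)           # teacher forcing

states, _ = decoder_rnn(decoder_inputs, (h0, c0))
preds = regression_head(states)           # (4, 10, 3) continuous predictions
loss = criterion(preds, targets)          # regression loss instead of cross-entropy
```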
2
votes
1 answer

In seq2seq, how is the attention vector combined with the hidden state of the decoder?

My understanding of attention is that a weighted combination of a set of vectors is somehow combined with the decoder's hidden state. How exactly is it combined? Is it added to the hidden state before it enters the cell at each time step? In…
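The two classic answers differ in where the combination happens: in Bahdanau et al. (2014) the context vector is concatenated with the embedded previous output before it enters the decoder cell, while in Luong et al. (2015) it is combined with the hidden state after the cell, through a learned layer. The Luong-style combination, in that paper's notation:

```latex
% Luong-style (post-cell) combination of context c_t and decoder state h_t
\begin{aligned}
c_t &= \sum_s \alpha_{ts}\,\bar{h}_s
      & &\text{(attention-weighted encoder states)} \\
\tilde{h}_t &= \tanh\!\left(W_c\,[c_t;\,h_t]\right)
      & &\text{(attentional hidden state)} \\
p(y_t \mid y_{<t}, x) &= \operatorname{softmax}\!\left(W_s\,\tilde{h}_t\right)
\end{aligned}
```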
1
vote
0 answers

How can one feed all of the input to an RNN, and then get all of the output from it?

When reading papers, a common concept is delaying the output of an RNN until after it has seen all of the input. E.g., the neural Turing machine paper uses this technique, together with a simple identity function on the input sequence, to gauge how long-term…
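The usual trick is to pad the time axis: present the input sequence, then a delimiter, then blank inputs for as many steps as the expected output, and score the model only on those trailing steps. A sketch of building such padded input/target tensors for a copy-style task (the extra delimiter channel and shapes are illustrative):

```python
import torch

def make_copy_task_batch(seq: torch.Tensor):
    """seq: (batch, T, D) binary patterns. Returns inputs of length 2T+1 with an
    extra delimiter channel, and targets defined only for the last T steps."""
    batch, T, D = seq.shape
    inputs = torch.zeros(batch, 2 * T + 1, D + 1)
    inputs[:, :T, :D] = seq               # phase 1: the model just reads
    inputs[:, T, D] = 1.0                 # delimiter flag: "input is over"
    # phase 2 (steps T+1 .. 2T): blank inputs, the model must now write
    targets = seq                         # expected outputs for the final T steps
    return inputs, targets

x, y = make_copy_task_batch(torch.randint(0, 2, (4, 6, 8)).float())
# Train by running the RNN over all 2T+1 steps and applying the loss only to
# its outputs on the final T steps (compare against `y`).
```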
1
vote
0 answers

How to use LSTM for time series with the output as a sequence and not the future values

Currently, I have battery data that is recorded every second on different dates. So I have data for 10 dates, and the outputs that I need to predict are Zimg and Zreal, which are the two features used to plot the Nyquist curve for the batteries…
1
vote
0 answers

Levenshtein/Edit Distance as a loss function for sequence transformer models?

Often, the loss function used for a sequence is the cross-entropy loss between $y_{true}$ and $y_{pred}$, where both are of size $SeqLength \times NumClasses$. When $y_{pred} = y_{true}$ we get the lowest loss; however, if we shift the values of $y_{pred}$…
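One practical wrinkle: edit distance is not differentiable, so it is usually applied as an evaluation metric or as a reward in reinforcement-style training rather than as a drop-in loss. For reference, the standard dynamic-programming computation:

```python
def levenshtein(a, b) -> int:
    """Classic DP edit distance between two sequences
    (insertions, deletions and substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
```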
1
vote
0 answers

Training a Discriminator to guide Beam Search for a seq2seq model?

The idea is to train a discriminator during training of the seq2seq model to differentiate between 'fake' decoder outputs and 'real' decoder targets, while not propagating discriminator loss to the seq2seq model. Then during inference the…
Avelina
  • 809
  • 1
  • 12
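One way to wire such a discriminator in at inference time is to re-score each beam hypothesis with a mix of the seq2seq log-probability and the discriminator's output. A hedged sketch of the re-ranking step only; the `discriminator` callable, its interface, and the interpolation weight are assumptions:

```python
import math

def rerank_beam(hypotheses, discriminator, weight: float = 0.5):
    """hypotheses: list of (token_ids, seq2seq_log_prob) pairs.
    `discriminator(token_ids)` is assumed to return P(real) in (0, 1)."""
    scored = []
    for tokens, log_prob in hypotheses:
        d_score = discriminator(tokens)                  # how "real" the output looks
        combined = (1 - weight) * log_prob + weight * math.log(d_score + 1e-9)
        scored.append((combined, tokens))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best hypothesis first
    return [tokens for _, tokens in scored]
```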