Questions tagged [machine-translation]

Machine translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another.

43 questions
158
votes
9 answers

What exactly are keys, queries, and values in attention mechanisms?

How should one understand the keys, queries, and values that are often mentioned in attention mechanisms? I've tried searching online, but all the resources I find only speak of them as if the reader already knows what they are. Judging by the paper…
Sean
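A minimal NumPy sketch of the scaled dot-product attention these answers build on may help; it follows the Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V formulation from "Attention Is All You Need", with random matrices standing in for the learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # similarity of every query to every key
    weights = softmax(scores)          # one distribution over keys per query
    return weights @ V                 # weighted average of the values

# Toy example: 4 tokens, dimension 8. Queries, keys, and values are all
# linear projections of the same token representations (self-attention).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)   # (4, 8)
```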
28
votes
4 answers

What are "residual connections" in RNNs?

In Google's paper "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation", it is stated: "Our LSTM RNNs have $8$ layers, with residual connections between layers ..." What are residual connections? Why…
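The mechanism itself is a one-liner. A minimal sketch, where `layer` stands in for any sublayer (in GNMT the layers are LSTMs): the residual connection adds the layer's input back to its output, so each layer only has to learn a correction to its input and gradients flow through deep stacks more easily.

```python
import numpy as np

def layer(x, W):
    """Stand-in for one sublayer (an LSTM layer in GNMT)."""
    return np.tanh(x @ W)

def residual_layer(x, W):
    # The residual (skip) connection: output = input + F(input).
    return x + layer(x, W)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16))
W = rng.normal(size=(16, 16)) * 0.1
print(residual_layer(x, W).shape)   # (1, 16)
```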
8
votes
1 answer

What is the intuition behind the positional cosine encoding in the transformer network?

I don't understand how adding the cosine encodings/functions to each dimension of the word embedding vector enables the network to "understand" where each word is situated in the sentence. What is the intuition behind it? It seems a bit…
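A sketch of the sinusoidal encoding from "Attention Is All You Need", the formulation this question refers to. The key property is that each position gets a unique pattern and that PE(pos + k) is a fixed linear function of PE(pos), which is the usual intuition for why the network can read off relative positions:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(max_len)[:, None]        # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions
    pe[:, 1::2] = np.cos(angles)             # odd dimensions
    return pe

pe = positional_encoding(max_len=50, d_model=512)
# The encodings are *added* to the word embeddings, so the same word
# at different positions enters the network as a different vector.
```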
5
votes
1 answer

Why are Transformers "suboptimal" for language modeling but not for translation?

Language Models with Transformers states: Transformer architectures are suboptimal for language model itself. Neither self-attention nor the positional encoding in the Transformer is able to efficiently incorporate the word-level sequential context…
3
votes
1 answer

Multi-head attention mechanism in the transformer and the need for a feed-forward neural network

After reading the paper "Attention Is All You Need", I have two questions. 1) What is the need for the multi-head attention mechanism? The paper says that "Multi-head attention allows the model to jointly attend to information from different representation…
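A sketch of the two pieces the question asks about, at the dimensions used in the paper (d_model = 512, h = 8 heads, d_ff = 2048). Splitting into heads lets each head attend in its own lower-dimensional subspace, so heads can specialize; the position-wise feed-forward sublayer then mixes information within each token's vector after attention has mixed information across tokens.

```python
import numpy as np

def split_heads(X, h):
    """Reshape (seq, d_model) -> (h, seq, d_model // h)."""
    seq, d_model = X.shape
    return X.reshape(seq, h, d_model // h).transpose(1, 0, 2)

def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward: ReLU(x W1 + b1) W2 + b2,
    applied independently to every position."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 512))
print(split_heads(X, h=8).shape)      # (8, 10, 64): 8 heads of dimension 64
W1, b1 = rng.normal(size=(512, 2048)) * 0.02, np.zeros(2048)
W2, b2 = rng.normal(size=(2048, 512)) * 0.02, np.zeros(512)
print(ffn(X, W1, b1, W2, b2).shape)   # (10, 512)
```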
3
votes
0 answers

From a deep learning point of view, is there a lower limit on the number of hours of speech needed to train a neural net?

From a deep learning practitioner's point of view, is there a lower limit on the number of hours of speech needed to train a neural net to translate speech to text? An estimate from CMU is 3000-5000 hours for 90% accuracy commercial quality speech…
3
votes
0 answers

Hessian-Free instead of LSTM for Recurrent Net Machine Translation

Last year, Ilya Sutskever and collaborators came out with a paper about a recurrent LSTM net that learns sequence-to-sequence mappings for machine translation. It's somewhat surprising that the authors used LSTM instead of Hessian-Free optimization to train this…
2
votes
1 answer

In cases where neural attention is used for machine translation, how do they deal with translating sentences that have different lengths?

So attention and transformer models can be used for machine translation. Sometimes, a sentence in one language might consist of 5 words, but in the target language it consists of 8 words (so for example a word in the source language might be two…
2
votes
1 answer

Teacher Forcing in RNNs

I'm reading about teacher forcing for neural translation applications here and here, but I am a little confused about the method. Why does teacher forcing speed up training? Also, why in the Kaggle link are they only doing teacher forcing a percentage…
Eisen
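A sketch of the training-time loop the question describes. `decoder_step` is a hypothetical stand-in for one decoding step of any seq2seq model, and `ratio` is the "percentage" knob the Kaggle kernel turns: feeding the ground truth keeps the model conditioned on a correct prefix (so errors don't compound and training converges faster), while feeding its own predictions some of the time reduces the train/inference mismatch.

```python
import random

def decode_with_teacher_forcing(decoder_step, target_tokens, ratio=0.5):
    """Training-time decode: with probability `ratio`, feed the gold
    previous token (teacher forcing); otherwise feed the model's own
    prediction back in."""
    prev, outputs = "<sos>", []
    for gold in target_tokens:
        pred = decoder_step(prev)   # predict the next token from the previous one
        outputs.append(pred)
        prev = gold if random.random() < ratio else pred
    return outputs

# Dummy decoder step, for illustration only.
vocab = ["le", "chat", "dort", "<eos>"]
print(decode_with_teacher_forcing(lambda prev: random.choice(vocab),
                                  ["le", "chat", "dort", "<eos>"]))
```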
2
votes
1 answer

What feature space is used in transformer networks for machine translation?

Title is the question. The papers I've read, e.g. "Attention Is All You Need", fail to specify exactly what word embeddings are used in these machine translation networks. In most cases they'll mention the dimensionality, e.g. 512, but don't specify…
Dave
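For what it's worth, "Attention Is All You Need" (section 3.4) answers this directly: the embeddings are learned jointly with the rest of the network rather than taken from pretrained vectors, the table is shared with the pre-softmax projection, and the looked-up vectors are multiplied by √d_model. A minimal sketch of the lookup:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 32000, 512
E = rng.normal(scale=d_model ** -0.5, size=(vocab_size, d_model))  # trainable in practice

token_ids = np.array([17, 403, 9])          # produced by subword tokenization (BPE/wordpiece)
embedded = E[token_ids] * np.sqrt(d_model)  # (3, 512); positional encoding is added next
```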
2
votes
1 answer

Machine Translation: with sufficient parallel data, can we further improve the performance of the system by using monolingual data?

I am trying to find scientific literature that studies if, in a situation in which we already have enough parallel data, the addition of monolingual data can further improve performance. I have not been able to find anything yet, but it seems…
2
votes
1 answer

How do RNNs used in Machine Translation produce output of the right length?

For machine translation, the lengths of the input and output sequences usually differ. Given that an encoder-decoder architecture is typically used, how does the output come out to be the right length for different sentences, considering that we…
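The short answer, sketched below, is that nothing fixes the output length in advance: the decoder is autoregressive and keeps emitting tokens until it produces an end-of-sequence symbol (with a hard cap as a safety net). `decoder_step` is a hypothetical stand-in for one step of any encoder-decoder model:

```python
def greedy_decode(decoder_step, max_len=100):
    """Emit tokens one at a time until <eos> or the length cap."""
    tokens = ["<sos>"]
    while len(tokens) < max_len:
        nxt = decoder_step(tokens)   # predict the next token from the prefix
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens[1:]

# Toy step that stops after three tokens:
print(greedy_decode(lambda toks: "<eos>" if len(toks) > 3 else "w"))   # ['w', 'w', 'w']
```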
2
votes
1 answer

Is Length Normalization used in each step of Beam Search?

In Andrew Ng's lesson on refining Beam Search, it seems that Length Normalization is used only after the last step of Beam Search, that is, when the B most probable sequences have been generated. My question is, would it be better to use Length…
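A sketch of the normalization in question, using the 1/Tᵅ form from the lecture (Ng suggests α ≈ 0.7): dividing the summed log-probability by Tᵅ removes the bias toward short hypotheses, so a longer candidate with a lower raw sum can still win.

```python
def length_normalized_score(log_probs, alpha=0.7):
    """(1 / T^alpha) * sum_t log P(y_t | x, y_<t)."""
    return sum(log_probs) / (len(log_probs) ** alpha)

# The longer hypothesis wins after normalization despite a lower raw sum:
short = [-1.0, -1.0]                    # raw sum = -2.0
long_ = [-0.8, -0.8, -0.8, -0.8]        # raw sum = -3.2
print(length_normalized_score(short))   # ≈ -1.23
print(length_normalized_score(long_))   # ≈ -1.21
```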
2
votes
1 answer

In phrase-based machine translators, how does the program recognize phrases in the corpus text?

I know that a phrase-based statistical machine translator finds the probability of a correct translation by analyzing a bilingual corpus text, and that it maps phrases from one language to phrases in the other language. By the frequency of maps…
user3500869
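Once phrase pairs have been extracted from word-aligned sentence pairs (the extraction step, which keeps only pairs consistent with the word alignment, is the involved part), the standard translation-probability estimate is just a relative frequency. A toy sketch with made-up counts:

```python
from collections import Counter

# Hypothetical extracted phrase-pair counts.
pair_counts = Counter({
    ("the house", "das Haus"): 9,
    ("the house", "das Gebäude"): 1,
})
src_counts = Counter()
for (src, _tgt), c in pair_counts.items():
    src_counts[src] += c

def phrase_prob(src, tgt):
    """phi(tgt | src) = count(src, tgt) / count(src)."""
    return pair_counts[(src, tgt)] / src_counts[src]

print(phrase_prob("the house", "das Haus"))   # 0.9
```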
2
votes
0 answers

Was BaiduTrans the first scalable deployment of Neural Machine Translation?

I read these two tweets written on 2016-12-15 by Andrew Ng: Strong desire for global content made China 1st to develop Neural Machine Translation. US lucky to have so much english content […] @ruchitgarg @ylecun Yes, that's what I meant. AFAIK…