Questions tagged [machine-translation]

Machine translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another.

43 questions
158
votes
9 answers

What exactly are keys, queries, and values in attention mechanisms?

How should one understand the keys, queries, and values that are often mentioned in attention mechanisms? I've tried searching online, but all the resources I find only speak of them as if the reader already knows what they are. Judging by the paper…
Sean
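A minimal NumPy sketch of the scaled dot-product attention these answers build on may help; it follows the Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V formulation from "Attention Is All You Need", with random matrices standing in for the learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # similarity of every query to every key
    weights = softmax(scores)          # one distribution over keys per query
    return weights @ V                 # weighted average of the values

# Toy example: 4 tokens, dimension 8. Queries, keys, and values are all
# linear projections of the same token representations (self-attention).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)   # (4, 8)
```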
28
votes
4 answers

What are "residual connections" in RNNs?

In Google's paper "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation", it is stated: "Our LSTM RNNs have $8$ layers, with residual connections between layers ..." What are residual connections? Why…
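The mechanism itself is a one-liner. A minimal sketch, where `layer` stands in for any sublayer (in GNMT the layers are LSTMs): the residual connection adds the layer's input back to its output, so each layer only has to learn a correction to its input and gradients flow through deep stacks more easily.

```python
import numpy as np

def layer(x, W):
    """Stand-in for one sublayer (an LSTM layer in GNMT)."""
    return np.tanh(x @ W)

def residual_layer(x, W):
    # The residual (skip) connection: output = input + F(input).
    return x + layer(x, W)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 16))
W = rng.normal(size=(16, 16)) * 0.1
print(residual_layer(x, W).shape)   # (1, 16)
```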
8
votes
1 answer

What is the intuition behind the positional cosine encoding in the transformer network?

I don't understand how adding the cosine encodings/functions to each dimension of the word embedding vector enables the network to "understand" where each word is situated in the sentence. What is the intuition behind it? It seems a bit…
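A sketch of the sinusoidal encoding from "Attention Is All You Need", the formulation this question refers to. The key property is that each position gets a unique pattern and that PE(pos + k) is a fixed linear function of PE(pos), which is the usual intuition for why the network can read off relative positions:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(max_len)[:, None]        # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions
    pe[:, 1::2] = np.cos(angles)             # odd dimensions
    return pe

pe = positional_encoding(max_len=50, d_model=512)
# The encodings are *added* to the word embeddings, so the same word
# at different positions enters the network as a different vector.
```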
5
votes
1 answer

Why are Transformers "suboptimal" for language modeling but not for translation?

Language Models with Transformers states: Transformer architectures are suboptimal for language model itself. Neither self-attention nor the positional encoding in the Transformer is able to efficiently incorporate the word-level sequential context…
3
votes
1 answer

Multi-head attention mechanism in the transformer and the need for a feed-forward neural network

After reading the paper "Attention Is All You Need", I have two questions. 1) What is the need for the multi-head attention mechanism? The paper says that "Multi-head attention allows the model to jointly attend to information from different representation…
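A sketch of the two pieces the question asks about, at the dimensions used in the paper (d_model = 512, h = 8 heads, d_ff = 2048). Splitting into heads lets each head attend in its own lower-dimensional subspace, so heads can specialize; the position-wise feed-forward sublayer then mixes information within each token's vector after attention has mixed information across tokens.

```python
import numpy as np

def split_heads(X, h):
    """Reshape (seq, d_model) -> (h, seq, d_model // h)."""
    seq, d_model = X.shape
    return X.reshape(seq, h, d_model // h).transpose(1, 0, 2)

def ffn(x, W1, b1, W2, b2):
    """Position-wise feed-forward: ReLU(x W1 + b1) W2 + b2,
    applied independently to every position."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 512))
print(split_heads(X, h=8).shape)      # (8, 10, 64): 8 heads of dimension 64
W1, b1 = rng.normal(size=(512, 2048)) * 0.02, np.zeros(2048)
W2, b2 = rng.normal(size=(2048, 512)) * 0.02, np.zeros(512)
print(ffn(X, W1, b1, W2, b2).shape)   # (10, 512)
```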
3
votes
0 answers

From a deep learning point of view, is there a lower limit on the number of hours of speech needed to train a neural net?

From a deep learning practitioner's point of view, is there a lower limit on the number of hours of speech needed to train a neural net to translate speech to text? An estimate from CMU is 3000-5000 hours for 90% accuracy commercial quality speech…
3
votes
0 answers

Hessian-Free instead of LSTM for Recurrent Net Machine Translation

Last year, Ilya Sutskever and collaborators came out with a paper about a recurrent LSTM net that learns sequence-to-sequence mappings for machine translation. It's somewhat surprising that the authors used LSTM instead of Hessian-Free optimization to train this…
2
votes
1 answer

In cases where neural attention is used for machine translation, how do they deal with translating sentences that have different lengths?

So attention and transformer models can be used for machine translation. Sometimes, a sentence in one language might consist of 5 words, but in the target language it consists of 8 words (so for example a word in the source language might be two…
2
votes
1 answer

Teacher Forcing in RNNs

I'm reading about teacher forcing for neural translation applications here and here, but I am a little confused about the method. Why does teacher forcing speed up training? Also, why in the Kaggle link are they only doing teacher forcing a percentage…
Eisen
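A sketch of the training-time loop the question describes. `decoder_step` is a hypothetical stand-in for one decoding step of any seq2seq model, and `ratio` is the "percentage" knob the Kaggle kernel turns: feeding the ground truth keeps the model conditioned on a correct prefix (so errors don't compound and training converges faster), while feeding its own predictions some of the time reduces the train/inference mismatch.

```python
import random

def decode_with_teacher_forcing(decoder_step, target_tokens, ratio=0.5):
    """Training-time decode: with probability `ratio`, feed the gold
    previous token (teacher forcing); otherwise feed the model's own
    prediction back in."""
    prev, outputs = "<sos>", []
    for gold in target_tokens:
        pred = decoder_step(prev)   # predict the next token from the previous one
        outputs.append(pred)
        prev = gold if random.random() < ratio else pred
    return outputs

# Dummy decoder step, for illustration only.
vocab = ["le", "chat", "dort", "<eos>"]
print(decode_with_teacher_forcing(lambda prev: random.choice(vocab),
                                  ["le", "chat", "dort", "<eos>"]))
```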
2
votes
1 answer

What feature space is used in transformer networks for machine translation?

Title is the question. The papers I've read, e.g. "Attention Is All You Need", fail to specify exactly what word embeddings are used in these machine translation networks. In most cases they'll mention the dimensionality, e.g. 512, but don't specify…
Dave
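For what it's worth, "Attention Is All You Need" (section 3.4) answers this directly: the embeddings are learned jointly with the rest of the network rather than taken from pretrained vectors, the table is shared with the pre-softmax projection, and the looked-up vectors are multiplied by √d_model. A minimal sketch of the lookup:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 32000, 512
E = rng.normal(scale=d_model ** -0.5, size=(vocab_size, d_model))  # trainable in practice

token_ids = np.array([17, 403, 9])          # produced by subword tokenization (BPE/wordpiece)
embedded = E[token_ids] * np.sqrt(d_model)  # (3, 512); positional encoding is added next
```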
2
votes
1 answer

Machine Translation: with sufficient parallel data, can we further improve the performance of the system by using monolingual data?

I am trying to find scientific literature that studies if, in a situation in which we already have enough parallel data, the addition of monolingual data can further improve performance. I have not been able to find anything yet, but it seems…
2
votes
1 answer

How do RNNs used in Machine Translation produce output of the right length?

For machine translation, the lengths of the input and output sequences usually differ. Given that an encoder-decoder architecture is typically used, how does the output come out to be the right length for different sentences, considering that we…
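The short answer, sketched below, is that nothing fixes the output length in advance: the decoder is autoregressive and keeps emitting tokens until it produces an end-of-sequence symbol (with a hard cap as a safety net). `decoder_step` is a hypothetical stand-in for one step of any encoder-decoder model:

```python
def greedy_decode(decoder_step, max_len=100):
    """Emit tokens one at a time until <eos> or the length cap."""
    tokens = ["<sos>"]
    while len(tokens) < max_len:
        nxt = decoder_step(tokens)   # predict the next token from the prefix
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens[1:]

# Toy step that stops after three tokens:
print(greedy_decode(lambda toks: "<eos>" if len(toks) > 3 else "w"))   # ['w', 'w', 'w']
```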
2
votes
1 answer

Is Length Normalization used in each step of Beam Search?

In Andrew Ng's lesson on refining Beam Search, it seems that Length Normalization is used only after the last step of Beam Search, that is, when the B most probable sequences have been generated. My question is, would it be better to use Length…
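A sketch of the normalization in question, using the 1/Tᵅ form from the lecture (Ng suggests α ≈ 0.7): dividing the summed log-probability by Tᵅ removes the bias toward short hypotheses, so a longer candidate with a lower raw sum can still win.

```python
def length_normalized_score(log_probs, alpha=0.7):
    """(1 / T^alpha) * sum_t log P(y_t | x, y_<t)."""
    return sum(log_probs) / (len(log_probs) ** alpha)

# The longer hypothesis wins after normalization despite a lower raw sum:
short = [-1.0, -1.0]                    # raw sum = -2.0
long_ = [-0.8, -0.8, -0.8, -0.8]        # raw sum = -3.2
print(length_normalized_score(short))   # ≈ -1.23
print(length_normalized_score(long_))   # ≈ -1.21
```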
2
votes
1 answer

In phrase-based machine translators, how does the program recognize phrases in the corpus text?

I know that a phrase-based statistical machine translator finds the probability of a correct translation by analyzing a bilingual corpus text, and that it maps phrases from one language to phrases in the other language. By the frequency of maps…
user3500869
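Once phrase pairs have been extracted from word-aligned sentence pairs (the extraction step, which keeps only pairs consistent with the word alignment, is the involved part), the standard translation-probability estimate is just a relative frequency. A toy sketch with made-up counts:

```python
from collections import Counter

# Hypothetical extracted phrase-pair counts.
pair_counts = Counter({
    ("the house", "das Haus"): 9,
    ("the house", "das Gebäude"): 1,
})
src_counts = Counter()
for (src, _tgt), c in pair_counts.items():
    src_counts[src] += c

def phrase_prob(src, tgt):
    """phi(tgt | src) = count(src, tgt) / count(src)."""
    return pair_counts[(src, tgt)] / src_counts[src]

print(phrase_prob("the house", "das Haus"))   # 0.9
```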
2
votes
0 answers

Was BaiduTrans the first scalable deployment of Neural Machine Translation?

I read these two tweets written on 2016-12-15 by Andrew Ng: Strong desire for global content made China 1st to develop Neural Machine Translation. US lucky to have so much english content […] @ruchitgarg @ylecun Yes, that's what I meant. AFAIK…