From the Hugging Face documentation:
"Transformer-based models are unable to process long sequences due to their self-attention operation"
How long is "long" here? 1,000 tokens? 10,000 tokens?
Just as a rough estimate, how long can the input sequence be before a transformer becomes infeasible to train?
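To show why I'm asking, here is a rough back-of-envelope sketch of just the L x L attention score matrices, which is the part the docs say scales quadratically. The fp16 precision, 12 layers, 12 heads, and batch size 1 (a BERT-base-like shape) are my own illustrative assumptions, not anything from the documentation:

```python
# Rough estimate of memory for the L x L self-attention score matrices alone.
# Assumptions (mine, purely illustrative): fp16 activations, 12 layers,
# 12 attention heads, batch size 1. Weights, activations of other layers,
# optimizer state, etc. are all ignored.
BYTES_PER_VALUE = 2   # fp16
NUM_LAYERS = 12
NUM_HEADS = 12
BATCH_SIZE = 1

for seq_len in (512, 1_000, 4_096, 10_000, 50_000):
    # one seq_len x seq_len score matrix per head per layer
    attn_bytes = (BATCH_SIZE * NUM_LAYERS * NUM_HEADS
                  * seq_len * seq_len * BYTES_PER_VALUE)
    print(f"seq_len={seq_len:>6}: ~{attn_bytes / 1e9:.1f} GB for attention scores")
```

Even if these assumptions are off, the quadratic term is the point: going from 1,000 to 10,000 tokens multiplies that matrix by 100x, so I'm mainly looking for where the practical ceiling tends to be.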