From the Hugging Face documentation:
"Transformer-based models are unable to process long sequences due to their self-attention operation"
How long is "long" here? 1,000 tokens? 10,000 tokens?
Just as a rough estimate, how long can the input sequence be before a transformer becomes infeasible to train?
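To show why I'm asking, here is a rough back-of-envelope sketch of just the L x L attention score matrices, which is the part the docs say scales quadratically. The fp16 precision, 12 layers, 12 heads, and batch size 1 (a BERT-base-like shape) are my own illustrative assumptions, not anything from the documentation:

```python
# Rough estimate of memory for the L x L self-attention score matrices alone.
# Assumptions (mine, purely illustrative): fp16 activations, 12 layers,
# 12 attention heads, batch size 1. Weights, activations of other layers,
# optimizer state, etc. are all ignored.
BYTES_PER_VALUE = 2   # fp16
NUM_LAYERS = 12
NUM_HEADS = 12
BATCH_SIZE = 1

for seq_len in (512, 1_000, 4_096, 10_000, 50_000):
    # one seq_len x seq_len score matrix per head per layer
    attn_bytes = (BATCH_SIZE * NUM_LAYERS * NUM_HEADS
                  * seq_len * seq_len * BYTES_PER_VALUE)
    print(f"seq_len={seq_len:>6}: ~{attn_bytes / 1e9:.1f} GB for attention scores")
```

Even if these assumptions are off, the quadratic term is the point: going from 1,000 to 10,000 tokens multiplies that matrix by 100x, so I'm mainly looking for where the practical ceiling tends to be.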