I've just started looking at models beyond the biLSTM and would like to begin by applying attention to my existing RNN. Every example of attention I find uses an encoder-decoder architecture, but is it possible to use attention without one? Is there an example, blog post, or any other resource discussing an attention model without an encoder-decoder? Or is an encoder-decoder strictly necessary for the attention mechanism to be applied?
I work specifically with text data, on sequence modelling tasks.
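
To make the question concrete, here is a minimal sketch of the kind of thing I'm imagining (assuming PyTorch; the class and layer names are my own invention, not from any library or paper): attention used purely as a pooling step over biLSTM outputs, with no decoder anywhere. Is something along these lines a legitimate use of attention?

```python
import torch
import torch.nn as nn


class AttentiveBiLSTMClassifier(nn.Module):
    """Sketch: biLSTM + additive attention pooling, no encoder-decoder."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Additive attention: score each timestep's hidden state, then
        # softmax over the time dimension to get attention weights.
        self.attn_score = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        states, _ = self.lstm(self.embedding(token_ids))   # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn_score(states), dim=1)  # (batch, seq_len, 1)
        context = (weights * states).sum(dim=1)            # weighted sum over time
        return self.classifier(context)


# Hypothetical usage with random token ids, just to show the shapes.
model = AttentiveBiLSTMClassifier(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
```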