I've just started looking at models beyond the biLSTM and would like to begin by applying attention to my existing RNN. Every example of attention I find uses an encoder-decoder architecture, but is it possible to use attention without one? Is there an example, blog post, or any other resource discussing an attention model without an encoder-decoder? Or is an encoder-decoder strictly necessary for the attention mechanism to be applied?
I work specifically with text data, on sequence modelling tasks.
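
To make the question concrete, here is a minimal sketch of the kind of thing I'm imagining (assuming PyTorch; the class and layer names are my own invention, not from any library or paper): attention used purely as a pooling step over biLSTM outputs, with no decoder anywhere. Is something along these lines a legitimate use of attention?

```python
import torch
import torch.nn as nn


class AttentiveBiLSTMClassifier(nn.Module):
    """Sketch: biLSTM + additive attention pooling, no encoder-decoder."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Additive attention: score each timestep's hidden state, then
        # softmax over the time dimension to get attention weights.
        self.attn_score = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        states, _ = self.lstm(self.embedding(token_ids))   # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn_score(states), dim=1)  # (batch, seq_len, 1)
        context = (weights * states).sum(dim=1)            # weighted sum over time
        return self.classifier(context)


# Hypothetical usage with random token ids, just to show the shapes.
model = AttentiveBiLSTMClassifier(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
```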