I understand the use of attention mechanisms in the encoder-decoder for sequence-to-sequence problems such as language translation.
I am trying to figure out whether attention mechanisms can be used with standard auto-encoders for feature extraction, where the goal is to compress the data into a latent vector.
Suppose we have time series data with N dimensions and we want to use an auto-encoder with an attention mechanism (I am thinking of self-attention because it seems more appropriate here, but I might be wrong) to better learn the interdependencies within the input sequence, and thus obtain a better latent vector L. Roughly what I have in mind is sketched below.
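To make the idea concrete, here is a minimal sketch (PyTorch, with hypothetical layer sizes and a simple mean-pooling choice that are just my assumptions, not a reference implementation): a self-attention encoder that compresses a `(seq_len, N)` series into a single latent vector L, plus a trivial decoder that reconstructs the series from L.

```python
import torch
import torch.nn as nn

class AttentionAutoencoder(nn.Module):
    def __init__(self, n_features, seq_len, d_model=64, n_heads=4, latent_dim=16):
        super().__init__()
        self.seq_len = seq_len
        self.n_features = n_features
        self.input_proj = nn.Linear(n_features, d_model)
        # Self-attention over the time steps of the input sequence
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.to_latent = nn.Linear(d_model, latent_dim)                 # pooled -> latent vector L
        self.from_latent = nn.Linear(latent_dim, seq_len * n_features)  # simple decoder

    def forward(self, x):                      # x: (batch, seq_len, n_features)
        h = self.input_proj(x)                 # (batch, seq_len, d_model)
        h, _ = self.attn(h, h, h)              # self-attention: Q = K = V = h
        pooled = h.mean(dim=1)                 # average over time steps
        latent = self.to_latent(pooled)        # latent vector L: (batch, latent_dim)
        recon = self.from_latent(latent)       # reconstruct the flattened series
        return recon.view(-1, self.seq_len, self.n_features), latent

# Training would minimise reconstruction error, e.g. nn.MSELoss() between
# the reconstruction and the original series; the latent vector L is then
# used as the extracted feature representation.
```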
Or would it be better to use a Recurrent Neural Network or one of its variants in this case?
Does anyone have thoughts on this, or an intuition for which approach is preferable?