
I understand the use of attention mechanisms in encoder-decoder architectures for sequence-to-sequence problems such as language translation.

I am just trying to figure out whether it is possible to use attention mechanisms with standard autoencoders for feature extraction, where the goal is to compress the data into a latent vector.

Suppose we had time-series data with N dimensions and we wanted to use an autoencoder with an attention mechanism (I am thinking of self-attention because I think it is more appropriate in this case; I might be wrong) to better learn the interdependence among the elements of the input sequence, and thus obtain a better latent vector L.
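To make the idea concrete, here is a rough sketch of the kind of architecture I have in mind (PyTorch, with made-up layer sizes; I am not claiming this is the right way to do it): a self-attention layer over the time steps in the encoder, mean-pooling into the latent vector L, and a simple decoder that tries to reconstruct the sequence from L.

    import torch
    import torch.nn as nn

    class SelfAttentionAutoencoder(nn.Module):
        """Sketch: compress a (batch, seq_len, n_dims) time series into a latent vector L."""
        def __init__(self, n_dims, seq_len, d_model=64, n_heads=4, latent_dim=16):
            super().__init__()
            self.seq_len, self.d_model = seq_len, d_model
            self.embed = nn.Linear(n_dims, d_model)
            # Self-attention over the time steps of the input sequence.
            self.attend = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            # Pool over time and compress to the latent vector L.
            self.to_latent = nn.Linear(d_model, latent_dim)
            # Decoder: expand L back into a full sequence for reconstruction.
            self.from_latent = nn.Linear(latent_dim, d_model * seq_len)
            self.out = nn.Linear(d_model, n_dims)

        def forward(self, x):                        # x: (batch, seq_len, n_dims)
            h = self.attend(self.embed(x))           # attention mixes information across time steps
            latent = self.to_latent(h.mean(dim=1))   # L: (batch, latent_dim)
            h_dec = self.from_latent(latent).view(-1, self.seq_len, self.d_model)
            return self.out(h_dec), latent           # reconstruction and L

The model would be trained with a plain reconstruction loss (e.g. MSE between input and output), and L would then be used as the extracted feature vector.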

Or would it be better to use a recurrent neural network (RNN) or one of its variants in this case?

Does anyone have thoughts or an intuition about this?

Amhs_11
  • Technically, an RNN does "compress the data into a latent vector" as well, just using a different approach. – Tim Oct 23 '20 at 13:24
  • Thanks, @Tim. Yes, I am aware of this. What I am asking is whether it would be possible (in theory) to use attention mechanisms with an autoencoder for feature extraction, to push the encoder to learn a better representation and perhaps obtain a richer latent vector without losing much useful information. I am just trying to see whether there is potential in adding attention in this case or not. I did not find many resources that discuss this in particular. – Amhs_11 Oct 23 '20 at 13:33
  • 4
  • This is an interesting question, and I'm curious to know the answer. But since transformers are relatively new, I wonder if an answer is known yet -- it's possible that no one's researched this exact question and published their findings yet. – Sycorax Oct 26 '20 at 01:38
  • @Sycorax, thanks for your comment. Yes, I totally agree with you. Perhaps some intuition or thoughts on this would be appreciated; at least we could have some high-level discussion to better understand it. – Amhs_11 Oct 26 '20 at 02:16
  • I don't see the difference between your scenario and seq2seq. Also, you can use both RNN encoders/decoders and attention at the same time. – Firebug Oct 31 '20 at 17:13
  • Actually, there is a difference between the two cases. In seq2seq, the goal is to predict a different sequence: in language translation, for example, the model takes one language as input and outputs a different language. In that case, attention can be used to alleviate the issue of long-sequence prediction and can also make the encoder and decoder work together to better predict the output. However, the goal of an autoencoder (in my scenario) is to train the model to learn a better latent vector from the original data so that it can be used for feature extraction. – Amhs_11 Oct 31 '20 at 22:45
  • @Amhs_11 seq2seq can be used for word/text completion just as well, just like a transformer would – Firebug Nov 05 '20 at 23:26
  • @Firebug, yes, I know that and I do not have any issue with it. I am asking about an autoencoder for learning a latent vector. My question is simply: "would attention help in this case?" – Amhs_11 Nov 06 '20 at 00:14
  • Surely; it has been shown to help with long-range dependencies. I'm betting it, at least, doesn't hinder it. – Firebug Nov 06 '20 at 14:02

1 Answer


I think attention can help. Please refer to this answer.

There are many ways to incorporate attention into an autoencoder. The simplest is just to borrow the idea from BERT but make the middle layers thinner, so that the network has to squeeze the sequence through a narrow representation.
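As a rough illustration of that idea (just a sketch in PyTorch, and only one of many possible variants), you can stack self-attention layers and compress the per-step representations through a thin middle layer, then train the whole network to reconstruct its input, optionally with BERT-style masking of some time steps:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BottleneckTransformerAE(nn.Module):
        """Sketch of "BERT with thinner middle layers": self-attention layers
        with a narrow bottleneck in the middle, trained to reconstruct the input."""
        def __init__(self, n_dims, d_model=64, d_bottleneck=8, n_heads=4, n_layers=2):
            super().__init__()
            self.embed = nn.Linear(n_dims, d_model)
            self.lower = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=n_layers)
            # The "thin" middle: compress each time step to d_bottleneck features.
            self.down = nn.Linear(d_model, d_bottleneck)
            self.up = nn.Linear(d_bottleneck, d_model)
            self.upper = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=n_layers)
            self.out = nn.Linear(d_model, n_dims)

        def forward(self, x):                 # x: (batch, seq_len, n_dims)
            h = self.lower(self.embed(x))
            z = self.down(h)                  # bottleneck features, one vector per time step
            return self.out(self.upper(self.up(z))), z

    # Usage sketch: plain reconstruction loss; masking input spans (as in BERT) is an optional variant.
    model = BottleneckTransformerAE(n_dims=3)
    x = torch.randn(8, 50, 3)
    recon, z = model(x)
    loss = F.mse_loss(recon, x)

Here z (the thinned middle representation) plays the role of the latent features; how thin to make it, and whether to add masking, are design choices.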

Lerner Zhang
  • Thank you for your answer; I definitely should check BERT. Just one thing: I went through the answer you provided in the link. If I understand correctly, you suggest placing the attention on the decoder part? My question is how this would help to better extract features in the latent space. – Amhs_11 Nov 22 '20 at 23:25
  • @Amhs_11 No, on the states from the encoder. – Lerner Zhang Nov 23 '20 at 13:00
  • Thanks @Lerner Zhang, okay, then I guess it would be added to the encoder part. I still do not quite get it, because you mentioned "In decoding your attention mechanism just pay attention to that matrix" in the linked answer. Anyway, thank you very much for your clarification. – Amhs_11 Nov 23 '20 at 21:32