We use two types of masks when training transformer models: one is the padding mask in the encoder, which adjusts for the length of the input sequence, and the other is the mask used by the decoder to prevent the leftward flow of information (cheating).
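To make sure we are talking about the same two masks, here is a rough sketch of what I mean (assuming a PyTorch-style implementation; the function names `padding_mask` and `causal_mask` are just mine for illustration, not from any particular library):

```python
import torch

def padding_mask(token_ids, pad_id=0):
    # True where the position holds a real token, False where it is padding,
    # so the attention scores at padded positions can be set to -inf.
    # Shape: (batch, 1, 1, seq_len), broadcastable over heads and query positions.
    return (token_ids != pad_id).unsqueeze(1).unsqueeze(2)

def causal_mask(seq_len):
    # Lower-triangular mask: position i may only attend to positions <= i,
    # which is what blocks the leftward flow of information in the decoder.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
```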
I am confused as to whether we also need to apply the masking during the test phase (since we still have to compute the dot product between Q and K at test time). If so, is it the same mask we use during training, or a different one (my hunch is that it should be somewhat different on the decoder side)? If not, how do we handle variable-length sentences?
I have gone through the articles and questions below, but none of them addresses the masking:
- How to use the transformer for inference
- https://datascience.stackexchange.com/questions/51785/what-is-the-first-input-to-the-decoder-in-a-transformer-model?noredirect=1&lq=1
- https://datascience.stackexchange.com/questions/81727/what-would-be-the-target-input-for-transformer-decoder-during-test-phase