While scrolling through the list of activation functions available in the PyTorch package (here), I found that nn.MultiheadAttention is listed there. Can you please explain why it's considered an activation function? Maybe I'm misunderstanding something, but multi-head attention has its own learnable weights, so it seems to belong with layers rather than with activation functions. Can you please explain what I'm getting wrong?
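To illustrate what I mean, here is a small snippet comparing parameter counts; the embed_dim=8 and num_heads=2 values are arbitrary, just for demonstration:

```python
import torch.nn as nn

# A typical activation function like ReLU has no learnable parameters...
relu = nn.ReLU()
print(sum(p.numel() for p in relu.parameters()))  # prints 0

# ...whereas nn.MultiheadAttention carries learnable projection weights,
# which is why it looks more like a layer to me.
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)
print(sum(p.numel() for p in mha.parameters()))  # prints a number > 0
```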
Thank you!