1

Here's the code in question:

class BertOnlyNSPHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.seq_relationship = nn.Linear(config.hidden_size, 2)

    def forward(self, pooled_output):
        seq_relationship_score = self.seq_relationship(pooled_output)
        return seq_relationship_score

I think it was just ranking how likely one sentence would follow another? Wouldn't it be one score?

Lerner Zhang
  • 5,017
  • 1
  • 31
  • 52
SantoshGupta7
  • 629
  • 1
  • 6
  • 12

1 Answers1

2

We can see from these two lines that it's cross entropy loss and it's a score.

loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

The NSP is a binarized next sentence prediction task, and you may need to refer to this question: Is binary logistic regression a special case of multinomial logistic regression when the outcome has 2 levels?.

When the size of the label for sequence prediction is 1, it is doing regression using mean square error.

Lerner Zhang
  • 5,017
  • 1
  • 31
  • 52