
The idea is to train a discriminator alongside the seq2seq model to distinguish 'fake' decoder outputs from 'real' decoder targets, without propagating the discriminator loss back into the seq2seq model. At inference time the discriminator could then be used in one of two ways: as a scoring function inside beam search, or as a reranker, where beam search proceeds as normal and the discriminator ranks the finished beams to select the most 'real'-looking sequence.
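To make the second option (reranking finished beams) concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `rerank_beams`, `toy_discriminator`, and the `weight` parameter are hypothetical names, the discriminator is a stand-in heuristic rather than a trained model, and adding the model log-probability to a weighted log of the discriminator's "real" probability is just one plausible way to combine the two scores.

```python
import math
from typing import Callable, List, Tuple

# A finished beam: (tokens, log-probability under the seq2seq model).
Hypothesis = Tuple[List[str], float]


def rerank_beams(
    beams: List[Hypothesis],
    discriminator: Callable[[List[str]], float],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Sort finished beams by log p_model + weight * log D(tokens).

    `discriminator` is assumed to return the probability (0..1) that the
    sequence is 'real'. With weight=0 this reduces to ordinary beam ranking.
    """
    def combined_score(hyp: Hypothesis) -> float:
        tokens, model_logp = hyp
        # Clamp to avoid log(0) if the discriminator is fully confident.
        p_real = min(max(discriminator(tokens), 1e-9), 1.0)
        return model_logp + weight * math.log(p_real)

    return sorted(beams, key=combined_score, reverse=True)


def toy_discriminator(tokens: List[str]) -> float:
    """Hypothetical stand-in for a trained discriminator.

    It only penalises immediately repeated tokens; a real discriminator
    would be a learned classifier over decoder outputs vs. targets.
    """
    repeats = sum(a == b for a, b in zip(tokens, tokens[1:]))
    return 1.0 / (1.0 + repeats)


beams = [
    (["the", "the", "the", "cat"], -1.0),  # likelier under the model, degenerate
    (["the", "cat", "sat"], -1.5),         # less likely, more natural
]

best_tokens, best_logp = rerank_beams(beams, toy_discriminator, weight=2.0)[0]
```

The first option (using the discriminator inside beam search) would apply the same kind of combined score to partial hypotheses at every step, which assumes the discriminator can score incomplete prefixes and costs one discriminator call per candidate per step.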

I've seen papers and architectures where a discriminator is part of the training loop, as in a seq2seq GAN, but I have not seen a discriminator used as a learnable scoring function for inference-time beam search. Has any work been done in this area? If not, is there a reason why this isn't a good idea?

Avelina

0 Answers