I am creating a text generation algorithm for my master's research. I have a dialogue between two people, and I would like to simulate one side of the conversation with naturally generated text (not templated text). For training, I simply use a softmax over the vocabulary and maximize the likelihood of the correct next word. For testing, however, there are as many ways to wash a cat as there are ways to phrase a sentence. For example, the test set's response could be "I have bought the milk you asked for", while the generated text could be "I purchased the milk you requested", which would be perfectly acceptable.
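For context, the training objective I mean is just per-token cross-entropy: softmax over the output logits, then negative log-likelihood of the correct word. A minimal NumPy sketch (the logits and vocabulary indices here are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability before exponentiating.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def nll(logits, target_idx):
    # Negative log-likelihood of the correct next word:
    # the per-token cross-entropy loss being minimized during training.
    return -np.log(softmax(logits)[target_idx])

# Toy example: 3-word vocabulary, the correct word is index 0.
logits = np.array([2.0, 1.0, 0.1])
print(nll(logits, 0))  # lower is better
```

Pushing the target word's logit up relative to the others drives this loss toward zero, which is exactly "increasing the likelihood of getting the correct word".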
I don't have any labels other than who the author is, so I can't exactly train a text classifier and check whether the generated response is classified the same way as the test set's response.
As I was writing this, I thought about using an RNN to reduce the text to a lower-dimensional space and then checking whether I can reconstruct the original text, similar to a variational autoencoder. I could then compare the cosine similarity between the lower-dimensional representations of the generated text and the test set's response. Do you think such a solution could work?
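The comparison step I have in mind would look something like this. The `embed` function below is only a stand-in (mean-pooled, hash-seeded random word vectors); in the real setup it would be the trained RNN/VAE encoder's latent vector:

```python
import hashlib
import numpy as np

def word_vec(word, dim=64):
    # Deterministic pseudo-random vector per word; a placeholder for
    # learned embeddings, NOT a real encoder.
    seed = int.from_bytes(hashlib.md5(word.encode()).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

def embed(sentence, dim=64):
    # Placeholder for the encoder: mean-pool word vectors into one
    # fixed-size sentence representation.
    vecs = [word_vec(w, dim) for w in sentence.lower().split()]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

reference = "I have bought the milk you asked for"
generated = "I purchased the milk you requested"
print(cosine(embed(reference), embed(generated)))
```

The hope would be that a well-trained encoder maps paraphrases like the two sentences above to nearby points, so the cosine score serves as a label-free adequacy metric.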
PS. 'wash a cat' is a joke.