I am creating a text generation algorithm for my master's research. I have a dialogue between two people, and I would like to simulate one side of the conversation with naturally generated text (not templated text). For training, I simply use a softmax over the vocabulary and maximize the likelihood of the correct next word. For testing, however, there are as many ways to wash a cat as there are ways to phrase a sentence. For example, the test set's response could be "I have bought the milk you asked for", while the generated text could be "I purchased the milk you requested", which would be perfectly acceptable.
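For context, the training objective I mean is just per-token cross-entropy: softmax over the output logits, then negative log-likelihood of the correct word. A minimal NumPy sketch (the logits and vocabulary indices here are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability before exponentiating.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def nll(logits, target_idx):
    # Negative log-likelihood of the correct next word:
    # the per-token cross-entropy loss being minimized during training.
    return -np.log(softmax(logits)[target_idx])

# Toy example: 3-word vocabulary, the correct word is index 0.
logits = np.array([2.0, 1.0, 0.1])
print(nll(logits, 0))  # lower is better
```

Pushing the target word's logit up relative to the others drives this loss toward zero, which is exactly "increasing the likelihood of getting the correct word".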
I don't have any labels other than who the author is, so I can't exactly train a text classifier and check whether the generated response is classified the same way as the test set's response.
As I was writing this, I thought about using an RNN to reduce the text to a lower-dimensional space and then checking whether I can reconstruct the original text, similar to a variational autoencoder. I could then compare the cosine similarity between the lower-dimensional representations of the generated text and the test set's response. Do you think such a solution could work?
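The comparison step I have in mind would look something like this. The `embed` function below is only a stand-in (mean-pooled, hash-seeded random word vectors); in the real setup it would be the trained RNN/VAE encoder's latent vector:

```python
import hashlib
import numpy as np

def word_vec(word, dim=64):
    # Deterministic pseudo-random vector per word; a placeholder for
    # learned embeddings, NOT a real encoder.
    seed = int.from_bytes(hashlib.md5(word.encode()).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

def embed(sentence, dim=64):
    # Placeholder for the encoder: mean-pool word vectors into one
    # fixed-size sentence representation.
    vecs = [word_vec(w, dim) for w in sentence.lower().split()]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

reference = "I have bought the milk you asked for"
generated = "I purchased the milk you requested"
print(cosine(embed(reference), embed(generated)))
```

The hope would be that a well-trained encoder maps paraphrases like the two sentences above to nearby points, so the cosine score serves as a label-free adequacy metric.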
PS. 'wash a cat' is a joke.