Are text generation models generative or discriminative?

Question

I've recently been studying generative and discriminative models, and I had a question regarding text generation.

I'm aware that generative models model $P(X, Y)$ and discriminative models model $P(Y | X)$. I'm also aware that generative models are able to generate new samples of $(x_i, y_i)$.

If we had a text generation model (e.g., question generation or text summarization), would this be considered generative or discriminative?

At first glance I would assume they're discriminative because the model is built on $P(Y | X)$, but at the same time I'm not sure because they would also be able to generate new data samples.

some might call it a ["conditional generative"](https://stats.stackexchange.com/q/408421/26948) model — shimao, Feb 15 '21 at 16:14

score 3 · Accepted Answer · answered Feb 23 '21 at 03:17

Summary

There's a crucial difference between text generation and generative models, which I see as the core of your question. They're orthogonal. You can have generative models that generate text, discriminative models that generate text, or the more familiar models from either category that do neither (e.g. logistic regression, Naive Bayes). To avoid confusion, I'll say 'compose text' instead of 'generate text'.

Remember your characterization that a generative model models a joint distribution, while a discriminative model models a conditional distribution.

Typically, text summarization or question generation is done with a discriminative model. In order to compose the summary, you must first provide the text to be summarized! You can't just sample willy-nilly from the model. Nothing says you can't use a generative model (modeling both the original and summarized text), but there are reasons it's unpopular.

An example: machine translation

Machine translation is another example of a task that asks you to compose text. I bring it up because in the past decade, it's largely shifted from generative to discriminative models. (Note that there are exceptions.)

Until recently, statistical translation systems were generative. The goal was to find the best target sentence $\hat{y}$ for a source sentence $x$ by decoding this core generative model:

$$\hat{y} = {\arg\max}_y\, \underbrace{p(y)}_{\textrm{Language model}} \cdot \underbrace{p(x \mid y)}_{\textrm{Translation model}}$$

Nowadays, direct discriminative models are more common than the noisy channel–based model above:

$$\hat{y} = {\arg\max}_y\, p(y \mid x)$$
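For reference, the two decoding rules are linked by Bayes' rule. If both models were exact they would select the same $\hat{y}$; what differs is which distributions are actually parameterized and learned:

$$\hat{y} = {\arg\max}_y\, p(y \mid x) = {\arg\max}_y\, \frac{p(y)\, p(x \mid y)}{p(x)} = {\arg\max}_y\, p(y)\, p(x \mid y)$$

The last step holds because $p(x)$ does not depend on $y$.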

Neither type of model would be any good for translation if it couldn't compose text, right? The difference is that the discriminative models demand an input before you create anything. With the generative model above, the procedure for sampling a sentence and its translation is simple: sample from the target language's language model to get a $y$, then sample from the translation model to get your $x$. No original input needed.
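Here is a minimal sketch of that difference in Python. Everything in it is an assumption for illustration: the tiny probability tables are made up, and the names (`LANGUAGE_MODEL`, `TRANSLATION_MODEL`, `DIRECT_MODEL`, and the helper functions) are hypothetical, not taken from any real translation system.

```python
import random

# Toy, hand-written distributions (made-up numbers, for illustration only).
# y is a target-language sentence, x is a source-language sentence.
LANGUAGE_MODEL = {"hello": 0.6, "goodbye": 0.4}  # p(y)
TRANSLATION_MODEL = {  # p(x | y)
    "hello": {"bonjour": 0.9, "salut": 0.1},
    "goodbye": {"au revoir": 0.8, "salut": 0.2},
}
DIRECT_MODEL = {  # p(y | x), specified directly
    "bonjour": {"hello": 1.0},
    "salut": {"hello": 0.4, "goodbye": 0.6},
    "au revoir": {"goodbye": 1.0},
}


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} table."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_pair_generative(rng):
    """Generative model: sample (x, y) with no input at all."""
    y = sample(LANGUAGE_MODEL, rng)        # y ~ p(y)
    x = sample(TRANSLATION_MODEL[y], rng)  # x ~ p(x | y)
    return x, y


def sample_translation_discriminative(x, rng):
    """Discriminative model: an input x is required before sampling y."""
    return sample(DIRECT_MODEL[x], rng)    # y ~ p(y | x)


def decode_noisy_channel(x):
    """Noisy-channel decoding: argmax over y of p(y) * p(x | y)."""
    return max(
        LANGUAGE_MODEL,
        key=lambda y: LANGUAGE_MODEL[y] * TRANSLATION_MODEL[y].get(x, 0.0),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_pair_generative(rng))                      # needs no input
    print(sample_translation_discriminative("salut", rng))  # needs x
    print(decode_noisy_channel("salut"))
```

Both models compose text here, but only `sample_pair_generative` can be called without supplying a source sentence.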

To address your remark:

> they would also be able to generate new data samples

Discriminative models can only generate new samples when provided with a given value of $X$. They can't generate arbitrary new $(x, y)$ samples.

Wrap-up

Some terminological confusion has come up from places that use terms like 'conditional generative model'. It arises from precisely what you're asking about: 'generative' has two meanings (modeling a joint distribution, and composing output such as text), and it's easy to miss the distinction.

The rule of thumb here: 'generative' models can be sampled from, without needing any input. Text generation is a separate matter, which can be done by both classes of model.

This is a good answer. This may actually require some more detail, but could you elaborate on what you mean when you say that "there are reasons it's unpopular" w.r.t. using generative models for text "composition?" What might some reasons be? — Sean, Feb 23 '21 at 05:39
Ah, that’s mostly because of the _general_ differences between generative and discriminative models, like computational cost and ease of introducing new features. Would you like me to summarize those in my answer? — Arya McCarthy, Feb 23 '21 at 14:11
If it's not too much trouble that'd be great! I feel like it'd provide some good contextual information for myself and others to think about. — Sean, Feb 24 '21 at 00:25
