Summary
There's a crucial difference between text generation and generative models, which I see as the core of your question. They're orthogonal. You can have generative models that generate text, discriminative models that generate text, or the more familiar models from either category that do neither (e.g. logistic regression, Naive Bayes). To avoid confusion, I'll say 'compose text' instead of 'generate text'.
Remember your characterization that a generative model models a joint distribution, while a discriminative model models a conditional distribution.
Typically, text summarization or question generation is done with a discriminative model. In order to compose the summary, you must first provide the text to be summarized! You can't just sample willy-nilly from the model. Nothing says you can't use a generative model (modeling both the original and summarized text), but there are reasons it's unpopular.
An example: machine translation
Machine translation is another example of a task that asks you to compose text. I bring it up because in the past decade, it's largely shifted from generative to discriminative models. (Note that there are exceptions.)
Until recently, statistical translation systems were generative.
The goal was to find the optimal sequence, by decoding this core generative model:
$$\hat{y} = {\arg\max}_y\, \underbrace{p(y)}_{\textrm{Language model}} \cdot \underbrace{p(x \mid y)}_{\textrm{Translation model}}$$
Nowadays, direct discriminative models are more common than the noisy channel–based model above:
$$\hat{y} = {\arg\max}_y\, p(y \mid x)$$
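The two objectives above are linked by Bayes' rule. As a short sketch (the marginal $p(x)$ is introduced here only for the derivation), for a fixed input $x$:

$$\hat{y} = {\arg\max}_y\, p(y \mid x) = {\arg\max}_y\, \frac{p(y)\, p(x \mid y)}{p(x)} = {\arg\max}_y\, p(y) \cdot p(x \mid y)$$

The last step holds because $p(x)$ does not depend on $y$. So with exact distributions both decoders would pick the same $\hat{y}$. What differs is which distribution is modeled: the noisy-channel system models the joint $p(x, y) = p(y) \cdot p(x \mid y)$, while the direct system models only the conditional $p(y \mid x)$. In practice the models are estimated from data, so the two need not agree.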
Neither type of model would be any good for translation if it couldn't compose text, right? The difference is that the discriminative models demand an input before you create anything. With the generative model above, the procedure for sampling a sentence and its translation is simple: sample from the target language's language model to get a $y$, then sample from the translation model to get your $x$. No original input needed.
To address your remark:
"they would also be able to generate new data samples"
Discriminative models can only generate new samples when provided with a given value of $X$. They can't generate arbitrary new $(x, y)$ samples.
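To make the contrast concrete, here is a minimal Python sketch of the sampling procedure described above. Everything in it is a made-up toy: the sentences, the probability tables, and the function names (`draw`, `sample_pair`, `sample_translation`) are assumptions for illustration, not part of any real translation system.

```python
import random

# Hypothetical toy distributions, invented purely for illustration.
# p(y): a "language model" over target-language sentences.
LANGUAGE_MODEL = {"the cat": 0.6, "the dog": 0.4}

# p(x | y): a "translation model" over source sentences, given a target sentence.
TRANSLATION_MODEL = {
    "the cat": {"le chat": 0.9, "la chatte": 0.1},
    "the dog": {"le chien": 1.0},
}

# p(y | x): a direct conditional model. It is only defined once x is given.
CONDITIONAL_MODEL = {
    "le chat": {"the cat": 1.0},
    "la chatte": {"the cat": 1.0},
    "le chien": {"the dog": 1.0},
}


def draw(dist, rng):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_pair(rng):
    """Generative: sample y ~ p(y), then x ~ p(x | y). No input is required."""
    y = draw(LANGUAGE_MODEL, rng)
    x = draw(TRANSLATION_MODEL[y], rng)
    return x, y


def sample_translation(x, rng):
    """Discriminative: sample y ~ p(y | x). The source sentence x must be supplied."""
    return draw(CONDITIONAL_MODEL[x], rng)


rng = random.Random(0)
x, y = sample_pair(rng)                          # a fresh (x, y) pair from nothing
y_given_x = sample_translation("le chien", rng)  # needs an x to get started
print((x, y), y_given_x)
```

Both functions compose text, but only `sample_pair` can be called with no sentence in hand. `sample_translation` has nothing to sample from until you pass it an `x`.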
Wrap-up
Some terminological confusion has come up, from places that use terms like 'conditional generative model'. The confusion arose because of precisely what you're asking about—'generative' has two meanings, and it's easy to miss the distinction.
The rule of thumb here: 'generative' models can be sampled from, without needing any input. Text generation is a separate matter, which can be done by both classes of model.