I guess technically, you can interpret k-means as a generative model through the connection to mixture models, though this is a stretch that you can only make by seriously constraining the expressivity of your generative model.
A generative model is simply a model of the joint distribution over covariates and labels $(X, Y)$; to perform inference, you use Bayes' rule to compute $P(Y \mid X)$:
$$\begin{aligned}
\hat{Y} &= \underset{Y}{\arg\max}\; P(Y \mid X) \\
&= \underset{Y}{\arg\max}\; \frac{P(X \mid Y)\,P(Y)}{P(X)} && \text{(Bayes' rule)} \\
&= \underset{Y}{\arg\max}\; P(X \mid Y)\,P(Y) && \text{($P(X)$ is constant w.r.t. $Y$)} \\
&= \underset{Y}{\arg\max}\; P(X, Y) && \text{(definition of conditional probability)}
\end{aligned}$$
which is exactly the joint distribution that the generative model fits in the first place.
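To make this concrete, here's a minimal sketch of that inference rule with Gaussian class-conditionals (the distributions and numbers below are purely illustrative, not anything from the question):

```python
import numpy as np
from scipy.stats import multivariate_normal

def generative_predict(x, priors, class_conditionals):
    """Inference in a generative model: argmax_y P(x | y) P(y).

    priors: P(Y = y) for each class y.
    class_conditionals: one fitted distribution per class, each
        exposing .pdf(x) = P(X = x | Y = y).
    """
    joint = [cc.pdf(x) * p for cc, p in zip(class_conditionals, priors)]
    return int(np.argmax(joint))  # P(X) omitted: constant w.r.t. Y

# Illustrative two-class example with Gaussian class-conditionals.
priors = [0.5, 0.5]
class_conditionals = [
    multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)),
    multivariate_normal(mean=[3.0, 3.0], cov=np.eye(2)),
]
print(generative_predict([2.5, 2.8], priors, class_conditionals))  # -> 1
```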
So how is k-means a generative model? One interpretation is that k-means models the joint distribution over $(X, Y)$ via a degenerate (probability-one) assignment of points $X$ to the clusters playing the role of $Y$. Since k-means assigns each point to exactly one cluster, say point $x^{(i)}$ to cluster $C(x^{(i)})$, we have $P(Y = C(x^{(i)}) \mid X = x^{(i)}) = 1$, and $P(Y = c \mid X = x^{(i)}) = 0$ for every other cluster $c$. Plugging this into the inference rule described above, the resulting predictions simply "look up" which cluster a point belongs to, which is exactly how k-means assigns labels.
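In code, the "posterior" implied by k-means is just a one-hot lookup over centroids; a rough sketch (the function name and centroids are mine, for illustration only):

```python
import numpy as np

def kmeans_posterior(x, centroids):
    """The degenerate 'posterior' implied by k-means: probability 1 on
    the nearest centroid's cluster, 0 everywhere else."""
    dists = np.linalg.norm(centroids - x, axis=1)
    posterior = np.zeros(len(centroids))
    posterior[np.argmin(dists)] = 1.0  # hard assignment C(x)
    return posterior

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # illustrative centroids
print(kmeans_posterior(np.array([4.2, 4.9]), centroids))  # [0. 1.]
```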
The more rigorous interpretation goes through the connection between k-means and mixture models, which is usually presented via the EM algorithm for a Gaussian mixture model with spherical clusters, as described here. In the GMM case, it is also possible (and arguably more intuitive) to interpret the model as performing a "soft"/probabilistic assignment of points $X$ to the clusters $Y$.
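For contrast, here's a sketch of those soft assignments ("responsibilities") for a spherical GMM with shared variance $\sigma^2$ and mixing weights $\pi_k$; as $\sigma \to 0$ they collapse to the hard k-means assignment above. All names here are my own illustration:

```python
import numpy as np

def soft_assignments(X, centroids, sigma, weights):
    """GMM responsibilities for spherical components N(mu_k, sigma^2 I):
    P(Y = k | x) proportional to pi_k * exp(-||x - mu_k||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    logits = np.log(weights)[None, :] - sq_dists / (2 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
X = np.array([[1.0, 1.0], [4.0, 4.5]])
print(soft_assignments(X, centroids, sigma=1.0, weights=np.array([0.5, 0.5])))
```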
Also, I don't understand the statement that a discriminative model doesn't have probabilities to predict the number of classes -- logistic-regression classifiers are discriminative models that output a "probability score," and by definition a discriminative model estimates $P(Y \mid X)$ directly.
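For instance, with scikit-learn's LogisticRegression (toy data, purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: two classes separated along both coordinates.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)  # discriminative: models P(Y | X) directly
print(clf.predict_proba(X[:2]))       # each row is a distribution over classes
```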
Edit: I forgot to answer your question about generating new data. Through the connection from k-means to a spherical Gaussian mixture model, you can select a cluster (i.e. pick $Y$), choose the Gaussian corresponding to that cluster, and generate data by drawing from that distribution (I think -- I've never done this myself). Each centroid corresponds to the mean of its cluster's Gaussian.
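Roughly, the sampling procedure would look like this (my own sketch -- the mixing weights could come from cluster proportions, and $\sigma$ has to be picked by hand, per the note below):

```python
import numpy as np

def sample_from_kmeans_gmm(centroids, weights, sigma, n_samples, rng=None):
    """Sample from the spherical GMM induced by k-means: pick a cluster Y
    with probability weights[Y], then draw X ~ N(centroids[Y], sigma^2 I)."""
    if rng is None:
        rng = np.random.default_rng()
    ks = rng.choice(len(centroids), size=n_samples, p=weights)
    noise = rng.standard_normal((n_samples, centroids.shape[1]))
    return centroids[ks] + sigma * noise

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # means from a k-means fit
samples = sample_from_kmeans_gmm(centroids, weights=[0.5, 0.5], sigma=0.8, n_samples=100)
```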
The derivation is provided here. However, note that $\sigma$, i.e. the (shared) variance of the underlying Gaussians in the mixture, is not identifiable from k-means alone, since changing it does not affect the k-means objective function.
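One way to see this: under hard (MAP) assignments, the cluster a point gets mapped to doesn't depend on a shared $\sigma$ at all, so the k-means solution carries no information about it. A quick illustrative check:

```python
import numpy as np

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
x = np.array([1.0, 1.5])

for sigma in [0.1, 1.0, 10.0]:
    # Log-responsibility up to a constant: -||x - mu_k||^2 / (2 sigma^2).
    scores = -((centroids - x) ** 2).sum(axis=1) / (2 * sigma**2)
    print(sigma, np.argmax(scores))  # the argmax is the same for every sigma
```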