I guess technically, you can interpret k-means as a generative model through the connection to mixture models, though this is a stretch that you can only make by seriously constraining the expressivity of your generative model.
A generative model is simply a model of the joint distribution over covariates and labels $(X, Y)$; to perform inference, you use Bayes' rule to compute $P(Y \mid X)$:
$$\begin{aligned}
\hat{Y} &= \underset{Y}{\arg\max}\; P(Y \mid X) \\
&= \underset{Y}{\arg\max}\; \frac{P(X \mid Y)\,P(Y)}{P(X)} && \text{(Bayes' rule)} \\
&= \underset{Y}{\arg\max}\; P(X \mid Y)\,P(Y) && \text{($P(X)$ is constant w.r.t. $Y$)} \\
&= \underset{Y}{\arg\max}\; P(X, Y) && \text{(definition of conditional probability)}
\end{aligned}$$
which is exactly the joint distribution that the generative model fits in the first place.
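To make this concrete, here's a minimal sketch of that inference rule with Gaussian class-conditionals (the distributions and numbers below are purely illustrative, not anything from the question):

```python
import numpy as np
from scipy.stats import multivariate_normal

def generative_predict(x, priors, class_conditionals):
    """Inference in a generative model: argmax_y P(x | y) P(y).

    priors: P(Y = y) for each class y.
    class_conditionals: one fitted distribution per class, each
        exposing .pdf(x) = P(X = x | Y = y).
    """
    joint = [cc.pdf(x) * p for cc, p in zip(class_conditionals, priors)]
    return int(np.argmax(joint))  # P(X) omitted: constant w.r.t. Y

# Illustrative two-class example with Gaussian class-conditionals.
priors = [0.5, 0.5]
class_conditionals = [
    multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)),
    multivariate_normal(mean=[3.0, 3.0], cov=np.eye(2)),
]
print(generative_predict([2.5, 2.8], priors, class_conditionals))  # -> 1
```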
So how is k-means a generative model? One interpretation is that k-means models the joint distribution over $(X, Y)$ via a degenerate (probability-one) assignment of points $X$ to the clusters playing the role of $Y$. Since k-means assigns each point to exactly one cluster, say point $x^{(i)}$ to cluster $C(x^{(i)})$, we have $P(Y = C(x^{(i)}) \mid X = x^{(i)}) = 1$, and $P(Y = c \mid X = x^{(i)}) = 0$ for every other cluster $c$. Plugging this into the inference rule described above, the resulting predictions simply "look up" which cluster a point belongs to, which is exactly how k-means assigns labels.
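In code, the "posterior" implied by k-means is just a one-hot lookup over centroids; a rough sketch (the function name and centroids are mine, for illustration only):

```python
import numpy as np

def kmeans_posterior(x, centroids):
    """The degenerate 'posterior' implied by k-means: probability 1 on
    the nearest centroid's cluster, 0 everywhere else."""
    dists = np.linalg.norm(centroids - x, axis=1)
    posterior = np.zeros(len(centroids))
    posterior[np.argmin(dists)] = 1.0  # hard assignment C(x)
    return posterior

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # illustrative centroids
print(kmeans_posterior(np.array([4.2, 4.9]), centroids))  # [0. 1.]
```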
The more rigorous interpretation goes through the connection between k-means and mixture models, which is usually presented via the EM algorithm for a Gaussian mixture model with spherical clusters, as described here. In the GMM case, it is also possible (and arguably more intuitive) to interpret the model as performing a "soft"/probabilistic assignment of points $X$ to the clusters $Y$.
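For contrast, here's a sketch of those soft assignments ("responsibilities") for a spherical GMM with shared variance $\sigma^2$ and mixing weights $\pi_k$; as $\sigma \to 0$ they collapse to the hard k-means assignment above. All names here are my own illustration:

```python
import numpy as np

def soft_assignments(X, centroids, sigma, weights):
    """GMM responsibilities for spherical components N(mu_k, sigma^2 I):
    P(Y = k | x) proportional to pi_k * exp(-||x - mu_k||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    logits = np.log(weights)[None, :] - sq_dists / (2 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(logits)
    return resp / resp.sum(axis=1, keepdims=True)

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
X = np.array([[1.0, 1.0], [4.0, 4.5]])
print(soft_assignments(X, centroids, sigma=1.0, weights=np.array([0.5, 0.5])))
```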
Also, I don't understand the statement that a discriminative model doesn't have probabilities to predict the number of classes -- logistic-regression classifiers are discriminative models that output a "probability score," and by definition a discriminative model estimates $P(Y \mid X)$ directly.
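For instance, with scikit-learn's LogisticRegression (toy data, purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: two classes separated along both coordinates.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)  # discriminative: models P(Y | X) directly
print(clf.predict_proba(X[:2]))       # each row is a distribution over classes
```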
Edit: I forgot to answer your question about generating new data. Through the connection from k-means to a spherical Gaussian mixture model, you can select a cluster (i.e. pick $Y$), choose the Gaussian corresponding to that cluster, and generate data by drawing from that distribution (I think -- I've never done this myself). Each centroid corresponds to the mean of its cluster's Gaussian.
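Roughly, the sampling procedure would look like this (my own sketch -- the mixing weights could come from cluster proportions, and $\sigma$ has to be picked by hand, per the note below):

```python
import numpy as np

def sample_from_kmeans_gmm(centroids, weights, sigma, n_samples, rng=None):
    """Sample from the spherical GMM induced by k-means: pick a cluster Y
    with probability weights[Y], then draw X ~ N(centroids[Y], sigma^2 I)."""
    if rng is None:
        rng = np.random.default_rng()
    ks = rng.choice(len(centroids), size=n_samples, p=weights)
    noise = rng.standard_normal((n_samples, centroids.shape[1]))
    return centroids[ks] + sigma * noise

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # means from a k-means fit
samples = sample_from_kmeans_gmm(centroids, weights=[0.5, 0.5], sigma=0.8, n_samples=100)
```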
The derivation is provided here. However, note that $\sigma$, i.e. the (shared) variance of the underlying Gaussians in the mixture, is not identifiable from k-means alone, since changing it does not affect the k-means objective function.
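One way to see this: under hard (MAP) assignments, the cluster a point gets mapped to doesn't depend on a shared $\sigma$ at all, so the k-means solution carries no information about it. A quick illustrative check:

```python
import numpy as np

centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
x = np.array([1.0, 1.5])

for sigma in [0.1, 1.0, 10.0]:
    # Log-responsibility up to a constant: -||x - mu_k||^2 / (2 sigma^2).
    scores = -((centroids - x) ** 2).sum(axis=1) / (2 * sigma**2)
    print(sigma, np.argmax(scores))  # the argmax is the same for every sigma
```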