Say that we got a set $\mathcal{X} = \{x_1, x_2, \ldots\}$ of samples. You want to partition $\mathcal{X}$ into $k$ subsets, where $k$ is unknown besides the fact that $k \ge 1$. How clustering problem can be forumlated in probabilistic terms (as opposed to heuristics using distance measures)?
Here is an example: let $x$ be a sample, then for any cluster $i \in \{1, 2, \ldots, k\}$ find the probability $\Pr(C=i | X=x)$, where $C$ and $X$ are random variables that take values in sets $\{1, 2, \ldots, k\}$ and $\mathcal{X}$, respectively. How can you move forward with this approach? We don't even know $k$...