As per answers to this question, there are shortcomings in the heuristics of deciding on the number of clusters.
A more robust approach could be probability based clustering: from a probabilistic perspective, the goal of clustering is to find the most likely set of clusters given the data. Thus, we can never be "100% sure" that training instances should be placed into some cluster: they just have a certain probability of belonging to it.
I wonder how if this reasoning is correct and how it would work in practice.