The log-likelihood depends both on the probability model(s) you consider for the observed data and on the data themselves.
If the likelihood of the sample is greater under one model than under another, we tend to infer that the former model is more plausible than the latter. Whilst not a probability per se (for continuous data it is a probability density), the likelihood can rank two probability models in this fashion, even for a single observation. The log-likelihood is simply the log of the likelihood. If a likelihood is less than 1, the log-likelihood is negative; this can arise from noisy data, sparse data, small sample sizes, and a host of other causes. We cannot objectively say anything from a single likelihood or log-likelihood on its own: it is strictly relative, and it only serves to compare models.
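As a minimal numerical sketch of that relative comparison (assuming NumPy and SciPy are available; the two candidate Gaussian models below are invented purely for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)  # sample actually drawn from N(5, 2^2)

# Log-likelihood of the same sample under two hypothetical Gaussian models.
loglik_a = norm.logpdf(x, loc=5.0, scale=2.0).sum()  # model A: N(5, 2^2)
loglik_b = norm.logpdf(x, loc=0.0, scale=1.0).sum()  # model B: N(0, 1)

# Each value alone is negative and hard to interpret; only the comparison matters.
print(f"log-likelihood under A: {loglik_a:.1f}")
print(f"log-likelihood under B: {loglik_b:.1f}")
print("prefer A" if loglik_a > loglik_b else "prefer B")
```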
One frequently used model for clustering is a Gaussian density, which you describe. It specifies how far an observation is likely to fall from its "centroid", or mean. The model that maximizes the log-likelihood is the saturated model: out of $n$ observations, there are $n$ clusters, each with the observed value as its centroid, and the likelihood can then be made arbitrarily large by shrinking the standard deviation(s) toward zero.
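To see why, here is a simplified sketch (ignoring the mixture weights): with each observation $x_i$ serving as the centroid $\mu_i$ of its own cluster, the Gaussian log-likelihood reduces to

$$
\ell(\mu, \sigma) \;=\; \sum_{i=1}^{n}\left[-\tfrac{1}{2}\log\!\left(2\pi\sigma_i^2\right) - \frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right]
\;\overset{\mu_i = x_i}{=}\;
-\frac{n}{2}\log(2\pi) \;-\; \sum_{i=1}^{n}\log\sigma_i ,
$$

which grows without bound as the $\sigma_i$ shrink toward zero. So the "best" model by raw likelihood is a useless one.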
Log-likelihoods are used frequently in statistical inference, but only to judge whether one probability model fits the observed data better than another. That is a confirmatory comparison, not an exploratory one. They do not by themselves determine the total number of clusters, because choosing the number of clusters is not a comparison of two pre-specified models; it is an exploratory, model-selection question. Used that way, the likelihood tends to overfit: maximum likelihood runs into well-known problems in high dimensions, and adding clusters (and hence parameters) essentially never decreases the maximized log-likelihood.
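As a concrete sketch of that overfitting tendency (assuming scikit-learn is available; the data and range of $k$ are invented for illustration), the maximized log-likelihood of a Gaussian mixture typically keeps increasing as components are added, so on its own it cannot select the number of clusters:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two well-separated clusters in one dimension.
X = np.concatenate([rng.normal(0, 1, 100), rng.normal(6, 1, 100)]).reshape(-1, 1)

for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    # score() returns the average log-likelihood per observation.
    print(f"k={k}: total log-likelihood = {gm.score(X) * len(X):.1f}")
```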
If you have the maximized log-likelihood $\hat{\ell}$, however, you can convert it to the Bayesian Information Criterion, $\mathrm{BIC} = k\ln(n) - 2\hat{\ell}$, where $k$ is the number of free parameters and $n$ the sample size. This favors more parsimonious models by penalizing the total number of parameters, and the model with the smallest BIC is preferred.
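Continuing the sketch above (again assuming scikit-learn, whose `bic()` follows the lower-is-better convention), the BIC typically bottoms out near the true number of clusters instead of growing with $k$:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 1, 100), rng.normal(6, 1, 100)]).reshape(-1, 1)

bic = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    # Equivalent by hand: BIC = p * ln(n) - 2 * log-likelihood, with p free parameters.
    bic[k] = gm.bic(X)

best_k = min(bic, key=bic.get)
print(bic)
print(f"BIC selects k = {best_k}")  # typically 2 for these data
```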