I'm running a quick simulation to compare different clustering methods, and I've hit a snag trying to evaluate the cluster solutions.
I know of various validation metrics (many are available through cluster.stats() in R), but I assume they work best when the estimated number of clusters equals the true number of clusters. I want to be able to measure how well a clustering solution performs even when it does not use the number of clusters from the original simulation (e.g., how well a three-cluster solution models data that were simulated from four clusters). For reference, the clusters are simulated with identical covariance matrices.
I thought the KL divergence between two mixtures of Gaussians would be a useful measure, but no closed-form solution exists (Hershey and Olsen, 2007), and a Monte Carlo estimate is becoming computationally expensive.
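For concreteness, here is a minimal sketch of the kind of Monte Carlo estimate I mean. It is not my actual code: it is simplified to one dimension with a shared standard deviation (standing in for the identical covariance matrices), and the function names `mixture_logpdf`, `sample_mixture`, and `mc_kl` are only illustrative.

```python
import math
import random


def mixture_logpdf(x, weights, means, sd):
    """Log density at x of a 1-D Gaussian mixture with a shared standard deviation."""
    log_terms = [
        math.log(w) - 0.5 * ((x - m) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))
        for w, m in zip(weights, means)
    ]
    # Log-sum-exp for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def sample_mixture(rng, weights, means, sd):
    """Draw one point: pick a component by weight, then sample from it."""
    mean = rng.choices(means, weights=weights, k=1)[0]
    return rng.gauss(mean, sd)


def mc_kl(weights_p, means_p, weights_q, means_q, sd, n=20000, seed=0):
    """Monte Carlo estimate of KL(p || q): average of log p(x) - log q(x) for x ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_mixture(rng, weights_p, means_p, sd)
        total += mixture_logpdf(x, weights_p, means_p, sd)
        total -= mixture_logpdf(x, weights_q, means_q, sd)
    return total / n


# Hypothetical example: a "true" 4-component mixture vs. a 3-component fit.
true_w, true_mu = [0.25, 0.25, 0.25, 0.25], [0.0, 3.0, 6.0, 9.0]
fit_w, fit_mu = [0.25, 0.25, 0.5], [0.0, 3.0, 7.5]
kl_estimate = mc_kl(true_w, true_mu, fit_w, fit_mu, sd=1.0)
```

The cost comes from needing many samples (and a density evaluation over every component for each one) to get a stable estimate, and it has to be repeated for every fitted solution in the simulation.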
Are there any other solutions that might be easy to implement (even if just an approximation)?