How can I tell for which clustering algorithms (those with a parameter representing the number of clusters) it makes sense to use the Gap statistic? I've read in the paper by Tibshirani, Walther & Hastie that
It is designed to be applicable to virtually any clustering method.
But in the theoretical part the authors then proceed to
For simplicity [...] focus on the widely used K-means clustering procedure.
My question is: to which procedures can it really be applied? What changes (if any) do I need to make when applying the Gap statistic to other procedures? Should I choose different distance measures (as opposed to defaulting to the Euclidean distance used for K-means) for different procedures?
To give a specific list of algorithms I am curious about:
- k-modes & k-prototypes - Does it make sense to use the Gap statistic with a different distance measure? Specifically, distances related to the cost functions these two algorithms optimize?
- Ward hierarchical clustering
- Spectral clustering - Is there any way to make the Gap statistic useful for selecting the number of clusters in spectral clustering? I am not sure whether I should swap the Euclidean distance for some other measure (if so, which?), keep the Euclidean distance, or whether there simply is no way to make the Gap statistic meaningful here.
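To make the question concrete, here is a rough sketch of how I understand the two knobs I'm asking about: the clustering procedure and the distance measure both enter the computation only through the within-cluster dispersion W_k. This is just my reading of the paper, not a definitive implementation; `cluster_fn` is a hypothetical interface of my own, and I'm using the simpler of the paper's two reference distributions (uniform over the bounding box of the data).

```python
import numpy as np

def within_dispersion(X, labels, dist=None):
    """Pooled within-cluster dispersion W_k = sum_r D_r / (2 * n_r).

    dist: pairwise distance function; defaults to squared Euclidean,
    which matches the K-means case in the paper. This is the place
    where I imagine a different measure would be swapped in.
    """
    if dist is None:
        dist = lambda a, b: np.sum((a - b) ** 2)
    W = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        n = len(pts)
        if n < 2:
            continue
        # sum over distinct pairs, counted once, so D_r/(2*n_r) == D/n
        D = sum(dist(pts[i], pts[j])
                for i in range(n) for j in range(i + 1, n))
        W += D / n
    return W

def gap_statistic(X, cluster_fn, k, n_refs=10, dist=None, seed=0):
    """Gap(k) = mean_b log(W*_kb) - log(W_k) for a generic clusterer.

    cluster_fn(X, k) -> label array; in principle any algorithm
    (k-modes, Ward, spectral, ...) could be plugged in here, which
    is exactly what I am unsure is statistically meaningful.
    References are drawn uniformly over the bounding box of X.
    """
    rng = np.random.default_rng(seed)
    log_Wk = np.log(within_dispersion(X, cluster_fn(X, k), dist))
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref_logs = []
    for _ in range(n_refs):
        Xref = rng.uniform(lo, hi, size=X.shape)
        ref_logs.append(
            np.log(within_dispersion(Xref, cluster_fn(Xref, k), dist)))
    return np.mean(ref_logs) - log_Wk
```

So mechanically nothing stops me from passing any clusterer and any `dist`; my question is for which combinations the resulting Gap values are actually a defensible way to pick the number of clusters.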
I am sure that after reading my question your first thought will be that it really depends on what I mean by "useful", "meaningful", "right" and "work", but putting that aside, I am looking for systematic ways to choose the number of clusters. I would like these methods not to be arbitrary, and I want to avoid doing something that is widely considered a bad approach.