C-Index for cluster analysis in Stata

Question

How can I calculate the C-Index to determine a 'good' number of groups in a cluster analysis in Stata? It's mentioned for R in this post (What is an acceptable value of the Calinski & Harabasz (CH) criterion?), but Stata does not seem to provide a built-in solution.

Thank you!

What makes it difficult for you to calculate it once you have understood it? Or haven't you understood it yet? — ttnphns, Nov 25 '13 at 16:33
P.S. My macro for SPSS computes it. But you said you want code for Stata... — ttnphns, Nov 25 '13 at 16:35
Where can I find your macro? It seems it can be implemented in Stata using the cluster programming subroutines, but I have no experience in programming that kind of thing. — SPi, Nov 25 '13 at 16:45
OK then, if you have SPSS, try it. Visit my web-page and download "Clustering criterions". The documentation is only partly in English, so if you have questions, ask me by email. Please note: 1) C-Index takes time to compute (I don't recommend the macro if you have, say, 500+ objects), but point-biserial _r_ is fast and often gives similar results; 2) C-Index is just one of many clustering indices, and you might want to choose another (e.g. Silhouette is quite popular nowadays). — ttnphns, Nov 25 '13 at 16:58
My instinct is that this would require delving much deeper into Stata's code than is easy or even possible. A more fundamental concern is that the criterion is of dubious relevance unless it was used to define clusters in the first place or can be related directly to cluster generation. — Nick Cox, Nov 25 '13 at 19:32
Alright, so the Calinski/Harabasz pseudo-F and the Duda/Hart Je(2)/Je(1) index are the only two stopping criteria available in Stata by default, I guess. I was hoping there were more for agglomerative cluster algorithms. — SPi, Nov 25 '13 at 21:27
The computation itself is rather simple. For an experienced (programming) user it would, I suppose, be simple in Stata as well. — ttnphns, Nov 25 '13 at 22:10
The Stata `cluster` command is specially prepared for user-written stopping rules. Make sure you read [this section](https://tinyurl.com/n3gcu4s) of the manual, page 3. There's an example that might be useful. It requires some programming but it doesn't look extremely complicated. You may want to give it a try. Also, your question seems off-topic here since you seek only a Stata command to compute some index. — Roberto Ferrer, Nov 26 '13 at 03:13
Computation is explained here https://stats.stackexchange.com/q/343878/3277 — ttnphns, May 08 '18 at 10:41
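
For readers who want to see how simple the computation mentioned in the comments is, here is a minimal sketch of the C-Index in Python. It is not Stata code and not the SPSS macro referred to above. It uses the usual Hubert and Levin definition, C = (S_w - S_min) / (S_max - S_min), where S_w is the sum of the n_w within-cluster pairwise distances, and S_min and S_max are the sums of the n_w smallest and n_w largest pairwise distances in the whole data set. Values lie between 0 and 1, and smaller is better. The function name `c_index` and the choice of Euclidean distance are assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def c_index(points, labels):
    """C-Index sketch: (S_w - S_min) / (S_max - S_min); lower is better.

    points: list of equal-length numeric tuples (Euclidean distance is assumed).
    labels: cluster membership, one label per point.
    """
    # All unordered pairs of observations and their distances.
    pairs = list(combinations(range(len(points)), 2))
    d = [dist(points[i], points[j]) for i, j in pairs]

    # Distances between observations that share a cluster.
    within = [dij for dij, (i, j) in zip(d, pairs) if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0:
        raise ValueError("no within-cluster pairs: every cluster is a singleton")

    s_w = sum(within)
    d_sorted = sorted(d)
    s_min = sum(d_sorted[:n_w])   # sum of the n_w smallest distances overall
    s_max = sum(d_sorted[-n_w:])  # sum of the n_w largest distances overall
    if s_max == s_min:
        raise ValueError("C-Index undefined: S_max equals S_min")
    return (s_w - s_min) / (s_max - s_min)


# Two tight, well-separated groups (made-up toy data).
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = c_index(pts, [1, 1, 2, 2])  # partition matches the groups
bad = c_index(pts, [1, 2, 1, 2])   # partition cuts across the groups
```

To choose a number of groups, one would compute the index for each candidate partition (for example, each cut of a dendrogram) and prefer the partition with the smallest value. The sketch forms and sorts all n(n-1)/2 pairwise distances, which is consistent with the comment above that the index becomes slow with many objects.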
