How can I calculate the C-Index to determine a 'good' number of groups in a cluster analysis in Stata? It is mentioned for R in this post (What is an acceptable value of the Calinski & Harabasz (CH) criterion?), but Stata does not seem to provide a built-in solution.

Thank you!

SPi
  • What makes it difficult for you to calculate it once you have understood it? Or haven't you understood it yet? – ttnphns Nov 25 '13 at 16:33
  • P.s. My macro for SPSS computes it. But you said you want code for Stata... – ttnphns Nov 25 '13 at 16:35
  • Where can I find your macro? It seems it can be implemented in Stata using the cluster programming subroutines, but I have no experience in programming that kind of thing. – SPi Nov 25 '13 at 16:45
  • OK then, if you have SPSS, try it. Visit my web-page and download "Clustering criterions". The documentation is only partly in English, so if you have questions, ask me by email. Please note: 1) C-Index takes time to compute (I don't recommend the macro if you have, say, 500+ objects), but point-biserial _r_ is fast and often gives similar results; 2) C-Index is just one of many clustering indices, and you might want to choose another (e.g. Silhouette is quite popular nowadays). – ttnphns Nov 25 '13 at 16:58
  • My instinct is that this would require delving much deeper into Stata's code than is easy or even possible. A more fundamental concern is that the criterion is of dubious relevance unless it was used to define clusters in the first place or can be related directly to cluster generation. – Nick Cox Nov 25 '13 at 19:32
  • Alright, so the Calinski/Harabasz pseudo-F and the Duda/Hart Je(2)/Je(1) index are the only two stopping criteria available in Stata by default, I guess. I was hoping there were more for agglomerative cluster algorithms. – SPi Nov 25 '13 at 21:27
  • The computation itself is rather simple. If you were an experienced (programming) user, it would be simple in Stata, I suppose. – ttnphns Nov 25 '13 at 22:10
  • The Stata `cluster` command is specially prepared for user-written stopping rules. Make sure you read [this section](https://tinyurl.com/n3gcu4s) of the manual, page 3. There's an example that might be useful. It requires some programming but it doesn't look extremely complicated. You may want to give it a try. Also, your question seems off-topic here since you seek only a Stata command to compute some index. – Roberto Ferrer Nov 26 '13 at 03:13
  • Computation is explained here https://stats.stackexchange.com/q/343878/3277 – ttnphns May 08 '18 at 10:41
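
Since the comments describe the computation as simple but give no formula, here is a minimal sketch of the usual Hubert-Levin definition, C = (S - S_min) / (S_max - S_min). S is the sum of the within-cluster pairwise distances, and S_min and S_max are the sums of the same number of smallest and largest distances among all pairs. The sketch is in Python rather than Stata, assumes Euclidean distance, and the function name `c_index` is hypothetical. It is not the SPSS macro mentioned above and not a Stata routine.

```python
from itertools import combinations
import math


def c_index(points, labels):
    """C-index of a partition (assumed Hubert-Levin definition).

    points: list of equal-length numeric tuples (one per object)
    labels: list of cluster labels, same length as points
    Uses Euclidean distance; this is an assumption of the sketch.
    """
    pairs = list(combinations(range(len(points)), 2))
    dists = [math.dist(points[i], points[j]) for i, j in pairs]

    # Distances between objects that share a cluster label.
    within = [d for d, (i, j) in zip(dists, pairs) if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0 or n_w == len(dists):
        raise ValueError("need more than one cluster and at least one within-cluster pair")

    s = sum(within)
    ordered = sorted(dists)
    s_min = sum(ordered[:n_w])   # the n_w smallest distances overall
    s_max = sum(ordered[-n_w:])  # the n_w largest distances overall
    if s_max == s_min:
        raise ValueError("C-index is undefined when S_max equals S_min")
    return (s - s_min) / (s_max - s_min)


# Two well-separated pairs of points, clustered sensibly and badly.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = c_index(data, [0, 0, 1, 1])
bad = c_index(data, [0, 1, 0, 1])
print(good, bad)
```

Values lie between 0 and 1, and smaller is better, so one would compute the index for each candidate number of clusters and prefer the solution with the lowest value. All n(n-1)/2 pairwise distances have to be built and sorted, which is consistent with the remark above that the index becomes slow for several hundred objects.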

0 Answers