
I have to cluster some data using the non-parametric clustering technique described in this paper. The cluster evaluation measure used in that paper is Normalized Mutual Information (NMI), because the authors know the true groupings beforehand.
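For reference, NMI scores a clustering against known labels, which is why it is only usable when a ground truth exists. Below is a minimal, dependency-free sketch. It assumes the arithmetic-mean normalization, I(U;V) / ((H(U) + H(V)) / 2), which is the default of scikit-learn's `normalized_mutual_info_score`. The paper may use another normalization (e.g. the geometric mean), and the function names here are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (nxy / n) * math.log(n * nxy / (count_a[x] * count_b[y]))
        for (x, y), nxy in joint.items()
    )


def nmi(truth, pred):
    """NMI with arithmetic-mean normalization (an assumption, see above)."""
    h_truth, h_pred = entropy(truth), entropy(pred)
    if h_truth == 0.0 and h_pred == 0.0:
        return 1.0  # both labelings put everything in one group: treat as identical
    return mutual_information(truth, pred) / ((h_truth + h_pred) / 2)
```

NMI ignores the cluster IDs themselves, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is a perfect match, while a labeling independent of the truth scores near 0.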

In my case, the data I am using has no empirical labels, but I still have to cluster it with the same technique, which I have almost finished implementing. As far as I know, internal evaluation measures (such as the Davies-Bouldin index, Dunn index, CD index or Silhouette index) are used when there is no ground truth, and external evaluation measures (such as Purity, Precision, Recall, F-measure or NMI) are used when there is a ground truth to compare against. In my case, however, there is no proper ground truth to compare against.
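To make the internal/external distinction concrete: an internal index needs only the data and the predicted labels, with no ground truth. Here is a minimal sketch of the mean silhouette coefficient without dependencies. The pluggable `dist` argument (Euclidean by default via `math.dist`, Python 3.8+) is an illustrative choice, and singleton clusters scoring 0 follows the usual convention. For real use, scikit-learn's `silhouette_score` is the ready-made alternative.

```python
import math


def silhouette_score(points, labels, dist=math.dist):
    """Mean silhouette coefficient in [-1, 1]; higher means tighter, better-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a point alone in its cluster scores s(i) = 0
        # a(i): mean distance to the other members of its own cluster
        # (p itself is in `own` with distance 0, hence dividing by len - 1)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b(i): lowest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        m = max(a, b)
        total += (b - a) / m if m > 0 else 0.0
    return total / len(points)
```

With `pts = [(0, 0), (0, 1), (10, 0), (10, 1)]`, the labeling `[0, 0, 1, 1]` scores about 0.9, while the mismatched labeling `[0, 1, 0, 1]` comes out negative.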

Which specific internal measure should I choose to evaluate my clustering results (assuming the internal set is the right one to choose from)?
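One way to see how the internal indices differ: Davies-Bouldin is built on cluster centroids and Euclidean distances, so it implicitly assumes continuous, roughly convex clusters, whereas silhouette and Dunn need only pairwise dissimilarities. A hedged sketch of Davies-Bouldin (lower is better) follows, assuming numeric points and distinct centroids. scikit-learn's `davies_bouldin_score` is the usual ready-made version.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index with Euclidean centroids; lower is better (0 is ideal)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")

    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        # centroid = coordinate-wise mean of the cluster's points
        c = tuple(sum(coord) / len(members) for coord in zip(*members))
        centroids[lab] = c
        # S_i = average distance from the members to their centroid
        scatter[lab] = sum(math.dist(p, c) for p in members) / len(members)

    total = 0.0
    for i in clusters:
        worst = 0.0
        for j in clusters:
            if j == i:
                continue
            sep = math.dist(centroids[i], centroids[j])  # M_ij, assumed > 0
            worst = max(worst, (scatter[i] + scatter[j]) / sep)
        total += worst  # worst-case similarity of cluster i to any other cluster
    return total / len(clusters)
```

Comparing the index across candidate clusterings of the same data (lower wins) is the typical usage. Its centroid assumption is one reason it may suit Gaussian-like continuous clusters better than categorical data.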

I would also like to learn how to choose a clustering evaluation measure (internal or external) depending on the context.

maliks
  • When I was programming some popular internal clustering criteria, I superficially described their properties, as I perceived them, in a tech document (find it on my web page, "Clustering criterions"; the most important parts are in English). – ttnphns Jun 25 '16 at 21:41
  • Your choice will depend on the nature of the data (continuous, hence distance-based clusters, or categorical, hence more count-based clusters), on the shape of the clusters (are they Gaussian-like or, say, worm-like), etc. Here is a [thread](http://stats.stackexchange.com/q/195456/3277) about cluster validation, with a number of further links in its comments and answers. – ttnphns Jun 25 '16 at 21:46
  • @ttnphns My data is not continuous, and the clusters are most probably Gaussian-like, since I used Dirichlet priors in the clustering. So which measure do you suggest? – maliks Jun 25 '16 at 23:03
  • @ttnphns You haven't provided the URL for your web page "Clustering Criterions". – maliks Jun 25 '16 at 23:12
  • @ttnphns Would you share the URL? – maliks Jun 26 '16 at 09:55
  • The URL is on my profile page. – ttnphns Jun 26 '16 at 10:42
  • @ttnphns That Word file contains one-line descriptions of the indices, whereas I was expecting code for the indices. – maliks Jun 26 '16 at 11:16
  • `codes for indices` What do you mean? If you mean a computer program, the SPSS macros' code is in a text file with the .sps extension. Are you an SPSS user who would be interested? – ttnphns Jun 26 '16 at 12:15
  • In your question you asked: `I would like to learn how to choose clustering evaluation measure`. In my first comment, I responded that some brief remarks on the characteristics of some internal clustering criteria could be found in my document. That might help, to a degree, in selecting one criterion or another, I thought. I also left a link to a thread where I explained what internal and external cluster validation are. Were my comments OK? – ttnphns Jun 26 '16 at 12:20
  • Please note that I did not read the paper you link to in your question, so I cannot recommend a specific criterion for your case. My intention was to provide a few initial guidelines. – ttnphns Jun 26 '16 at 12:35
  • @ttnphns Yes, I'm interested in choosing a suitable criterion first and then implementing it; if I find a ready-made implementation, I will obviously use it for my clustering evaluation. Why wouldn't I? – maliks Jun 26 '16 at 19:21
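Following up on the point about non-continuous data: distance-based indices such as Dunn's can be computed from any dissimilarity, not only Euclidean distance. The sketch below assumes the records are equal-length tuples of categorical attributes and uses Hamming distance (the number of mismatched attributes). Both the record format and the distance are assumptions, and count or text data (common with Dirichlet priors) may call for a different dissimilarity. It uses the single-linkage minimum for separation and the maximum pairwise distance for diameter, which is one of several Dunn variants.

```python
from itertools import combinations


def hamming(x, y):
    """Number of positions where two equal-length categorical records differ."""
    return sum(a != b for a, b in zip(x, y))


def dunn_index(records, labels, dist=hamming):
    """Dunn index: smallest between-cluster distance / largest within-cluster diameter.

    Higher is better. Works with any dissimilarity passed as `dist`.
    """
    min_between = float("inf")
    max_within = 0
    # visit every unordered pair of records once
    for (x, lx), (y, ly) in combinations(zip(records, labels), 2):
        d = dist(x, y)
        if lx == ly:
            max_within = max(max_within, d)  # widens that cluster's diameter
        else:
            min_between = min(min_between, d)  # tightens the separation
    if min_between == float("inf"):
        raise ValueError("Dunn index needs at least two clusters")
    if max_within == 0:
        return float("inf")  # every cluster consists of identical records
    return min_between / max_within
```

For example, with records such as `("red", "small", "yes")`, grouping by colour and size gives a higher Dunn value than a grouping that mixes them. Because it depends on a single minimum and a single maximum, the index is sensitive to outliers.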
