
How can I compare two classification results from two implementations that use, e.g., different methods/features for pairwise similarity calculation? If one method grouped e.g. 3000 data points into 180 groups and the other into 150 groups:

  • Can I consider one method's results as the standard of truth?
  • Can I build a ROC curve, an AUC, or something similar?
  • How could true negatives, false negatives, etc., be defined?
  • Welcome to CV. All exploratory approaches to classification suffer from an absence of pre-existing groupings. This means that there is no "ground truth" against which to validate the results. Given that, statistical methods should be applied which evaluate various properties of the solution(s). For instance, cross-validation techniques would tell you the extent to which the competing partitions are misclassified -- the higher the rate of misclassification, the less desirable the solution. Another consideration is the utility of the results over time -- which solution is more "actionable?" – Mike Hunter Jun 28 '16 at 13:50
  • Related, unclear if duplicate: http://stats.stackexchange.com/questions/15548/validation-of-clustering-results – Sycorax Jun 28 '16 at 14:37
  • Pretty sure this question is about hierarchical clustering and not classification. – Karolis Koncevičius Oct 31 '17 at 01:02

1 Answer


As @DJohnson stated, "ground truth" is difficult to come by in such situations - so there is no simple "absolute reference" to compare against. But you could directly compare the two results, which are composed over multiple classes, if this helps in your case.

I assume you are able to obtain a confusion matrix for both results:

  • For each result, the true/false positive/negative rates, AUC, EER, etc., can be calculated for each individual class. This leaves you with all those rates for all individual classes.
  • For comparing your results: look at the distribution of those values over both results. The distribution gives you an idea of how well each result performs for the classes within it (e.g. average performance plus performance spread). For a direct comparison, you could compare the numeric values of mean/median and SD/MAD performance - but looking at and comparing e.g. two boxplots, one generated for each result, might be easier (see the sketch below).
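
A minimal sketch of this per-class comparison, assuming Python with scikit-learn and that each result can be expressed as true vs. predicted labels over the same items (the names y_true, pred_a and pred_b are illustrative placeholders, not from the question):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_rates(y_true, y_pred):
    """Per-class TPR and FPR from a one-vs-rest view of the confusion matrix."""
    labels = np.unique(np.concatenate([y_true, y_pred]))
    cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, cols = predicted
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp           # predicted as class c, actually another class
    fn = cm.sum(axis=1) - tp           # actually class c, predicted as another class
    tn = cm.sum() - tp - fp - fn
    tpr = tp / np.maximum(tp + fn, 1)  # guard against empty classes
    fpr = fp / np.maximum(fp + tn, 1)
    return tpr, fpr

# Hypothetical usage: y_true is the reference grouping, pred_a / pred_b the two methods.
# tpr_a, _ = per_class_rates(y_true, pred_a)
# tpr_b, _ = per_class_rates(y_true, pred_b)
# Then compare the two distributions, e.g. via median/MAD or side-by-side boxplots:
# import matplotlib.pyplot as plt
# plt.boxplot([tpr_a, tpr_b]); plt.show()
```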
geekoverdose
  • Hey, the first result comes from manual curation by biologists, so they only have a list of groups that the biologists think belong together. True positives are therefore clear, but true negatives are not. I might have to consider any combination of one item from group i with one from group j (i != j), but in their classification each group may have a different size (some may have 5 items, some 3, some 2...). – user1830108 Jun 28 '16 at 15:24
  • When I compare the computational method's results to the manual curation, true positives are again clear, i.e. items put into the same group as in the manual one. What should I consider a true negative? A false negative? etc. – user1830108 Jun 28 '16 at 15:24
  • @user1830108 I struggle to imagine what this data looks like. Could you show a small example in your question? If I understand correctly that you don't have certain combinations of features in your data which are realistic/important and which you would consider negative, please reflect this in the sample too. – geekoverdose Jun 30 '16 at 18:08
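
The pair-based definition of negatives discussed in the comments above can be made concrete by scoring every pair of items: a pair counts as "positive" if the manual curation puts both items in the same group and "negative" otherwise, and the computational result is then judged on whether it keeps each pair together. A rough sketch, assuming each grouping is available as a dict mapping item -> group id (the names manual and computed are illustrative):

```python
from itertools import combinations

def pair_confusion(manual, computed):
    """Pair-level TP/FP/TN/FN between a manual and a computed grouping.

    manual, computed: dicts mapping item -> group id, over the same items.
    """
    tp = fp = tn = fn = 0
    for a, b in combinations(sorted(manual), 2):
        same_manual = manual[a] == manual[b]
        same_computed = computed[a] == computed[b]
        if same_manual and same_computed:
            tp += 1    # pair correctly kept together
        elif same_computed:
            fp += 1    # pair merged although the curation separates it
        elif same_manual:
            fn += 1    # pair split although the curation joins it
        else:
            tn += 1    # pair correctly kept apart
    return tp, fp, tn, fn

# From these counts, pairwise precision/recall or the Rand index
# (tp + tn) / (tp + fp + tn + fn) follow directly.
```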