I'm working on a new clustering approach. My algorithm does not perform any intra-class comparison; it only uses inter-class comparisons to decide whether the iteration continues. This lack of a criterion causes my algorithm to stop very early! Therefore, I'm looking for a robust metric to use as a criterion for evaluating the intra-class heterogeneity of clusters based on their neighbors. Is there such a metric?
-
Again, the most common questions: what is your data; is it real-valued, structured, etc.? Also, if you have inter-class comparison (point-by-point), you commonly also have intra-class comparison. I am afraid we need a bit more detail here. – pAt84 Feb 14 '16 at 14:56
-
@pAt84 thank you so much for the reply. My data are images (pixel values), which are real values (between 0 and 255). Nevertheless, my algorithm is not a pure clustering algorithm. These kinds of algorithms in image processing are known as segmentation algorithms, which group homogeneous image objects together; these image segments are then used for classification. (Continued...) – Federico Feb 14 '16 at 15:32
-
To do this, my algorithm uses three inter-class homogeneity criteria to check whether the iteration should continue when generating image objects. Defining an intra-class comparison criterion for a segmentation algorithm is usually not necessary, but having such a criterion could enhance the capability of the algorithm. Thank you – Federico Feb 14 '16 at 15:32
-
Are you speaking of some internal clustering criterion ([pt 3](http://stats.stackexchange.com/a/195481/3277))? If yes, there are so many! No single one is "robust" in the sense of "universal". – ttnphns Feb 14 '16 at 15:35
-
@ttnphns thank you so much for the help. Yes, this criterion is more or less close to what I desire. As far as I know, some indexes, such as the Dunn index, are also of this category. Am I right? Do you think these indexes could be of some help in my case? Thank you – Federico Feb 14 '16 at 16:35
-
Yes, Dunn's index is among the many. You have to consider a number of the most prominent of them (their formulas) and select the one which you think will do. Maybe the Silhouette index is what you want. It's you who decides. – ttnphns Feb 14 '16 at 16:39
-
@ttnphns I'm so grateful for the valuable suggestion. I will test the Silhouette index to see its effectiveness in my application. – Federico Feb 14 '16 at 16:47
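As a concrete illustration of the Silhouette suggestion above, the index can be computed with scikit-learn's `silhouette_score`; the pixel intensities and segment labels below are synthetic, made up purely for demonstration:

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Hypothetical example: treat each pixel's intensity as a 1-D feature and
# the segment it was assigned to as its cluster label.
rng = np.random.default_rng(0)
pixels = np.concatenate([
    rng.normal(60, 5, 200),    # a dark segment
    rng.normal(200, 5, 200),   # a bright segment
]).reshape(-1, 1)
labels = np.array([0] * 200 + [1] * 200)

# Mean silhouette over all pixels: values near +1 mean segments are compact
# and well separated from their neighbors; values near 0 or below suggest
# the segments overlap and merging could continue.
score = silhouette_score(pixels, labels)
print(round(score, 3))
```

Because the silhouette compares each point's mean intra-segment distance with its distance to the nearest neighboring segment, it directly measures the "heterogeneity relative to neighbors" asked about in the question.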
-
I actually did not know these; nice work, ttnphns. :) May I add a question: one of these measures will eventually keep the algorithm running -- the "why" is a different question. But who says these clusters may not all lie close to each other, or on top of each other, in the space that matters for a proper clustering? – pAt84 Feb 14 '16 at 17:25
-
Did ttnphns's ideas help you? If so, they should be turned into an answer and rewarded. Or did they not, and you still have the same problem? – pAt84 Feb 15 '16 at 10:58
1 Answer
You are talking about semantic segmentation, I suppose. Don't worry, I know a thing or two about computer vision. ;)
Personally, I would not use or build such clustering algorithms anymore. Unsupervised deep learning with CNNs or stacks of RBMs does a much better job these days. I have used this for image segmentation, and it has shown much more promising results than coming up with your own clustering algorithm.
I assume you actually segmented the whole image into different parts, not just one object (which would make this easy), so I will stick with the first explanation. There is not really all that much you can do here. You could use a bag-of-words representation for each image (how many pixels fall into each cluster) and compare those (k-means comes to mind). [Won't work, see discussion below]
What also comes to mind: if you know which segment is which, i.e., you can achieve a 1-to-1 matching of all segments of one image to another (a bijective relationship), you could compare, e.g., the color values in the patches. If you do not know anything about the relationship, this could still be done by finding the closest matches of segmentation patches. However, this usually ends up as a simulated-annealing optimization problem and will be very slow. If the segments were already classified into their meaningful categories, there might be more you could do, but I doubt they are.
If you only segmented a single object in each image and it is centered, then just divide the area of intersection by the area of the union (the Pascal measure, i.e., intersection over union).
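A minimal sketch of the Pascal measure for two binary masks; the masks below are synthetic examples, not from any real segmentation:

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Pascal measure: |A ∩ B| / |A ∪ B| for two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter / union) if union else 1.0  # both empty -> identical

# Two overlapping 6x6 squares inside a 10x10 image.
a = np.zeros((10, 10), dtype=bool); a[2:8, 2:8] = True
b = np.zeros((10, 10), dtype=bool); b[4:10, 4:10] = True
print(iou(a, b))  # overlap 4x4 = 16, union 36 + 36 - 16 = 56 -> ~0.286
```

The measure is 1 for identical masks and 0 for disjoint ones, which makes it a convenient bounded score when the single segmented object is roughly aligned across images.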

-
Thank you so much for your concise help. It seems you know more about image segmentation than I do. :) I segment an image into several objects; that is, the whole image is segmented. I don't have any prior knowledge about the image objects or their meaning as real-world objects. The fact is that if the process in my algorithm continues for more iterations, some meaningless, small objects merge together and create meaningful ones. For this reason, I think I need a metric that tells me how much these objects differ from each other at each iteration. – Federico Feb 14 '16 at 16:30
-
Then testing the bag-of-words solution is your best shot, I think. Which segmentation methods do you use? I assume superpixels are in there? – pAt84 Feb 14 '16 at 16:32
-
@pAt84 thanks again. I will scrutinize the bag-of-words solution. My algorithm is based on watershed segmentation, but I have modified and enhanced it completely. No, I'm not doing superpixel analysis. – Federico Feb 14 '16 at 16:45
-
Actually, the bag-of-words solution won't do the trick either, because you need to know which segment in one image belongs to which segment in another. The only way you can really do this is by classifying them: given one patch, search for the closest patches in all images (RGB distance or something like that). Assume this is your positive class and all others are the negative class (one-vs-all). Then use an SVM to learn to distinguish them. When done, label your patches, and from there you can explore the clustering algorithms we discussed earlier. – pAt84 Feb 14 '16 at 16:53
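The one-vs-all idea in this comment might look like the following sketch; the mean-RGB patch features, class layout, and choice of `LinearSVC` are all assumptions for illustration:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical one-vs-all setup: each patch is summarized by its mean RGB.
# "positive" = patches similar to the reference patch; "negative" = the rest.
rng = np.random.default_rng(1)
positive = rng.normal([200, 50, 50], 10, size=(50, 3))   # reddish patches
negative = rng.normal([50, 50, 200], 10, size=(50, 3))   # bluish patches
X = np.vstack([positive, negative])
y = np.array([1] * 50 + [0] * 50)

# Learn to separate the positive patch class from all others.
clf = LinearSVC().fit(X, y)

# Label a new, unseen patch that resembles the positive class.
new_patch = np.array([[195, 55, 45]])
print(clf.predict(new_patch))
```

In practice one classifier would be trained per patch class, and the resulting labels establish the cross-image segment correspondence the comment says is missing.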
-
Thank you. No, I'm not going to match the segments of two images; this comparison is not made at all in my algorithm. I just want to know how separate (or heterogeneous) each created image object is from its neighbors. If this separateness is significant and the inter-class criteria are not met, the iteration stops; otherwise it continues and the conditions are evaluated again. – Federico Feb 14 '16 at 16:53
-
As long as you do not want to match the patches, I doubt such a method even exists. You might have to fall back on image processing other than semantic segmentation. How do you compute your inter-class criteria? Maybe there is a way in through that. – pAt84 Feb 14 '16 at 16:57
-
Thank you. These criteria are expressions based on variance, entropy, and some model-based concepts from image processing and object detection. At each iteration, pixels merge based on these criteria. Gradually, these pixels form image objects in subsequent iterations, and then objects merge with each other, or with the remaining pixels, if the criteria are met. As is clear, there is no criterion for examining intra-class heterogeneity. – Federico Feb 14 '16 at 17:05
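The variance-based merge criteria described in this comment are not specified in detail; a purely illustrative sketch of one possible form (the threshold `tol`, the size-weighted combination, and the toy region values are all assumptions, not the algorithm's actual criteria) might be:

```python
import numpy as np

def should_merge(region_a, region_b, tol=1.5):
    """Merge two adjacent regions if the variance of their union does not
    grow much beyond the size-weighted variance of the separate parts."""
    a = np.asarray(region_a, dtype=float)
    b = np.asarray(region_b, dtype=float)
    merged_var = np.var(np.concatenate([a, b]))
    parts_var = (a.size * np.var(a) + b.size * np.var(b)) / (a.size + b.size)
    return bool(merged_var <= tol * max(parts_var, 1e-12))

# Near-identical regions merge; strongly contrasting ones do not.
similar = should_merge([100, 102, 101], [103, 99, 100])
distinct = should_merge([10, 12, 11], [200, 202, 201])
print(similar, distinct)
```

A test of this shape could double as the missing intra-class check: tracking how far `merged_var` exceeds `parts_var` for each object and its neighbors gives a per-iteration heterogeneity signal.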
-
Well, probabilistic models also won't help, since at some point you need a matching there as well. The only real distance measure based on the segments that you could compute is a comparison of the shapes of the segments in each image. Maybe there will be sky in one image, and hence a big segment at the top of the image, versus an indoor scene where you do not have this. But this would be a very, very weak criterion. Is there any knowledge you can extract from the segmentation algorithm itself? Convergence rates, etc.? It might behave differently for some clusters. – pAt84 Feb 14 '16 at 17:12
-
Thank you for the time you took to help me. I'm so grateful. It's a good idea to consider convergence rates or other change-tracking measures. It could finally help me. – Federico Feb 15 '16 at 03:16