
I am working on a clustering problem for which I have to choose the number of clusters manually. I have a visualization tool that helps me decide whether the clusters are good. To select the number of clusters automatically, I tried the following: compute the silhouette for several clusterings and choose the configuration with the best score. My clustering algorithm uses a correlation $p \in [0,1]$ to group the elements. Since the silhouette requires a dissimilarity, I converted my similarity into a dissimilarity. I tried different formulas for this conversion: $1-p$, $1/(1+p)$, or $1/(1+\exp(1+p^2))$.

Every time, the clustering seems nice (visually), but the silhouette is poor (around 0).

I think the reason might be that the conversion formula I am using is not appropriate, but I have no idea what a good similarity-to-dissimilarity conversion would be. Any ideas?
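For concreteness, the procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not my actual code: the helper names `to_dissimilarity` and `silhouette`, the toy 4x4 similarity matrix, and the two candidate labelings are all made up for the example. It uses only the Python standard library and computes the mean silhouette width directly from a precomputed dissimilarity matrix.

```python
from statistics import mean


def to_dissimilarity(sim, convert=lambda p: 1.0 - p):
    """Apply a similarity-to-dissimilarity conversion elementwise."""
    return [[convert(p) for p in row] for row in sim]


def silhouette(dissim, labels):
    """Mean silhouette width from a precomputed dissimilarity matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # common convention: singletons score 0
            continue
        # a: mean dissimilarity to the other members of the same cluster
        a = mean(dissim[i][j] for j in own)
        # b: smallest mean dissimilarity to any other cluster
        b = min(
            mean(dissim[i][j] for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return mean(widths)


# Hypothetical toy similarities: items 0-1 and items 2-3 form two tight groups.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]

# Hypothetical candidate clusterings to compare.
candidates = {"grouped": [0, 0, 1, 1], "mixed": [0, 1, 0, 1]}

d_linear = to_dissimilarity(sim)                             # 1 - p
d_inverse = to_dissimilarity(sim, lambda p: 1.0 / (1.0 + p))  # 1 / (1 + p)

scores = {name: silhouette(d_linear, labs) for name, labs in candidates.items()}
best = max(scores, key=scores.get)

# Same labeling, different conversion: the score changes even though both
# conversions order the pairs identically.
score_linear = silhouette(d_linear, candidates["grouped"])
score_inverse = silhouette(d_inverse, candidates["grouped"])
```

On this toy matrix, both conversions rank the pairs in the same order, yet `score_linear` and `score_inverse` differ, because the silhouette is built from means of dissimilarities and therefore depends on their scale and not only on their ordering.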

QuantIbex
bigTree
  • In my answer to the above linked question, I mention the two most natural conversions (note that the $r$ you use is, in fact, a cosine). – ttnphns Mar 31 '14 at 09:27
  • I think what you are doing with all your formulas is rather odd: you are trying to "train" a dissimilarity to agree with your visual strategy of cluster selection. You should know that the eye is apophenic and, generally, not a very good tool. The real reason for your low silhouette is that you hardly have clusters in your data. – ttnphns Mar 31 '14 at 09:33
  • Do you mean that if the silhouette is low, then there are no clusters? I agree that relying on the eye is not good, but in my case an 'expert' can see different groups in the data and I need to distinguish these groups (even though I already had a first hierarchical clustering done, and I find it hard to be more precise than I am trying to be). – bigTree Mar 31 '14 at 09:46
  • The eye is a shaky, multi-objective clustering criterion that differs from person to person. Silhouette is a single-objective one. You certainly have clusters perceived by eye, but these are not necessarily "good" from the silhouette's point of view. – ttnphns Mar 31 '14 at 09:53
  • I see. Therefore, if I want to define the clusters perceived by the 'expert' I will need to find another metric that captures what he sees? – bigTree Mar 31 '14 at 09:54
  • Or no metric at all. Use the result, and when it works, it was good. Otherwise, try a different clustering. Don't rely on a metric. – Has QUIT--Anony-Mousse Mar 31 '14 at 11:31
