
I created dummy (binary) variables from categorical variables because I want to partition N subjects into multiple classes with some clustering method. I then computed the Jaccard similarity index for every pair of subjects, which gives an N by N similarity matrix.

My question is whether it is OK to apply hierarchical clustering using the Euclidean distance measure on the Jaccard similarity index matrix.

The result looks very good and valid, in fact much better than when I use the Jaccard dissimilarity (1 - Jaccard index) matrix. I want to make sure that I am not creating mathematical nonsense.
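To make the two procedures concrete, here is a minimal Python sketch of what is being compared. It assumes NumPy and SciPy, toy data generated at random, and average linkage; the variable names, the number of subjects, and the choice of three clusters are all illustrative assumptions, not part of my actual analysis. Option A clusters directly on the Jaccard dissimilarity. Option B treats each row of the N by N similarity matrix as a feature vector and computes Euclidean distances between those rows.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

# Hypothetical toy data: 12 subjects, 3 categorical variables with 3 levels each.
n_subjects, n_vars, n_levels = 12, 3, 3
cats = rng.integers(0, n_levels, size=(n_subjects, n_vars))

# Dummy-code each categorical variable (one 0/1 column per level).
X = np.concatenate(
    [np.eye(n_levels, dtype=bool)[cats[:, j]] for j in range(n_vars)], axis=1
)

# Jaccard dissimilarity (1 - Jaccard index) in condensed form,
# and the full N x N Jaccard similarity matrix.
d_jaccard = pdist(X, metric="jaccard")
S = 1.0 - squareform(d_jaccard)

# Option A: hierarchical clustering directly on the Jaccard dissimilarity.
Z_a = linkage(d_jaccard, method="average")

# Option B: treat the rows of S as data points and cluster on the
# Euclidean distances between those rows (similarity profiles).
d_rows = pdist(S, metric="euclidean")
Z_b = linkage(d_rows, method="average")

# Cut both trees into at most 3 clusters (arbitrary choice for the sketch).
labels_a = fcluster(Z_a, t=3, criterion="maxclust")
labels_b = fcluster(Z_b, t=3, criterion="maxclust")
print(labels_a)
print(labels_b)
```

In Option B, two subjects are close when they have similar Jaccard similarities to all other subjects, which is a different quantity from their own pairwise Jaccard dissimilarity, so the two trees need not agree.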

ttnphns
dmeu
  • Well, yes. The actual clustering algorithm is agnostic of the metric used to construct the distance/similarity matrix. – Digio Sep 05 '17 at 14:31
  • `eucledian distance measure on the Jaccard similarity index matrix` This is misty. Jaccard similarity is a proximity measure. Euclidean distance is another proximity measure. Maybe you meant `or`, not `on` in that sentence? – ttnphns Sep 05 '17 at 15:33
  • Note that your data are initially nominal. I.e. they are _dummy_ binary, not simply binary. An overview of measures to use with nominal attributes is here https://stats.stackexchange.com/q/55798/3277. – ttnphns Sep 05 '17 at 15:36
  • Hi, thanks @ttnphns. I will adapt the wording, you are right. And no, my question is exactly whether it is valid to do it `on` the Jaccard index. I will try the Dice algorithm and check the performance. My reasoning is that I create a continuous data set from nominal data (Jaccard, Dice), which can then be used with e.g. Euclidean distance to perform a hierarchical clustering. – dmeu Sep 06 '17 at 07:43
  • Still `using eucledian distance measure on the Jaccard similarity index matrix` is not clear. It sounds as if you are going to see the jaccard matrix as some dataset and compute euclidean distances between its rows?? – ttnphns Sep 06 '17 at 08:41
  • Ok, should I rather just use the Jaccard index as the (dis)similarity measure for the hierarchical clustering? – dmeu Sep 06 '17 at 11:12
