Dummy data clustering: OK to apply Euclidean distance hierarchical clustering on a Jaccard similarity matrix?

Question

I created dummy variables (binary data) from categorical variables, and I want to partition N subjects into multiple classes with some clustering method. I computed the Jaccard similarity index for every pair of subjects, which gives an N-by-N similarity matrix.

My question is whether it is OK to apply hierarchical clustering using the Euclidean distance measure on the Jaccard similarity index matrix.

The result looks very good and valid, in fact much better than when I use the Jaccard dissimilarity (1 - Jaccard index) matrix. I want to make sure that I am not creating mathematical nonsense.

Well, yes. The actual clustering algorithm is agnostic of the metric used to construct the distance/similarity matrix. — Digio, Sep 05 '17 at 14:31
`eucledian distance measure on the Jaccard similarity index matrix` This is misty. Jaccard similarity is a proximity measure. Euclidean distance is another proximity measure. Maybe you meant `or`, not `on` in that sentence? — ttnphns, Sep 05 '17 at 15:33
Note that your data are initially nominal. I.e. they are _dummy_ binary, not simply binary. An overview of measures to use with nominal attributes is here https://stats.stackexchange.com/q/55798/3277. — ttnphns, Sep 05 '17 at 15:36
Hi, thanks @ttnphns. I will adapt the wording, you are right. And no, my question is exactly whether it is valid to do it `on` the Jaccard index. I will try the Dice algorithm and check the performance. My reasoning is that I create a continuous data set from nominal data (Jaccard, Dice), which can then be used with e.g. Euclidean distance to perform a hierarchical clustering. — dmeu, Sep 06 '17 at 07:43
Still `using eucledian distance measure on the Jaccard similarity index matrix` is not clear. It sounds as if you are going to treat the Jaccard matrix as a dataset and compute Euclidean distances between its rows? — ttnphns, Sep 06 '17 at 08:41
OK, should I instead just use the Jaccard similarity index as the (dis)similarity measure for the hierarchical clustering? — dmeu, Sep 06 '17 at 11:12
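
To make the two readings discussed in the comments concrete, here is a minimal sketch in Python with SciPy. Everything in it is an assumption for illustration: the toy matrix `X` (two hypothetical categorical variables with three levels each, dummy-coded), the choice of average linkage, and the cut into two clusters are not taken from the question. Reading 1 feeds the Jaccard dissimilarity (1 - Jaccard index) directly to the hierarchical clustering. Reading 2 treats each row of the N-by-N similarity matrix as a feature vector and computes Euclidean distances between those rows.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical dummy-coded data: 6 subjects, two categorical variables
# with 3 levels each (columns 0-2 and 3-5). Not from the original post.
X = np.array([
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
], dtype=bool)

# Reading 1: use the Jaccard dissimilarity (1 - Jaccard index) directly
# as the distance between subjects.
d_jaccard = pdist(X, metric="jaccard")          # condensed distance vector
Z_direct = linkage(d_jaccard, method="average")  # linkage choice is an assumption
labels_direct = fcluster(Z_direct, t=2, criterion="maxclust")

# Reading 2: build the N x N Jaccard similarity matrix, treat each of its
# rows as a feature vector, and compute Euclidean distances between rows.
S = 1.0 - squareform(d_jaccard)                  # similarity matrix, diagonal = 1
d_rows = pdist(S, metric="euclidean")
Z_rows = linkage(d_rows, method="average")
labels_rows = fcluster(Z_rows, t=2, criterion="maxclust")

print("Jaccard dissimilarity as distance: ", labels_direct)
print("Euclidean on similarity-matrix rows:", labels_rows)
```

On this toy data both readings recover the same two groups, but that is a property of the example, not a general guarantee. In reading 2 two subjects are close when they have similar similarity profiles to all other subjects, which is a different notion of closeness than their direct pairwise Jaccard dissimilarity.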
