
I have read some threads on this website saying that it is not OK to use a Gower dissimilarity matrix with Ward's clustering algorithm.

My data contain mixed-type variables. First, I computed a dissimilarity matrix with Gower's formula in R (the daisy function). The result is a distance matrix whose elements lie between 0 and 1, so my data are standardized at the same time.
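A minimal sketch of the Gower dissimilarity described above, written in Python rather than R. It assumes only numeric and nominal variables, equal weights and no missing values; `daisy(x, metric = "gower")` in R's cluster package handles more variable types, weights and missing values, so this is not its exact implementation. The function name `gower_dissimilarity` is hypothetical. Dividing each numeric difference by the column's range is the "standardization" the question refers to.

```python
from typing import List, Optional, Sequence, Union

Value = Union[float, str]


def gower_dissimilarity(rows: Sequence[Sequence[Value]],
                        numeric: Sequence[bool]) -> List[List[float]]:
    """Simplified Gower dissimilarity for mixed-type rows (hypothetical helper).

    Numeric columns: |x_i - x_j| / range of the column (the built-in scaling).
    Nominal columns: 0 if the values are equal, 1 otherwise.
    Assumes equal weights and no missing values.
    """
    n, p = len(rows), len(numeric)

    # Range of every numeric column; None marks a nominal column.
    ranges: List[Optional[float]] = []
    for k in range(p):
        if numeric[k]:
            col = [float(r[k]) for r in rows]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(None)

    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for k in range(p):
                if numeric[k]:
                    rk = ranges[k]
                    # A constant column contributes 0 (avoids dividing by zero).
                    if rk:
                        total += abs(float(rows[i][k]) - float(rows[j][k])) / rk
                else:
                    total += 0.0 if rows[i][k] == rows[j][k] else 1.0
            # Averaging over variables keeps every entry in [0, 1].
            d[i][j] = d[j][i] = total / p
    return d
```

For example, `gower_dissimilarity([(1.0, "a", 0.7), (3.0, "b", 0.8), (2.0, "a", 0.75)], [True, False, True])` gives 1.0 for the first pair, since those two rows differ maximally on every variable.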

Then I applied Ward's method with the hclust function.
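An agglomerative routine like `hclust` never sees the original variables. It receives only the dissimilarity matrix, and after each merge it updates that matrix with the Lance-Williams formula instead of recomputing centroids. The sketch below (Python, all helper names hypothetical) checks on a toy example that, when the matrix entries are squared Euclidean distances, the Ward update gives exactly the centroid-based Ward cost, i.e. twice the increase in within-cluster sum of squares.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a cluster."""
    dim = len(points[0])
    return [sum(pt[k] for pt in points) / len(points) for k in range(dim)]


def ward_cost_direct(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Ward dissimilarity computed from coordinates:
    2 * n_a * n_b / (n_a + n_b) * ||centroid(a) - centroid(b)||^2.
    For two single points this is just their squared Euclidean distance."""
    na, nb = len(a), len(b)
    return 2.0 * na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


def ward_lance_williams(d_ki: float, d_kj: float, d_ij: float,
                        n_i: int, n_j: int, n_k: int) -> float:
    """Lance-Williams update for Ward's method: dissimilarity between cluster k
    and the merged cluster (i + j), using only old matrix entries."""
    n = n_i + n_j + n_k
    return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / n


# Toy check: merge i = {x1} and j = {x2}; k = {x3, x4} is an existing cluster.
x1, x2, x3, x4 = (0.0, 0.0), (2.0, 0.0), (0.0, 3.0), (5.0, 5.0)
i, j, k = [x1], [x2], [x3, x4]
from_matrix = ward_lance_williams(ward_cost_direct(k, i), ward_cost_direct(k, j),
                                  ward_cost_direct(i, j), len(i), len(j), len(k))
from_coords = ward_cost_direct(i + j, k)  # both equal 36.5 here
```

Nothing in the update checks that its inputs are squared Euclidean distances, so it runs on a Gower matrix just as well and still draws a dendrogram; the merge heights then no longer correspond to centroids or sums of squares. Note that R's `hclust` offers `"ward.D"` and `"ward.D2"`, which differ in whether the input dissimilarities are squared before updating.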

The resulting dendrogram looked good. So I used Gower to standardize my data set and then applied Ward's method. Why is this approach not OK?

ttnphns
Emrah Bilgiç
  • Your question is a bit unclear. What is the relation between the Gower coefficient and "standardization"? What standardization, and why? – ttnphns Jan 08 '15 at 15:50
  • Sorry @ttnphns, what I meant is: if your data contain variables measured on different scales, such as binary variables (0-1) and interval variables (0.7-0.8)..., you need to standardize them before calculating the distance matrix. In R, the daisy function with Gower's metric computes the distance matrix and standardizes at the same time. In the end you have a distance matrix whose elements all lie between 0 and 1. Now you can select any clustering algorithm, so why not Ward's algorithm? – Emrah Bilgiç Jan 08 '15 at 16:09
  • Ward's method calculates deviations from geometric centroids (i.e. the means). This can be done correctly from a distance matrix only if the distances are (squared) Euclidean. Gower (dis)similarity isn't Euclidean distance by definition! – ttnphns Jan 08 '15 at 16:24
  • Thank you. So you mean it is not correct to use Ward's algorithm with Gower's dissimilarity? I am surprised: I tried single, average and complete linkage and couldn't get good dendrograms, whereas Ward's dendrogram suggests I have really good clusters. – Emrah Bilgiç Jan 08 '15 at 16:43
  • You should _never_ choose among hierarchical clustering methods by the look of their dendrograms. In particular, Ward's dendrogram always looks good, but this can be misleading. Please read [this](http://stats.stackexchange.com/a/63549/3277) answer carefully. – ttnphns Jan 08 '15 at 16:54
  • And yes, it is improper to use Ward/centroid/median methods with Gower (dis)similarity. – ttnphns Jan 08 '15 at 16:56
  • Thank you, I have also read your comments; maybe this applies to my case. You said: "However, geometrically, a concrete matrix of Gower dissimilarity could happen to be close to euclidean distance, and then you may be licensed using Ward..." – Emrah Bilgiç Jan 08 '15 at 17:41
  • If you can't give up the idea of using Ward's method (I really wonder why), you could first perform multidimensional scaling to create data (coordinates) from the Gower coefficient matrix, then compute Euclidean distances based on those coordinates. – ttnphns Jan 08 '15 at 19:01
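The last comment's suggestion corresponds to classical (Torgerson) multidimensional scaling, which R provides as `cmdscale`. Below is a NumPy sketch of it, with hypothetical function names. The eigenvalues of the double-centred matrix also give one way to judge whether a particular Gower matrix is "close to Euclidean": an exactly Euclidean matrix has no negative eigenvalues, and the share of negative eigenvalue mass computed here is just one rough heuristic, not a standard test.

```python
import numpy as np


def classical_mds(D, tol=1e-10):
    """Classical (Torgerson) MDS, a.k.a. principal coordinates analysis.

    Returns (X, eigvals): X has one coordinate column per positive eigenvalue;
    eigvals are all eigenvalues of the double-centred matrix, largest first.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)     # B is symmetric
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol                     # drop zero and negative dimensions
    X = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return X, eigvals


def euclidean_distances(X):
    """Pairwise Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def negative_eigen_share(eigvals):
    """Rough heuristic: |negative eigenvalues| / |all eigenvalues|.
    It is ~0 for an exactly Euclidean D and grows as D departs from it."""
    neg = -eigvals[eigvals < 0].sum()
    return neg / np.abs(eigvals).sum()
```

If the Gower matrix has negative eigenvalues, the Euclidean distances between the retained coordinates only approximate it. Ward's method can then be applied to those distances (in R, for instance, `hclust(dist(X), method = "ward.D2")`), with the caveat that it clusters the approximation, not the original Gower matrix.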
