Is there an advantage to squaring dissimilarities when using Ward clustering?

Question

Is there a reason to prefer squaring or not squaring the dissimilarities when clustering with Ward's method?

The question is motivated by the following statement in the documentation for R's hclust() function:

Two different algorithms are found in the literature for Ward clustering. The one used by option "ward.D" (equivalent to the only Ward option "ward" in R versions <= 3.0.3) does not implement Ward's (1963) clustering criterion, whereas option "ward.D2" implements that criterion (Murtagh and Legendre 2013). With the latter, the dissimilarities are squared before cluster updating.

Does squaring improve the algorithm?

Uhm. Unless you show the results of the two methods, along with the input matrix, this question looks like a purely `R` question. — ttnphns, Apr 30 '14 at 13:31

score 4 · Answer 1 · edited Jun 30 '14 at 18:12

From the Conclusion of Murtagh, F. & Legendre, P. (2011). Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm, arXiv:1111.6285v2:

Two algorithms, Ward1 and Ward2...When applied to the same distance matrix D, they produce different results. This article has shown that when they are applied to the same dissimilarity matrix D, only Ward2 minimizes the Ward clustering criterion and produces the Ward method. The Ward1 and Ward2 algorithms can be made to optimize the same criterion and produce the same clustering topology by using Ward1 with D-squared and Ward2 with D.
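As a reading aid, the two algorithms in the quoted paper can be summarised by the Lance-Williams update applied when clusters i and j are merged and compared with a third cluster k. The notation here (cluster sizes n_i, n_j, n_k and stored dissimilarity δ) is chosen for this sketch rather than copied from the paper. Ward1 applies the update to the stored values exactly as they were supplied; Ward2 applies the same update to squared values and takes the square root of the result:

```latex
% Ward1: update applied to the dissimilarities as supplied
\delta(i \cup j, k) =
  \frac{(n_i + n_k)\,\delta(i,k) + (n_j + n_k)\,\delta(j,k) - n_k\,\delta(i,j)}
       {n_i + n_j + n_k}

% Ward2: same update on squared dissimilarities, then a square root
\delta(i \cup j, k) =
  \sqrt{\frac{(n_i + n_k)\,\delta^2(i,k) + (n_j + n_k)\,\delta^2(j,k) - n_k\,\delta^2(i,j)}
             {n_i + n_j + n_k}}
```

Feeding squared distances to Ward1 therefore performs the same sequence of merges as feeding the unsquared distances to Ward2; only the scale of the reported merge heights differs.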

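For example, `hclust(dist(x)^2, method="ward")` produces the same tree topology as `hclust(dist(x), method="ward.D2")`. In R versions after 3.0.3 the old "ward" option is named "ward.D". The merge heights are on different scales: those from "ward.D2" are the square roots of those from "ward.D" applied to the squared distances.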
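The equivalence can be checked directly. The following is a minimal sketch on arbitrary simulated data (the matrix `x`, the seed, and the object names `h1` and `h2` are assumptions made for illustration); it uses only `dist()`, `hclust()` and `cutree()` from base R's stats package, and `"ward.D"` in place of the older name `"ward"`.

```r
# Toy data: 20 points in 2 dimensions (arbitrary, for illustration only).
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
d <- dist(x)                            # Euclidean distances

h1 <- hclust(d^2, method = "ward.D")    # Ward1 on squared distances
h2 <- hclust(d,   method = "ward.D2")   # Ward2 on the distances themselves

# Same sequence of merges, hence the same tree topology
same_merges <- isTRUE(all.equal(h1$merge, h2$merge))

# Heights agree once the Ward1 heights are put back on the distance scale
same_heights <- isTRUE(all.equal(sqrt(h1$height), h2$height))

# Cutting both trees gives the same partitions for several values of k
same_cuts <- identical(cutree(h1, k = 2:5), cutree(h2, k = 2:5))

print(c(merges = same_merges, heights = same_heights, cuts = same_cuts))
```

With continuous data and no tied distances, all three checks would be expected to come out `TRUE`; ties can make the merge order ambiguous, so the comparison is less informative on data with many equal distances.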

That doesn't answer the question on quality... — Chris, Jun 13 '17 at 11:06

score 1 · Answer 2 · answered May 01 '14 at 10:56

Judging from the explanation, ward in R was first implemented incorrectly.

A corrected version of Ward linkage was added only in recent versions (after R 3.0.3), as `ward.D2`. So if you want to use Ward linkage, use `ward.D2`.
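To see how much the choice matters on a given data set, both options can be run on the same unsquared distances and the resulting partitions compared. This is only a sketch: the simulated data, the number of clusters `k`, and the helper `wss()` (total within-cluster sum of squares, the quantity Ward's criterion is built on) are all assumptions introduced here, not part of `hclust()`.

```r
# Two loose groups of 30 points each (arbitrary simulated data).
set.seed(42)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 3), ncol = 2))
d <- dist(x)

h_d  <- hclust(d, method = "ward.D")    # older behaviour, formerly "ward"
h_d2 <- hclust(d, method = "ward.D2")   # implements Ward's (1963) criterion

# Hypothetical helper: total within-cluster sum of squares of a partition
wss <- function(x, cl) {
  groups <- split(as.data.frame(x), cl)
  sum(sapply(groups, function(g) sum(scale(g, scale = FALSE)^2)))
}

k <- 4
cl_d  <- cutree(h_d,  k = k)
cl_d2 <- cutree(h_d2, k = k)

# Cross-tabulate the two partitions and compare their within-cluster scatter
print(table(ward.D = cl_d, ward.D2 = cl_d2))
print(c(ward.D = wss(x, cl_d), ward.D2 = wss(x, cl_d2)))
```

Agglomeration is greedy, so neither option is guaranteed to give the smaller within-cluster sum of squares at a particular `k`; the comparison only shows whether, and where, the two trees disagree on the data at hand.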

Has QUIT--Anony-Mousse
