What algorithm does ward.D in hclust() implement if it is not Ward's criterion?

Question

The one used by option "ward.D" (equivalent to the only Ward option "ward" in R versions <= 3.0.3) does not implement Ward's (1963) clustering criterion, whereas option "ward.D2" implements that criterion (Murtagh and Legendre 2014).

(http://stat.ethz.ch/R-manual/R-patched/library/stats/html/hclust.html)

Apparently ward.D does not implement Ward's criterion properly. Nonetheless it seems to do a good job regarding the clusterings it produces. What does method="ward.D" implement if it is not Ward's criterion?

References

Murtagh, F., & Legendre, P. (2014). Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion?. Journal of Classification, 31(3), 274-295.

Does the Murthagh and Legendre paper say anything about this? — cbeleites unhappy with SX, Jul 30 '14 at 08:04
First thing a search turns up for me is the pdf of the manuscript at u montreal!? — cbeleites unhappy with SX, Jul 30 '14 at 08:24
are you kidding me? :) I have no access to the paper and even if I had, I am pretty positive that there are lot of people around on this site who could answer the question better than me. but thanks for your valuable input. — Raffael, Jul 30 '14 at 08:33
Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/16062/discussion-between-cbeleites-and-raffael). — cbeleites unhappy with SX, Jul 30 '14 at 08:46

score 13 · Answer 1 · answered Jul 30 '14 at 09:56

The only difference between ward.D & ward.D2 is the input parameter.

hclust(dist(x)^2,method="ward.D") ~ hclust(dist(x)^2,method="ward")

which are equivalent to: hclust(dist(x),method="ward.D2")

You can find the reserach paper : Ward’s Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm

The Ward2 criterion values are “on a scale of distances” whereas the Ward1 criterion values are “on a scale of distances squared”.

I prefer this answer as the other implies that ward.D is wrong, it isn't. Just different. — Chris, Jun 13 '17 at 11:01

score 12 · Accepted Answer · edited Jun 11 '20 at 14:32

12

The relevant manuscript is here.

The difference between ward.D and ward.D2 is the difference between the two clustering criteria that in the manuscript are called Ward1 and Ward2.

It basically boils down to the fact that the Ward algorithm is directly correctly implemented in just Ward2 (ward.D2), but Ward1 (ward.D) can also be used, if the Euclidean distances (from dist()) are squared before inputing them to the hclust() using the ward.D as the method.

For example, SPSS also implements Ward1, but warn the users that distances should be squared to obtain the Ward criterion. In such sense implementation of ward.D is not deprecated, and nonetheless it might be a good idea to retain it for backward compatibility.

edited Jun 11 '20 at 14:32

Community

1

answered Jul 30 '14 at 09:06

JTT

1,295
9
13

2

From the paper you link to it follows not `Ward algorithm is directly correctly implemented in just Ward2`, but rather that: (1) to get correct results with both implementations, use squared Euclidean distances with Ward1 and nonsquared Euclidean distances with Ward2; (2) to further make their output dendrograms comparable (identical), apply sq. root to fusion levels after Ward1 or square fusion levels after Ward2, before constructing dendrogram. – ttnphns Jul 30 '14 at 10:06
You are right, of course. Thanks for the clarification. What I meant by "directly correctly implemented" is that no further steps, such as taking a square root of the heights, are needed to arrive to the correct result with the ward.D2 method. – JTT Jul 30 '14 at 10:29
1

The tiny nuance here is that with Ward's method, it is _not_ defined what is "correct" or true fusion levels presentation - whether they should be plotted "nonsquared" or "squared". The cause of of the indecision is that fusion levels in Ward are not _distances_, they are _incremental_ dispersions. – ttnphns Jul 30 '14 at 10:39

score 6 · Answer 3 · edited Mar 15 '16 at 18:09

I came across the research paper that corresponds to the objective function that is being optimized by "Ward1 (ward.D)": Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method. It turns out that R's implementation of "Ward1 (ward.D)" is equivalent to minimizing the energy distance between cluster groups.

2.1 Cluster $e$-distance and Objective Function

Let $A = \{a_1, \ldots, a_{n_1}\}$ and $B = \{b_1, \ldots, b_{n_2}\}$ be nonempty subsets of $\mathbb R^d$. Define the between-within, or $e$-distance $e(A, B)$, between $A$ and $B$ as \begin{align} e(A, B) = &\frac{n_1n_2}{n_1+n_2}\bigg(\frac{2}{n_1n_2}\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \|a_i-b_j\| \\ &- \frac{1}{n_1^2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_1}\|a_i-a_j\| - \frac{1}{n_2^2}\sum_{i=1}^{n_2}\sum_{j=1}^{n_2}\|b_i-b_j\|\bigg). \tag{1} \end{align}

Are you sure that that is the correct interpretation of the contents of that paper? It seems to me that $e^{(2)}$ corresponds to `ward.D2`, but I don't think it is stated anywhere that $e^{(1)}$ corresponds to `ward.D1`. In fact, on page 161–162, it is stated that for $0 — Jonas Dahlbæk, Oct 24 '18 at 13:07

What algorithm does ward.D in hclust() implement if it is not Ward's criterion?

3 Answers3

Linked