19

The one used by option "ward.D" (equivalent to the only Ward option "ward" in R versions <= 3.0.3) does not implement Ward's (1963) clustering criterion, whereas option "ward.D2" implements that criterion (Murtagh and Legendre 2014).

(http://stat.ethz.ch/R-manual/R-patched/library/stats/html/hclust.html)

Apparently ward.D does not implement Ward's criterion properly. Nonetheless it seems to do a good job regarding the clusterings it produces. What does method="ward.D" implement if it is not Ward's criterion?

References

Murtagh, F., & Legendre, P. (2014). Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion?. Journal of Classification, 31(3), 274-295.

Nick Cox
  • 48,377
  • 8
  • 110
  • 156
Raffael
  • 1,424
  • 3
  • 18
  • 30

3 Answers3

13

The only difference between ward.D & ward.D2 is the input parameter.

hclust(dist(x)^2,method="ward.D") ~ hclust(dist(x)^2,method="ward")

which are equivalent to: hclust(dist(x),method="ward.D2")

You can find the reserach paper : Ward’s Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm

The Ward2 criterion values are “on a scale of distances” whereas the Ward1 criterion values are “on a scale of distances squared”.

Nilesh
  • 341
  • 1
  • 6
  • I prefer this answer as the other implies that ward.D is wrong, it isn't. Just different. – Chris Jun 13 '17 at 11:01
12

The relevant manuscript is here.

The difference between ward.D and ward.D2 is the difference between the two clustering criteria that in the manuscript are called Ward1 and Ward2.

It basically boils down to the fact that the Ward algorithm is directly correctly implemented in just Ward2 (ward.D2), but Ward1 (ward.D) can also be used, if the Euclidean distances (from dist()) are squared before inputing them to the hclust() using the ward.D as the method.

For example, SPSS also implements Ward1, but warn the users that distances should be squared to obtain the Ward criterion. In such sense implementation of ward.D is not deprecated, and nonetheless it might be a good idea to retain it for backward compatibility.      

JTT
  • 1,295
  • 9
  • 13
  • 2
    From the paper you link to it follows not `Ward algorithm is directly correctly implemented in just Ward2`, but rather that: (1) to get correct results with both implementations, use squared Euclidean distances with Ward1 and nonsquared Euclidean distances with Ward2; (2) to further make their output dendrograms comparable (identical), apply sq. root to fusion levels after Ward1 or square fusion levels after Ward2, before constructing dendrogram. – ttnphns Jul 30 '14 at 10:06
  • You are right, of course. Thanks for the clarification. What I meant by "directly correctly implemented" is that no further steps, such as taking a square root of the heights, are needed to arrive to the correct result with the ward.D2 method. – JTT Jul 30 '14 at 10:29
  • 1
    The tiny nuance here is that with Ward's method, it is _not_ defined what is "correct" or true fusion levels presentation - whether they should be plotted "nonsquared" or "squared". The cause of of the indecision is that fusion levels in Ward are not _distances_, they are _incremental_ dispersions. – ttnphns Jul 30 '14 at 10:39
6

I came across the research paper that corresponds to the objective function that is being optimized by "Ward1 (ward.D)": Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method. It turns out that R's implementation of "Ward1 (ward.D)" is equivalent to minimizing the energy distance between cluster groups.

2.1 Cluster $e$-distance and Objective Function

Let $A = \{a_1, \ldots, a_{n_1}\}$ and $B = \{b_1, \ldots, b_{n_2}\}$ be nonempty subsets of $\mathbb R^d$. Define the between-within, or $e$-distance $e(A, B)$, between $A$ and $B$ as \begin{align} e(A, B) = &\frac{n_1n_2}{n_1+n_2}\bigg(\frac{2}{n_1n_2}\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \|a_i-b_j\| \\ &- \frac{1}{n_1^2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_1}\|a_i-a_j\| - \frac{1}{n_2^2}\sum_{i=1}^{n_2}\sum_{j=1}^{n_2}\|b_i-b_j\|\bigg). \tag{1} \end{align}

josliber
  • 4,097
  • 25
  • 43
user3235207
  • 61
  • 1
  • 1
  • Are you sure that that is the correct interpretation of the contents of that paper? It seems to me that $e^{(2)}$ corresponds to `ward.D2`, but I don't think it is stated anywhere that $e^{(1)}$ corresponds to `ward.D1`. In fact, on page 161–162, it is stated that for $0 – Jonas Dahlbæk Oct 24 '18 at 13:07