Is there a reason to prefer squaring or not squaring the dissimilarities when clustering with Ward's method?
The question is motivated by the following statement in the documentation for R's hclust() function:
Two different algorithms are found in the literature for Ward clustering. The one used by option "ward.D" (equivalent to the only Ward option "ward" in R versions <= 3.0.3) does not implement Ward's (1963) clustering criterion, whereas option "ward.D2" implements that criterion (Murtagh and Legendre 2013). With the latter, the dissimilarities are squared before cluster updating.
Does squaring improve the algorithm?
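To make the quoted statement concrete, here is a minimal sketch of how I understand the two options to relate. It assumes the documentation can be read literally: that "ward.D2" applied to plain dissimilarities behaves like "ward.D" applied to hand-squared dissimilarities, with merge heights reported on the unsquared scale. The toy data and the names x, d, h_d2 and h_d are made up for illustration.

```r
# Toy data: 20 random points in 2 dimensions (illustrative only).
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
d <- dist(x)  # plain Euclidean distances, not squared

# "ward.D2" is documented to square the dissimilarities internally.
h_d2 <- hclust(d, method = "ward.D2")

# "ward.D" does no squaring, so the squaring is done by hand here.
h_d <- hclust(d^2, method = "ward.D")

# Compare the merge sequences, and the heights after taking square roots.
same_merges  <- identical(h_d2$merge, h_d$merge)
same_heights <- isTRUE(all.equal(h_d2$height, sqrt(h_d$height)))
print(c(same_merges = same_merges, same_heights = same_heights))
```

If the two checks agree, the practical difference between the options comes down to whether the input is squared and the scale on which heights are reported, which is why I am asking whether squaring is the better choice.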