3

The Ward's method is taking distance as how much the sum of squares will increase when we merge them.

$d(u,v) = \frac{|u||v|}{|u|+|v|}{|m_u-m_v|}^2$

Please refer to Page 3 of link below.

https://www.google.co.kr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwjz06z524bXAhVCurwKHUfvD5wQFgg0MAE&url=http%3A%2F%2Fwww.stat.cmu.edu%2F~cshalizi%2F350%2Flectures%2F08%2Flecture-08.pdf&usg=AOvVaw0hrIiXitAkxr5Ciur0hQkf

But, from Scipy implementation from github. https://github.com/scipy/scipy/blob/master/scipy/cluster/hierarchy.py

The distance is derived as below.

$d(u,v) = \sqrt{\frac{|v|+|s|}{T}d(v,s)^2 + \frac{|v|+|t|}{T}d(v,t)^2- \frac{|v|}{T}d(s,t)^2}$

I am wondering what happend between these two equations. They even do not result same value for the same merge.

I tested on merging [(0,0),(0,2)] and [(2,0)]. Upper one gives me value of 3.333... Bottom one gives me values of 2.581988...

Why they have difference?

DongukJu
  • 593
  • 1
  • 5
  • 5

1 Answers1

4

Rather than reverse engineering the code, also check for references and literature. These algorithms often long predate sklearn.

Even Wikipedia has this equation, known as Lance Williams equations: https://en.wikipedia.org/wiki/Ward%27s_method

If I'm not mistaken, a subtle difference is that it works on the squared distances and uses the increase in variance, not the resulting variance.

The two means of the merged clusters are (0,1) and (2,0), so the first equation yields ⅔(2²+1²)=10/3=3⅓ (same as you got).

For the Lance Williams result, we need the squared distance first, which are 4 respectively 8. We then get ⅔.4+⅔.8-⅓.4=20/3. As mentioned on the Wikipedia talk page, there is the constant factor of 2 involved here.

Now sqrt(20/3) is just the value you got. You had one distance squared, and one non-squared, and the factor of two was missing.

Has QUIT--Anony-Mousse
  • 39,639
  • 7
  • 61
  • 96
  • I really appreciate for your kind answer! It helps me a lot. I got that equations are in some relation by factor 2. But, can we derive this? – DongukJu Oct 24 '17 at 03:53
  • That is why I suggest to do a literature review. Someone (maybe Lance & William?) probably published the derivation. – Has QUIT--Anony-Mousse Oct 24 '17 at 07:17
  • The factor of 2 probably comes from the difference of "sum over all i and j" with and without order (i.e., sum over all i – Has QUIT--Anony-Mousse Oct 24 '17 at 07:19
  • Thank you for your kind explanation. That seems where the factor 2 comes from. Just one more thing, can you recommend me some reference in literature? – DongukJu Oct 25 '17 at 00:05
  • HAC and Ward's method are so old, there are thousands of publications on it. It's pretty interesting to study the early publications. But I can't point out particular ones. – Has QUIT--Anony-Mousse Oct 25 '17 at 05:55