Why is kMeans using the sum of squares instead of the distance?

Asked Aug 19 '21 at 12:13

The question is quite precise: why does kMeans use the sum of squares (of the distances) and not the distances themselves?

asked by Ben
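For reference, here is a sketch of the objective as it is usually written for standard (Lloyd-style) k-means. The notation is introduced here and is not from the question: $K$ clusters $C_1,\dots,C_K$ with centres $\mu_1,\dots,\mu_K$. The quantity being minimized is the within-cluster sum of squares:

$$J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2 .$$

In the assignment step each point goes to its nearest centre. Because $t \mapsto \sqrt{t}$ is increasing for $t \ge 0$,

$$\arg\min_k \lVert x - \mu_k \rVert^2 = \arg\min_k \lVert x - \mu_k \rVert ,$$

so squaring does not change which centre is chosen. In the update step the clusters are held fixed, and setting the gradient with respect to $\mu_k$ to zero gives

$$\nabla_{\mu_k} J = -2 \sum_{x \in C_k} (x - \mu_k) = 0 \quad\Longrightarrow\quad \mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x ,$$

which is the cluster mean. If the unsquared distances $\sum_{x \in C_k} \lVert x - \mu_k \rVert$ were minimized instead, the best centre would be the geometric median. That point has no closed form in general, so the simple "take the mean" update would no longer be optimal.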

Research the Pythagorean Theorem. – whuber Aug 19 '21 at 15:53
According to https://learningjourney.io/2019/11/28/how-k-means-clustering-works/ they square the legs, take the root and get the distance. So, from my understanding, it is the distance that is ultimately used, not the sum of squares. – Ben Aug 19 '21 at 17:55
I am not sure you can say kMeans uses the sum of the squares of the distances or the distances. And if you want to know whether point $X$ is closer to point $A$ or point $B$, it makes little difference whether you compare $\sum (x_i-a_i)^2$ to $\sum (x_i-b_i)^2$ or compare $\sqrt{\sum (x_i-a_i)^2}$ to $\sqrt{\sum (x_i-b_i)^2}$ – Henry Aug 19 '21 at 20:42
Should be of help [Why does k-means clustering algorithm use only Euclidean distance metric?](https://stats.stackexchange.com/questions/81481/why-does-k-means-clustering-algorithm-use-only-euclidean-distance-metric). – user2974951 Aug 20 '21 at 05:39
@Henry Well, but there is a difference. – Ben Aug 21 '21 at 10:19
@user2974951 Thanks, I had already seen this, but I miss a clear statement of what is going on. Maybe I can't see the forest for the trees there. Is it the distances or their squares, and in either case, why? Someone there also suggests it's neither, but the errors. – Ben Aug 21 '21 at 10:20
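The small pure-Python sketch below separates the two steps of a Lloyd-style k-means iteration. It may reconcile Henry's point with "there is a difference." The helper names (`sq_dist`, `dist`, `assign`, `sse`, `sad`) and the toy data are hypothetical, chosen only for illustration. In the assignment step, comparing squared or unsquared distances picks the same nearest centroid because the square root is increasing. The difference appears in the update step: the mean minimizes the sum of squared distances, but not the sum of plain distances.

```python
import math

def sq_dist(x, c):
    # Squared Euclidean distance: sum of the squared coordinate differences ("legs").
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def dist(x, c):
    # Euclidean distance: square root of the squared distance (Pythagoras).
    return math.sqrt(sq_dist(x, c))

def assign(points, centroids, metric):
    # Assignment step: index of the nearest centroid for each point under `metric`.
    return [min(range(len(centroids)), key=lambda k: metric(p, centroids[k]))
            for p in points]

# Hypothetical toy 2-D data and two fixed centroids.
points = [(0.0, 0.0), (1.0, 2.0), (4.0, 4.0), (5.0, 3.0)]
centroids = [(0.5, 1.0), (4.5, 3.5)]

labels_sq = assign(points, centroids, sq_dist)
labels_d = assign(points, centroids, dist)
print(labels_sq, labels_d)  # [0, 0, 1, 1] [0, 0, 1, 1]

# Update step on a hypothetical 1-D cluster: the "best" centre depends on the criterion.
cluster = [0.0, 0.0, 10.0]

def sse(c):
    # Sum of squared errors around candidate centre c (the k-means criterion).
    return sum((x - c) ** 2 for x in cluster)

def sad(c):
    # Sum of plain (unsquared) distances around candidate centre c.
    return sum(abs(x - c) for x in cluster)

mean = sum(cluster) / len(cluster)           # 10/3
median = sorted(cluster)[len(cluster) // 2]  # 0.0

# Brute-force search over candidate centres in [0, 10] with step 0.01.
grid = [i / 100 for i in range(1001)]
best_for_sse = min(grid, key=sse)  # grid point closest to the mean
best_for_sad = min(grid, key=sad)  # the median
print(mean, best_for_sse)
print(median, best_for_sad)
```

With these toy numbers, the mean gives a sum of squared errors of about 66.7 versus 100 at the median. The median gives a sum of plain distances of 10 versus about 13.3 at the mean. In one dimension the geometric median is just the ordinary median. The "sum of squared errors" is presumably what is meant by "the errors": it is the same quantity as the sum of squared distances to the centres.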

0 Answers