0

K-means computes cluster centroids differently for each distance metric. I don't know why the way of computing the centroid is dependent of the distance measure.

I don't know how we compute the centroid for manhattan distance and its difference with the computing the centroid for euclidean distance?

curiosus
  • 133
  • 10
  • This question is dubious because you don't say if you mean the distance between data poinrs or the distance between a point and a centroid. Please read a related post https://stats.stackexchange.com/q/81481/3277 – ttnphns Nov 14 '21 at 19:12

2 Answers2

3

K-means does not work with arbitrary distances, and was originally only formulated with squared errors.

It is for squared Euclidean distance and Bergman divergences.

K-means does not minimize Euclidean distances! It will run, and find an okayish (not obviously wrong) solution; but not even a local optimum. So the simple answer is: don't rely on k-means for other distances. It so not something you can easily fix.

Has QUIT--Anony-Mousse
  • 39,639
  • 7
  • 61
  • 96
  • You have an example here http://www.ece.northwestern.edu/local-apps/matlabhelp/toolbox/stats/kmeans.html. It says that kmeans computes centroid clusters differently for the different supported distance measures and it lists the way of computation centroid or center for each distance measure. – curiosus Oct 24 '18 at 08:32
  • 1
    Well, is not kmeans but a method known as k*medians* if you use the component wise median. And that doesn't just work with some of the *fast* algorithms for kmeans. – Has QUIT--Anony-Mousse Oct 24 '18 at 20:39
  • 1
    For Euclidean distance, you "just" need to solve the [Weber problem](https://en.wikipedia.org/wiki/Weber_problem) – Has QUIT--Anony-Mousse Oct 24 '18 at 20:40
  • Can you explain more in details the relation between weber problem and k-means and in what that answers the question about why the way of center computation is dependent to the distance measure? – curiosus Oct 24 '18 at 20:47
  • 1
    The Weber problem is the *difficult* problem of finding the most central point for Euclidean distance. That would be what you would need to use *instead* of the mean if you want to minimize Euclidean distances, essentially a k-weberpoint clustering. – Has QUIT--Anony-Mousse Oct 25 '18 at 01:18
0

@curiosus you are right, according to the definition of Kmeans for the Euclidean distance, the centroid whose coordinates are the average of the coordinates of the points of a cluster are simply a local minimum of the inertia, that is the value for which the derivatives of inertia with respect to the centroid are zero. The global minimum is another story.

bladerun
  • 1
  • 1