
The classical version of k-means uses the Euclidean distance in the first (assignment) step and the arithmetic mean of the cluster points (the centroid) in the second (update) step. Can k-means be generalized to other distances, and to center updates other than the arithmetic mean, while still guaranteeing convergence?

There are special cases such as k-medians, where the distance is the city-block (L1) distance and each center is the coordinate-wise median of its cluster points.
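The two steps can be written as one assign/update loop with a pluggable distance and a pluggable center rule. The sketch below is illustrative plain Python (the names `alternate`, `coordinate_mean` and `coordinate_median` are hypothetical, and no clustering library is assumed). It runs the loop with the classical pairing (squared Euclidean distance with the mean; assigning by squared Euclidean distance gives the same clusters as assigning by Euclidean distance) and with the k-medians pairing (city-block distance with the coordinate-wise median). In both cases the center rule minimizes the summed distance of its cluster, so the recorded objective should never increase.

```python
import random
from statistics import median


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l1(a, b):
    """City-block (Manhattan) distance between two equal-length tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def coordinate_mean(points):
    """Arithmetic mean per coordinate; minimises summed squared Euclidean distance."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def coordinate_median(points):
    """Median per coordinate; minimises summed L1 distance."""
    return tuple(median(c) for c in zip(*points))


def alternate(points, k, dist, center, iters=100, seed=0):
    """Hypothetical generic loop: returns (centers, labels, objective history)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    history, labels = [], []
    for _ in range(iters):
        # Assignment step: nearest center under the chosen distance.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        history.append(sum(dist(p, centers[l]) for p, l in zip(points, labels)))
        # Update step: recompute each center with the chosen rule.
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            new_centers.append(center(members) if members else centers[j])
        if new_centers == centers:  # no center moved: stop
            break
        centers = new_centers
    return centers, labels, history


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
_, km_labels, km_hist = alternate(data, 2, sq_euclidean, coordinate_mean)
_, kmed_labels, kmed_hist = alternate(data, 2, l1, coordinate_median)
```

Swapping only one of the two ingredients (for example `l1` with `coordinate_mean`) still runs, but the center rule then no longer minimizes the objective that the assignment step uses, so the monotone decrease is no longer guaranteed.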

gunes
  • Yes, you can use any distance metric you prefer, although you should be able to explain why you chose that particular metric. – user2974951 Sep 02 '20 at 12:54
  • Is it possible to choose any distance while setting the arithmetic mean as a choice in the second step? – pedro colombino Sep 02 '20 at 12:57
  • Just to be clear: you can do it, but you will probably have to implement it yourself. I don't know of any implementation that lets you choose. – user2974951 Sep 02 '20 at 13:10
  • I am looking for a general theoretical framework of the applicability of k-means over any distance while allowing k-means to converge. – pedro colombino Sep 02 '20 at 13:13
  • Distance between what and what? Between data points or between a data point and a cluster centre? – ttnphns Sep 05 '20 at 11:14
  • This sort of question has been asked multiple times here. Search the site, for example "k-means distance". – ttnphns Sep 05 '20 at 11:18
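The question in the comments, whether the arithmetic mean can be kept while the distance changes, can be probed with a tiny one-dimensional example (hypothetical data, plain Python). Under the city-block distance, the cluster {0, 0, 10} has a total distance of 10 around its median but 40/3 ≈ 13.33 around its mean, so an update step that moves the center from 0 to the mean would raise the L1 objective. That breaks the monotone-decrease argument behind k-means convergence; it does not by itself show that every such run fails to stop.

```python
from statistics import mean, median


def l1_cost(points, c):
    """Sum of city-block distances from 1-D points to a candidate center c."""
    return sum(abs(x - c) for x in points)


cluster = [0.0, 0.0, 10.0]
cost_median = l1_cost(cluster, median(cluster))  # center at the median, 0.0
cost_mean = l1_cost(cluster, mean(cluster))      # center at the mean, 10/3
print(cost_median, cost_mean)
```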

1 Answer


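No, k-means is for Euclidean distance. The arithmetic mean is the point that minimizes the sum of squared Euclidean distances to a cluster's points, so both steps decrease the same objective, which is what guarantees convergence; with another distance the mean update can increase the objective and that guarantee is lost. A similar alternative is k-medoids, in which the centers are chosen from among the data points; it can be used with arbitrary distance metrics.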
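A minimal k-medoids sketch follows, using the simple alternating ("Voronoi iteration") variant rather than the PAM swap algorithm; the function names and the Hamming-distance string example are illustrative assumptions. Because the distance is only ever called as a function, any dissimilarity works, including ones on non-vector data such as strings. Each new medoid is the cluster member with the smallest summed distance to the other members, so for distinct items and a true metric neither step can increase the total distance.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_medoids(items: Sequence[T], k: int, dist: Callable[[T, T], float],
              iters: int = 100, seed: int = 0) -> Tuple[List[T], List[int], List[float]]:
    """Alternating k-medoids (hypothetical helper): returns (medoids, labels, history)."""
    rng = random.Random(seed)
    medoids = rng.sample(list(items), k)  # initial medoids are k distinct items
    history: List[float] = []
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each item joins its nearest medoid.
        labels = [min(range(k), key=lambda j: dist(x, medoids[j])) for x in items]
        history.append(sum(dist(x, medoids[l]) for x, l in zip(items, labels)))
        # Update step: pick the member with the smallest in-cluster distance sum.
        new_medoids = []
        for j in range(k):
            members = [x for x, l in zip(items, labels) if l == j]
            if not members:  # defensive; cannot happen for distinct items
                new_medoids.append(medoids[j])
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist(m, y) for y in members)))
        if new_medoids == medoids:  # medoids stable: stop
            break
        medoids = new_medoids
    return medoids, labels, history


def hamming(a: str, b: str) -> int:
    """Number of differing positions; assumes equal-length strings."""
    return sum(ca != cb for ca, cb in zip(a, b))


words = ["aaaa", "aaab", "aaba", "zzzz", "zzzy", "zzyz"]
medoids, labels, history = k_medoids(words, 2, hamming)
```

Nothing in the loop needs coordinates or averages, which is why this variant accepts arbitrary distances where the mean-based update does not.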

gunes