1

It is well known that the K-means algorithm is designed for the Euclidean distance (or a minor variation such as the cosine distance). I have been reading the paper "A simple and fast algorithm for K-medoids clustering" (which is cited in sklearn, Python), and it seems that any distance can be used with K-medoids. Am I missing something?

DanielTheRocketMan

1 Answer

2

No, you're not missing anything. Any distance can be used. The definition of k-medoids is for general dissimilarities, and nothing in it would make it necessary to rule anything out.

Note in particular that k-means is called k-means because the mean is the statistic that minimises the within-cluster sum of squares (squared Euclidean distances). That's the k-means objective function, and therefore k-means is specifically connected to the squared Euclidean distance (personally I find it deplorable and confusing that some people in the literature use the term for something more general that doesn't necessarily lead to k means).
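To make the link between the mean and squared Euclidean distance concrete, here is a small numerical sketch. The data points and the helper names `sum_sq_euclidean` and `sum_manhattan` are made up for illustration; the Manhattan comparison is only there to show that the mean stops being the best centre once the dissimilarity changes.

```python
import numpy as np

def sum_sq_euclidean(points, center):
    """Sum of squared Euclidean distances from each point to `center`."""
    return float(np.sum((points - center) ** 2))

def sum_manhattan(points, center):
    """Sum of Manhattan (L1) distances from each point to `center`."""
    return float(np.sum(np.abs(points - center)))

# Illustrative toy cluster with one outlying point (hypothetical data).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])

mean = points.mean(axis=0)          # coordinate-wise mean
median = np.median(points, axis=0)  # coordinate-wise median

# The mean gives the smaller within-cluster sum of squares ...
print(sum_sq_euclidean(points, mean), sum_sq_euclidean(points, median))
# ... but the coordinate-wise median gives the smaller sum of L1 distances.
print(sum_manhattan(points, mean), sum_manhattan(points, median))
```

So if you swap the squared Euclidean distance for something else and keep using means as centres, the centres are in general no longer the minimisers of the objective.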

In k-medoids, within a cluster you pick the observation that minimises the sum of dissimilarities/distances of the other objects in the same cluster to it, and this can be done whatever the dissimilarity is.
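As a minimal sketch of that medoid step (not the implementation from the paper or from any library), the function below only ever calls a user-supplied dissimilarity, so the objects do not even need to be numeric. The names `medoid` and `hamming`, and the toy word list, are assumptions made for illustration; `hamming` assumes strings of equal length.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def medoid(cluster: Sequence[T], dissim: Callable[[T, T], float]) -> T:
    """Return the cluster member with the smallest total dissimilarity to all members."""
    return min(cluster, key=lambda c: sum(dissim(c, x) for x in cluster))

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(ch1 != ch2 for ch1, ch2 in zip(a, b))

# Hypothetical cluster of non-numeric objects.
words = ["cat", "cot", "cut", "dog"]
center = medoid(words, hamming)
print(center)  # cot
```

Nothing here requires a mean to exist or the dissimilarity to be a metric; the only requirement is that `dissim` can be evaluated for every pair of objects in the cluster.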

Christian Hennig
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - [From Review](/review/low-quality-posts/264243) – StupidWolf Jun 06 '20 at 23:06
  • Am I not literally answering the question? "Is there any constraint about the choice of the distance?" - "Am I missing something?" That's exactly what I tell them. If this is not an answer, what is? – Christian Hennig Jun 07 '20 at 12:31
  • Hi @Lewian, it was from the review, flagged because of its length etc. So maybe you can explain in a few lines why any distance can be used? It has to do with the algorithm. – StupidWolf Jun 07 '20 at 12:38
  • K-means minimizes the total squared error, hence Euclidean or something similar, while k-medoids minimizes the sum of dissimilarities between points, so you can use an arbitrary distance measure – StupidWolf Jun 07 '20 at 12:39
  • If you look at the definition of k-medoids, nothing is ruled out, and there is no reason to rule anything out. How can I explain that? There would need to be an explanation why anything in particular should be ruled out, but there is none. – Christian Hennig Jun 07 '20 at 12:40
  • OK, I'll edit a bit. – Christian Hennig Jun 07 '20 at 12:43