
Please give me the reasons. I haven't found any k-medoids example whose calculation is done using Euclidean distance; all the examples I have seen use Manhattan distance for k-medoids.

Gavin Simpson
MD MOHIBULLA
  • K-medoids is specifically an *alternative* to the k-means algorithm, which uses Euclidean distances. See http://en.wikipedia.org/wiki/K-medoids for instance. Given this, it is hard to make sense of your question. – whuber Apr 07 '15 at 19:38
  • @whuber The page you link to gives a different distinction between *k*-medoids and *k*-means. The former uses medoids whilst the latter uses centroids. The algorithm that the page describes (PAM) states that any valid distance may be used to measure the distance between the observations and the current medoids, and gives the Euclidean distance as one such choice. As such, I find your comment difficult to "make sense of", at least in terms of you pointing the OP to a resource that seems contradictory to your observation. – Gavin Simpson Apr 08 '15 at 02:59
  • @Gavin I'm afraid I don't follow you: "different" distinction compared to what? I haven't made any distinction at all, but merely characterized one technique as an alternative to the other and gave a one-line summary derived from the Wikipedia article. That article presents a straightforward account of how k-medoids are related to k-means, stating in particular that "k-medoids ... works with an arbitrary matrix of distances between datapoints instead of $l_2$." ("$l_2$" refers to Euclidean distance.) – whuber Apr 08 '15 at 15:35
  • Right, but k-medoids with Euclidean distance and k-means would be different clustering methods. I don't see the OP mention k-means at all. The Wikipedia page you link to specifically mentions k-medoids, as implemented in the PAM algorithm, as using inter alia Manhattan or Euclidean distances. The OP's question is about why one might use Manhattan distances over Euclidean distance in k-medoids to measure the distance to the current medoids. Hence I don't understand what you think is wrong with the OP's question? – Gavin Simpson Apr 08 '15 at 15:40
  • @whuber or am I missing that k-means minimising the sum of Euclidean distances from cluster points to cluster centroids is equivalent to minimising the Euclidean distance between cluster points and the medoid? – Gavin Simpson Apr 08 '15 at 15:44
  • @Gavin I don't think you're missing anything. I suspect you may be focusing on the distinction between k-medoids' use of a data point and k-means' use of a centroid for the center of each cluster. I haven't said anything about that distinction because I do not see it as a principal distinction between the two procedures. After all, since k-medoids only has a distance matrix to work with, it cannot compute distances to any other constructed points (such as a centroid), so this difference is one that is forced on it. It is not, however, a conceptual or intentional difference. – whuber Apr 08 '15 at 15:56
  • Actually, I have read many tutorials and documents about clustering methods like k-means, k-medoids, etc. In every document I saw, Euclidean distance was used for k-means and Manhattan distance for k-medoids, so the above question arose in my mind. Now I am working on k-medoids for my thesis. My cordial thanks go to you both for your valuable discussion @GavinSimpson and @whuber – MD MOHIBULLA Apr 09 '15 at 05:30

2 Answers


The Manhattan distance is based on absolute value distance, as opposed to squared error (read Euclidean) distance. In practice, you should get similar results most of the time. Absolute value distance should give more robust results, whereas Euclidean distance would be influenced by unusual values.

This is a multivariate technique, and "distance" between two points involves aggregating the distances between each variable. So if two points are close on most variables, but more discrepant on one of them, Euclidean distance will exaggerate that discrepancy, whereas Manhattan distance will shrug it off, being more influenced by the closeness of the other variables.
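
To make this concrete, here is a small numerical sketch (my own illustration with made-up numbers, assuming numpy is available) of a point choosing between two candidate medoids: one moderately off on every variable, the other very close on all variables except a single discrepant one.

```python
import numpy as np

p  = np.array([0.0, 0.0, 0.0, 0.0, 0.0])   # the point to be assigned
m1 = np.array([2.0, 2.0, 2.0, 2.0, 2.0])   # moderately off on all five variables
m2 = np.array([0.0, 0.0, 0.0, 0.0, 9.0])   # identical on four variables, far on one

def manhattan(x, y):
    # sum of absolute per-variable differences
    return np.abs(x - y).sum()

def euclidean(x, y):
    # square root of the sum of squared per-variable differences
    return np.sqrt(((x - y) ** 2).sum())

print(manhattan(p, m1), manhattan(p, m2))   # 10.0, 9.0  -> Manhattan prefers m2
print(euclidean(p, m1), euclidean(p, m2))   # ~4.47, 9.0 -> Euclidean prefers m1
```

Manhattan distance shrugs off the one large gap and assigns the point to the medoid that matches it on four of five variables, whereas under Euclidean distance that single squared gap dominates and the other medoid wins.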

According to Wikipedia, the k-medoids algorithm is not defined for Euclidean distance, which could explain why you have seen no examples of it. Presumably the reason for this is to have a robust clustering method.

begin(RantMode)

Thoughtless analysts often throw a whole bag of variables into an analysis, not all of which have much to do with the problem at hand, nor do those analysts wish to take the necessary time to discern which variables matter -- possibly by talking to subject matter experts. Such analysts (who may possibly call themselves Big Data specialists) would naturally favour a technique that was robust with respect to choice of variable. Statisticians, traditionally, go for small amounts of quality data, and thus favour squared error methods with their greater efficiency.

end(RantMode)
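
As the comments below point out, the algorithm itself only needs a matrix of pairwise dissimilarities, so either metric can in principle be plugged in. Here is a rough sketch of a Voronoi-iteration style k-medoids (my own illustration, not the full PAM swap search, and it assumes numpy and scipy) parameterised by the metric:

```python
import numpy as np
from scipy.spatial.distance import cdist

def k_medoids(X, k, metric="cityblock", max_iter=100, seed=0):
    """Simple k-medoids on the dissimilarity matrix induced by `metric`."""
    rng = np.random.default_rng(seed)
    D = cdist(X, X, metric=metric)            # "cityblock" (Manhattan), "euclidean", ...
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)          # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            within = D[np.ix_(members, members)]
            new_medoids[j] = members[within.sum(axis=1).argmin()]  # best data point in cluster
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break
        medoids = new_medoids
    return medoids, np.argmin(D[:, medoids], axis=1)

# Same data, two metrics -- only the dissimilarity matrix changes:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
print(k_medoids(X, 2, metric="cityblock")[0])
print(k_medoids(X, 2, metric="euclidean")[0])
```

With well separated clusters like these, the two metrics will typically pick the same (or nearly the same) medoids, which is consistent with the OP's observation that the results often coincide.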

Placidia
  • I have done two examples for k-medoids using both Manhattan and Euclidean distance. I have seen that the clustering results are the same. For this reason, I wanted to know: why Manhattan distance for k-medoids? Thank you very much for giving me the above valuable information. – MD MOHIBULLA Apr 07 '15 at 19:01
  • Can you provide a link to the Wikipedia page that says the "algorithm is not defined for Euclidean distance...". The page @whuber links mentions that the Euclidean distance can be used to measure distances to medoids, and that the method can work with an arbitrary matrix of distances. – Gavin Simpson Apr 08 '15 at 03:10
  • @Gavin is correct. Consider these excerpts from Wikipedia: "It (k-medoids) is more robust to noise and outliers as compared to k-means because it minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances." In describing the algorithm it goes on to state "'closest' here is defined using any valid distance metric, most commonly **Euclidean distance,** Manhattan distance, ..." (emphasis added). Thus, the motivation for k-medoids is the possibility of robustness (deriving from non-Euclidean metrics), but it nevertheless does work with Euclidean distance. – whuber Apr 08 '15 at 15:40
  • The link is here: http://en.wikipedia.org/wiki/K-medoids. I perhaps overstated the case. The key line is "This method was proposed in 1987[1] for the work with $l_1$ norm and other distances," which suggested that the absolute value was top of mind. I see that, in principle, other distances can be used, and presumably are used. – Placidia Apr 08 '15 at 18:35
  • I would just like to add that I would be suspicious of any clustering analysis that produced wildly different results with L1 and L2 norms -- to me, that would suggest that the data don't really want to be clustered on the basis of that particular collection of variables. – Placidia Apr 08 '15 at 18:37
  • Hmm, exactly, you are right @whuber. However, Euclidean distance is cost effective for a small data set. I got the result last night. Again, thanks @GavinSimpson, @whuber, @Placidia. – MD MOHIBULLA Apr 09 '15 at 05:37

I don't have enough reputation to comment and this isn't really meritorious of a full answer, but...

Also worth noting is that k-means clustering can be performed using any sort of distance metric (although in practice it is nearly always done with Euclidean distance). If the Manhattan distance metric is used in k-means clustering, the update step yields a cluster centre whose value in each dimension is the median, rather than the mean as with Euclidean distance (this variant is often called k-medians).
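
A quick numerical check of the mean-versus-median point, in a single dimension (my own sketch, assuming numpy; the brute-force search over candidate centres is just for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0, 3.0, 10.0])   # one outlying value
candidates = np.linspace(0, 10, 1001)      # candidate cluster centres

# cost of each candidate centre under the two criteria
sq_cost  = ((x[None, :] - candidates[:, None]) ** 2).sum(axis=1)   # squared (Euclidean) error
abs_cost = np.abs(x[None, :] - candidates[:, None]).sum(axis=1)    # absolute (Manhattan) error

print(candidates[sq_cost.argmin()],  np.mean(x))    # ~3.6 and 3.6 -> the mean minimises squared error
print(candidates[abs_cost.argmin()], np.median(x))  # ~2.0 and 2.0 -> the median minimises absolute error
```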

These clusters will not necessarily be the same as those given by k-medoids; thus, the main takeaway is that the Manhattan distance metric is not inherently tied to k-medoids.

DerekG