Specifying the number of clusters in nearest neighbor clustering

Question

I've got a distance matrix between examples. I want to cluster them into m clusters with a nearest neighbor algorithm which works like this:

1. Set i = 1 and k = 1. Assign example x_1 to cluster C_1.
2. Set i = i + 1. Find nearest neighbour of x_i
   among the patterns already assigned to clusters.
   Let d_n  denote the distance from x_i to its nearest neighbour.
   Suppose the nearest neighbour is in cluster n.
3. If d_n less than or equal to t then assign x_i to C_n where t is the 
   threshold specified by the user. Otherwise set k = k+1 and assign x_i  to a      
   new cluster C_k.

How could I adapt this algorithm so I could specify how many clusters I want?

Is anybody aware of an existing R implementation of nearest neighbour clustering?

In this algorithm you cannot specify the number of clusters, you can change them varying parameter 't'. For R implementations just Google it, there is at least one. — Dmitry Laptev, Aug 19 '12 at 21:58
In 3. I think you mean "less than equal" instead of "greater than or equal", do you ? — mlwida, Aug 20 '12 at 08:09
It is not a homework. Oops, it was a mistake, I mean "less than equal". — Uros K, Aug 20 '12 at 08:16

score 2 · Accepted Answer · answered Aug 20 '12 at 08:21

As Dmitry Laptev already said correctly, the threshold t is determining the number of clusters indirectly. Using your algorithm there is no way to determine the number of clusters beforehand while still producing meaningful results.

As a more convenient bottom-up agglomerative nearest neighbor clustering approach you may want to take a look at Single Linkage, which works in a comparable (albeit certainly not equivalent) way. Single Linkage is implemented in R in hclust in the package stats

If you liked this approach, you should also take a look at the other linkage algorithms, especially Ward's method, which tends to deliver better results.

Thank you. I looked into single linkage hierarhical clustering and it is somehow similar to NN method. — Uros K, Aug 22 '12 at 19:04

score 0 · Answer 2 · edited Aug 20 '12 at 20:12

0

A quick google search produces k-Nearest Neighbour Classification and knn (weighted k-nearest neighbors

edited Aug 20 '12 at 20:12

Michael R. Chernick

39,640
28
74
143

answered Aug 19 '12 at 22:56

AGS

119
3

I think these two are not meant for clustering (they don't work with distance matrix). – Uros K Aug 20 '12 at 08:32
@genesiss the first link not, but the second one. From the description of the method specClust it uses [spectral clustering](http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5b0%5d.pdf), but yes, a precomputed distance matrix cannot be supplied as far as I see. – mlwida Aug 20 '12 at 09:02
1

Michael, for finding R solutions use http://www.rseek.org instead of Google. One of the first hits when searching on "cluster" takes you to the [Cluster task view](http://cran.r-project.org/web/views/Cluster.html) listing approximately 100(!) packages. – whuber Aug 20 '12 at 20:15

Specifying the number of clusters in nearest neighbor clustering

2 Answers2

Linked