2

I've got a distance matrix between examples. I want to cluster them into m clusters with a nearest neighbor algorithm which works like this:

1. Set i = 1 and k = 1. Assign example x_1 to cluster C_1.
2. Set i = i + 1. Find nearest neighbour of x_i
   among the patterns already assigned to clusters.
   Let d_n  denote the distance from x_i to its nearest neighbour.
   Suppose the nearest neighbour is in cluster n.
3. If d_n less than or equal to t then assign x_i to C_n where t is the 
   threshold specified by the user. Otherwise set k = k+1 and assign x_i  to a      
   new cluster C_k.

How could I adapt this algorithm so I could specify how many clusters I want?

Is anybody aware of an existing R implementation of nearest neighbour clustering?

MånsT
  • 10,213
  • 1
  • 46
  • 65
Uros K
  • 467
  • 1
  • 6
  • 9

2 Answers2

2

As Dmitry Laptev already said correctly, the threshold t is determining the number of clusters indirectly. Using your algorithm there is no way to determine the number of clusters beforehand while still producing meaningful results.

As a more convenient bottom-up agglomerative nearest neighbor clustering approach you may want to take a look at Single Linkage, which works in a comparable (albeit certainly not equivalent) way. Single Linkage is implemented in R in hclust in the package stats

If you liked this approach, you should also take a look at the other linkage algorithms, especially Ward's method, which tends to deliver better results.

mlwida
  • 9,922
  • 2
  • 45
  • 74
  • Thank you. I looked into single linkage hierarhical clustering and it is somehow similar to NN method. – Uros K Aug 22 '12 at 19:04
0

A quick google search produces k-Nearest Neighbour Classification and knn (weighted k-nearest neighbors

Michael R. Chernick
  • 39,640
  • 28
  • 74
  • 143
AGS
  • 119
  • 3
  • I think these two are not meant for clustering (they don't work with distance matrix). – Uros K Aug 20 '12 at 08:32
  • @genesiss the first link not, but the second one. From the description of the method specClust it uses [spectral clustering](http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5b0%5d.pdf), but yes, a precomputed distance matrix cannot be supplied as far as I see. – mlwida Aug 20 '12 at 09:02
  • 1
    Michael, for finding R solutions use http://www.rseek.org instead of Google. One of the first hits when searching on "cluster" takes you to the [Cluster task view](http://cran.r-project.org/web/views/Cluster.html) listing approximately 100(!) packages. – whuber Aug 20 '12 at 20:15