
I am testing ideas for clustering a particular graph. After trying a set of graph clustering / community detection algorithms, I thought about mapping the graph to a vector space and using vector-space clustering algorithms, say GMM in particular. Multidimensional scaling (MDS) is the standard algorithm for mapping a dissimilarity graph (one whose edges carry a measure of distance) to points in an n-dimensional space.
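
For concreteness, this is roughly the pipeline I have in mind, as a minimal sketch: it assumes a weighted networkx graph, uses shortest-path lengths as the dissimilarities (my own choice), and relies on scikit-learn's MDS and GaussianMixture for the embedding and the clustering.

```python
# Sketch of the pipeline: graph -> pairwise distances -> MDS embedding -> GMM.
import networkx as nx
import numpy as np
from sklearn.manifold import MDS
from sklearn.mixture import GaussianMixture

G = nx.karate_club_graph()                    # placeholder graph; assumed connected
D = np.asarray(nx.floyd_warshall_numpy(G))    # all-pairs shortest-path distances

# Metric MDS on the precomputed dissimilarity matrix.
mds = MDS(n_components=4, dissimilarity="precomputed", random_state=0)
X = mds.fit_transform(D)

# Cluster the embedded points with a Gaussian mixture model.
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
```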

The problem I have is that MDS algorithms try to minimize the error of the projected distances (more precisely, the sum of squared differences between the graph distances and the corresponding inter-point distances). Thus MDS will try hardest to get the far-away vertices right, since those pairs generate the larger errors.
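
To make that precise (standard metric-MDS notation, where $d_{ij}$ is the graph distance between vertices $i$ and $j$ and $x_i$ is the embedded point of vertex $i$), the objective being minimized is the raw stress

$$\operatorname{Stress}(x_1,\dots,x_n)=\sum_{i<j}\bigl(d_{ij}-\lVert x_i-x_j\rVert\bigr)^2,$$

so pairs at large graph distance can contribute large squared terms and end up dominating the fit.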

But for clustering purposes, placing the far-away vertices correctly is not what matters: they will not end up in the same cluster anyway, so getting their distances right buys nothing. Placing the closest vertices correctly, on the other hand, is much more useful for the clustering.

Is there an MDS-like algorithm with this property?

Even more desirable would be the ability to trade error on the far-away vertices for the dimensionality of the embedding space. For example, if I think embedding the graph in a 4-D space makes sense, I would like to tell the MDS-like algorithm: place the vertices in a 4-D space, and distort the large distances as much as necessary so that the graph fits well into those four dimensions.

Jacques Wainer
Modern (iterative) MDS algorithms such as PROXSCAL have an option to weight the input distances by their importance. More important distances receive an advantage during the fit and so are fitted with less error. Now, you could use (the inverse of) the distances themselves as their own weights. Consequently, small distances will get a more accurate fit than big distances. This adds to MDS the flavour of clustering while not making it cluster analysis per se. – ttnphns Mar 10 '21 at 12:20
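
For illustration, here is a minimal sketch of that weighting idea (not PROXSCAL itself, just a plain weighted raw-stress minimizer in NumPy/SciPy), with inverse-square weights $w_{ij} = 1/d_{ij}^2$ as one possible choice; the function and variable names are hypothetical.

```python
# Sketch: weighted metric MDS that down-weights large graph distances.
# Minimizes sum_{i<j} w_ij * (d_ij - ||x_i - x_j||)^2 with w_ij = 1/d_ij^2,
# so small (local) distances dominate the fit. Assumes a connected graph
# (finite distances); small graphs only, since L-BFGS-B approximates the
# gradient numerically here.
import numpy as np
from scipy.optimize import minimize

def weighted_mds(D, n_components=4, eps=1e-9, seed=0):
    n = D.shape[0]
    iu = np.triu_indices(n, k=1)        # each pair (i, j) with i < j once
    d = D[iu]
    w = 1.0 / (d ** 2 + eps)            # inverse-square weights

    def stress(flat):
        X = flat.reshape(n, n_components)
        diff = X[iu[0]] - X[iu[1]]
        emb = np.sqrt((diff ** 2).sum(axis=1) + eps)
        return np.sum(w * (d - emb) ** 2)

    x0 = np.random.default_rng(seed).normal(size=n * n_components)
    res = minimize(stress, x0, method="L-BFGS-B")
    return res.x.reshape(n, n_components)

# X = weighted_mds(D, n_components=4)   # D: all-pairs graph distance matrix
```

Lowering n_components then lets the optimizer distort the (down-weighted) large distances to fit the chosen dimensionality, which is close to the trade-off described in the question.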

0 Answers