To cluster users from a binary user-item matrix, I am planning to first compute pairwise user similarity (Jaccard) and then use graph theory to isolate clusters (communities). For that I need to map the similarity matrix to a binary graph where $e_{ij}=1\; \text{iff}\; \text{sim}(u_i,u_j) > th$, i.e., there is an edge between nodes $i$ and $j$ if the similarity between users $u_i$ and $u_j$ is greater than some threshold $th$.
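For concreteness, here is a minimal Python sketch of the pipeline described above: Jaccard similarity on item sets, an edge wherever the similarity exceeds `th`, and connected components as the crudest possible notion of a cluster. The toy `user_items` data, the function names, and the value `th=0.4` are all made up for illustration; a real community-detection method would likely replace the connected-components step.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two item sets (taken as 0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def threshold_graph(user_items, th):
    """Adjacency sets with an edge i-j iff sim(u_i, u_j) > th."""
    adj = {u: set() for u in user_items}
    for u, v in combinations(user_items, 2):
        if jaccard(user_items[u], user_items[v]) > th:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def connected_components(adj):
    """Connected components via iterative depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


# Hypothetical toy data: each user maps to the set of items they interacted with.
user_items = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"x", "y"},
    "u4": {"x", "y", "z"},
    "u5": {"q"},
}

adj = threshold_graph(user_items, th=0.4)  # th=0.4 is an arbitrary choice
components = connected_components(adj)
```

With this toy input, `u1` and `u2` (similarity 0.5) end up linked, as do `u3` and `u4` (similarity 2/3), while `u5` stays isolated.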

Here is the distribution of my users' similarity:

> summary(sim)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0310  0.0910  0.1093  0.1710  0.4620 
  • What would be an optimum way to find an appropriate threshold?
  • Is it even the right approach?
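To make the threshold question concrete, one thing that could be tried (an assumption on my side, not an established recipe) is to sweep `th` over a few values and watch how the component structure of the thresholded graph changes. The sketch below does this in Python with a small union-find; the `sims` values and the function name `component_sizes` are invented for illustration.

```python
def component_sizes(n, sims, th):
    """Sizes of connected components when only edges with sim > th are kept.

    n    -- number of users, labelled 0..n-1
    sims -- dict mapping (i, j) pairs to a similarity value
    """
    parent = list(range(n))

    def find(x):
        # Find the root of x, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), s in sims.items():
        if s > th:
            parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


# Hypothetical pairwise similarities for four users.
sims = {
    (0, 1): 0.46, (0, 2): 0.17, (1, 2): 0.09,
    (2, 3): 0.31, (0, 3): 0.03, (1, 3): 0.00,
}

# Component sizes for a handful of candidate thresholds.
sweep = {th: component_sizes(4, sims, th) for th in (0.05, 0.1, 0.2, 0.4)}
```

In this toy case the graph stays in one piece for low thresholds, splits into two pairs at `th = 0.2`, and mostly falls apart into singletons at `th = 0.4`. Whether such a transition point is a sensible threshold for real data is exactly what is being asked.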
chl
user1848018
    You could retain the similarities as weights: http://stats.stackexchange.com/questions/2948/how-to-do-community-detection-in-a-weighted-social-network-graph – MattBagg Jan 02 '13 at 21:14

0 Answers