Appropriate threshold to map a similarity value to an edge in a graph

Asked Jan 02 '13 at 19:45

Active Jan 02 '13 at 20:36

Viewed 384 times

To cluster users from a binary user-item matrix, I am planning to first compute pairwise user similarities (Jaccard) and then use graph-theoretic methods to isolate clusters (communities). I need to map the similarity matrix to a binary graph where $e_{ij}=1\; \text{iff}\; \text{sim}(u_i,u_j) > th$, i.e., there is an edge between nodes $i$ and $j$ if the similarity between users $u_i$ and $u_j$ is greater than some threshold $th$.
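As a minimal sketch of the mapping described above (not a recommendation for any particular threshold), the following computes Jaccard similarities and keeps an edge only when the similarity exceeds `th`. The function names `jaccard` and `threshold_graph`, the toy `user_items` data, and the value `th=0.4` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two item sets; taken to be 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def threshold_graph(user_items, th):
    """Return the edge set {(i, j)} with i < j and sim(u_i, u_j) > th."""
    users = sorted(user_items)
    return {
        (i, j)
        for i, j in combinations(users, 2)
        if jaccard(user_items[i], user_items[j]) > th
    }


# Hypothetical toy data: each user maps to the items that have a 1
# in that user's row of the binary user-item matrix.
user_items = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"x", "y"},
    "u4": {"y", "z"},
}

# th=0.4 is an arbitrary illustrative value, not a suggested default.
edges = threshold_graph(user_items, th=0.4)
```

With this toy data, `u1` and `u2` share 2 of 4 distinct items (similarity 0.5) and are linked, while `u3` and `u4` share 1 of 3 (about 0.33) and are not; lowering `th` to 0.3 would add that second edge.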

Here is the distribution of the pairwise user similarities:

> summary(sim)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0310  0.0910  0.1093  0.1710  0.4620

What would be an optimal way to find an appropriate threshold?
Is this even the right approach?
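One way to make the question concrete is to look at how the graph changes as the threshold moves. The sketch below is only an illustration under assumptions: the similarity values in `sim` are made up, the helper names `count_components` and `sweep` are hypothetical, and the candidate thresholds are simply values close to the quartiles in the summary above. For each threshold it reports the fraction of pairs kept as edges and the number of connected components.

```python
def count_components(n, edges):
    """Number of connected components among nodes 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


def sweep(sim, n, thresholds):
    """For each threshold, return (th, fraction of pairs kept, component count)."""
    rows = []
    for th in thresholds:
        edges = [pair for pair, s in sim.items() if s > th]
        rows.append((th, len(edges) / len(sim), count_components(n, edges)))
    return rows


# Hypothetical pairwise similarities for 4 users (made-up values).
sim = {
    (0, 1): 0.46, (0, 2): 0.17, (0, 3): 0.03,
    (1, 2): 0.09, (1, 3): 0.00, (2, 3): 0.11,
}

# Candidate thresholds near the quartiles reported by summary(sim).
rows = sweep(sim, n=4, thresholds=[0.031, 0.091, 0.171])
for th, kept, n_comp in rows:
    print(f"th={th:.3f}  pairs kept={kept:.2f}  components={n_comp}")
```

A table like this does not pick a threshold by itself, but it shows where the graph starts to fragment, which is one input to the choice.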

edited Jan 02 '13 at 20:36

chl

asked Jan 02 '13 at 19:45

user1848018

You could retain the similarities as weights: http://stats.stackexchange.com/questions/2948/how-to-do-community-detection-in-a-weighted-social-network-graph – MattBagg Jan 02 '13 at 21:14
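A minimal sketch of what retaining the similarities as weights could look like, assuming the pairwise similarities are already available as a dictionary: every pair with nonzero similarity becomes an edge whose weight is the similarity, so no threshold is needed. The names `weighted_edges` and `strength` and the toy values are hypothetical; the community detection step itself, discussed in the linked question, is not shown.

```python
from collections import defaultdict


def weighted_edges(sim):
    """Keep every pair with nonzero similarity; the similarity is the edge weight."""
    return {pair: s for pair, s in sim.items() if s > 0}


def strength(edges):
    """Weighted degree of each node: the sum of the weights of its incident edges."""
    total = defaultdict(float)
    for (i, j), w in edges.items():
        total[i] += w
        total[j] += w
    return dict(total)


# Hypothetical pairwise Jaccard similarities (made-up values).
sim = {("u1", "u2"): 0.5, ("u1", "u3"): 0.0, ("u2", "u3"): 0.25}

edges = weighted_edges(sim)
node_strength = strength(edges)
```

The resulting weighted edge list can be passed to any community detection routine that accepts edge weights.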

0 Answers