To cluster users given a binary user-item matrix, I am planning to first compute pairwise user similarities (Jaccard) and then use graph theory to isolate clusters (communities). For that I need to map the similarity matrix to a binary graph where $e_{ij}=1\; \text{iff}\; \text{sim}(u_i,u_j) > th$, i.e., there is an edge between nodes $i$ and $j$ if the similarity between users $u_i$ and $u_j$ is greater than some threshold $th$.
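A minimal sketch of the pipeline I have in mind, written in Python with NumPy on toy data. The function names `jaccard_similarity` and `threshold_graph`, the toy matrix, and the value `th=0.4` are placeholders for illustration only, not my real data or a recommended threshold:

```python
import numpy as np

def jaccard_similarity(X):
    """Pairwise Jaccard similarity between the rows (users) of a binary matrix."""
    X = np.asarray(X, dtype=bool).astype(int)
    intersection = X @ X.T                      # |items_i AND items_j| for every user pair
    row_sums = X.sum(axis=1)                    # number of items per user
    union = row_sums[:, None] + row_sums[None, :] - intersection
    # Assumption: a pair of users with no items at all gets similarity 0.
    return np.divide(intersection, union,
                     out=np.zeros(union.shape, dtype=float),
                     where=union > 0)

def threshold_graph(sim, th):
    """Binary adjacency matrix with e_ij = 1 iff sim(u_i, u_j) > th (no self-loops)."""
    adj = (sim > th).astype(int)
    np.fill_diagonal(adj, 0)
    return adj

# Toy user-item matrix: 4 users, 4 items (hypothetical data).
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
])
sim = jaccard_similarity(X)
adj = threshold_graph(sim, th=0.4)
```

With this toy matrix and `th=0.4`, only the pairs (user 0, user 1) and (user 2, user 3) end up connected.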
Here is the distribution of my users' similarity:

    > summary(sim)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.0000  0.0310  0.0910  0.1093  0.1710  0.4620

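To make the question concrete, one thing I could do is sweep a few candidate thresholds and watch how the number of edges and connected components changes. The sketch below (Python with NumPy) does this on a small made-up similarity matrix. The helper names `count_components` and `sweep_thresholds` and the candidate values are hypothetical, and I am not claiming that this is a sound selection criterion:

```python
import numpy as np

def count_components(adj):
    """Count connected components of an undirected graph via depth-first search."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(nb)
    return components

def sweep_thresholds(sim, thresholds):
    """Return (threshold, number of edges, number of components) per candidate."""
    rows = []
    for th in thresholds:
        adj = sim > th                      # e_ij = 1 iff sim(u_i, u_j) > th
        np.fill_diagonal(adj, False)        # ignore self-similarity
        n_edges = int(np.triu(adj, k=1).sum())
        rows.append((float(th), n_edges, count_components(adj)))
    return rows

# Made-up symmetric similarity matrix for 4 users (not my real data).
sim = np.array([
    [1.00, 0.67, 0.00, 0.00],
    [0.67, 1.00, 0.25, 0.00],
    [0.00, 0.25, 1.00, 0.50],
    [0.00, 0.00, 0.50, 1.00],
])
table = sweep_thresholds(sim, [0.1, 0.3, 0.6])
for th, n_edges, n_comp in table:
    print(f"th={th:.2f}  edges={n_edges}  components={n_comp}")
```

On this toy matrix the graph goes from one component at `th=0.1` to three at `th=0.6`, which is exactly the sensitivity to `th` that I am unsure how to handle.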
- What would be a good way to choose an appropriate threshold?
- Is thresholding the similarity matrix even the right approach?