
I have a set of documents with the following correlations (based on similarity).

This is the set:

1: 2,3,4
2: 1,4
3: 1
4: 1,2

From this I build an adjacency matrix:

0 1 1 1
1 0 0 1
1 0 0 0
1 1 0 0
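
For reference, here is a minimal sketch of how the matrix above can be built from the neighbor lists, assuming NumPy is available. The names `neighbors` and `adjacency` are only illustrative and are not part of any MCL implementation.

```python
import numpy as np

# Neighbor lists from the set above (1-based document ids).
neighbors = {1: [2, 3, 4], 2: [1, 4], 3: [1], 4: [1, 2]}

n = len(neighbors)
adjacency = np.zeros((n, n), dtype=int)
for doc, linked in neighbors.items():
    for other in linked:
        # Shift the 1-based ids to 0-based array indices.
        adjacency[doc - 1, other - 1] = 1

print(adjacency)
```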

Now I apply MCL.

The resulting cluster is:

[0] = 1,2,3,4

but the result I expected was:

[0] = 1,2,4
[1] = 3

Any ideas? Is Markov clustering the right choice for this type of problem?

Thank you for your help

Stefano Vet

1 Answer

This slide show is a pretty good intro to MCL https://www.cs.ucsb.edu/~xyan/classes/CS595D-2009winter/MCL_Presentation2.pdf.

One problem that jumps out is that in order for the matrix to define a Markov model the columns must be normalized to sum to one. I don't know if the implementation you're using does this for you, but based on the algorithm if it doesn't, you would not get good results.
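
As a rough illustration of what column normalization means for the matrix in the question, here is a minimal sketch assuming NumPy. The variable names are hypothetical, and this is not how the micans.org tool is invoked.

```python
import numpy as np

# Adjacency matrix from the question.
adjacency = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

# Divide every column by its own sum (axis=0 sums down each column).
column_sums = adjacency.sum(axis=0, keepdims=True)
transition = adjacency / column_sums

# Each column should now sum to 1.
print(transition.sum(axis=0))
```

Note that `axis=0` is what makes this a column normalization; using `axis=1` would normalize the rows instead.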

Even so, with only four nodes, it seems possible that a correctly implemented MCL would return a single cluster on a connected graph.
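
To experiment with this, here is a toy sketch of the MCL loop (expansion followed by inflation), assuming NumPy. The function `mcl`, its parameters, and its defaults are my own illustrative choices, not the API or the defaults of the micans.org implementation, so its output may differ from that tool.

```python
import numpy as np

def normalize_columns(m):
    """Scale each column so that it sums to 1."""
    return m / m.sum(axis=0, keepdims=True)

def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-8):
    """Toy MCL: add self-loops, then alternate expansion and inflation."""
    m = normalize_columns(adjacency + np.eye(len(adjacency)))
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)       # expansion step
        m = normalize_columns(np.power(m, inflation))  # inflation step
        if np.allclose(m, previous, atol=tol):
            break
    # Rows that keep non-negligible mass act as attractors; the columns
    # with mass in such a row form that attractor's cluster.
    clusters = set()
    for row in m:
        members = tuple(int(i) + 1 for i in np.flatnonzero(row > 1e-6))
        if members:
            clusters.add(members)
    return sorted(clusters)

adjacency = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

clusters = mcl(adjacency)
print(clusters)
```

Raising `inflation` generally pushes MCL toward more, smaller clusters, so it is worth trying a few values before concluding the algorithm is unsuitable.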

jlimahaverford
  • I'm using this implementation (http://micans.org/mcl/). I had already read that, but I don't understand what "the columns must be normalized to sum to one" means. How can I do it? Thank you so much! – Stefano Vet Sep 12 '15 at 23:00
  • For example, divide the first column by 3. Each column represents the probabilities of moving from a given node to each of the other nodes, so these probabilities must sum to 1. – jlimahaverford Sep 12 '15 at 23:02
  • I have just tried with probabilities that sum to one. My matrix was [1]: 0,0.33,0.33,0.33 , [2]: 0.5,0,0,0.5 , [3]: 1,0,0,0 , [4]: 0.5,0.5,0,0 but the result is the same. Now I'm not sure whether this is the right algorithm. – Stefano Vet Sep 12 '15 at 23:07
  • You normalized the rows. You need to normalize the columns. – jlimahaverford Sep 12 '15 at 23:09
  • Oops, I tried again with [1]: 0,0.5,1,0.5 - [2]: 0.33,0,0,0.5 - [3]: 0.33,0,0,0 - [4]: 0.33,0.5,0,0 - but I get the same result :-( – Stefano Vet Sep 12 '15 at 23:15
  • One other option is to add "self loops". Add a 1 to each entry on the diagonal before you normalize the columns. So the fourth column should end up with all 1/4. – jlimahaverford Sep 12 '15 at 23:24
  • Nothing changed. The new matrix is: [1]: 1,0.5,1,0.5 - [2]: 0.33,1,0,0.5 - [3]: 0.33,0,1,0 - [4]: 0.33,0.5,0,1. Probably MCL doesn't work with 3 or 4 nodes. – Stefano Vet Sep 12 '15 at 23:34
  • I also tried with [1]: 0.25,0.5,1,0.5 - [2]: 0.33,0.25,0,0.5 - [3]: 0.33,0,0.25,0 - [4]: 0.33,0.5,0,0.25 – Stefano Vet Sep 12 '15 at 23:37
  • Ahhh I meant to say the first column should all equal 1/4. Like I said, add one to the diagonal of the original matrix *then* normalize the columns. – jlimahaverford Sep 12 '15 at 23:40
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/29091/discussion-between-stefano-vet-and-jlimahaverford). – Stefano Vet Sep 12 '15 at 23:43
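
The order of operations discussed in the comments (add self-loops to the original matrix first, then normalize the columns) can be sketched as follows, assuming NumPy. The variable names are illustrative only.

```python
import numpy as np

# Original 0/1 adjacency matrix from the question.
adjacency = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

# Step 1: add self-loops by putting a 1 on every diagonal entry.
with_loops = adjacency + np.eye(4)

# Step 2: only now divide each column by its sum.
transition = with_loops / with_loops.sum(axis=0, keepdims=True)

# Wrong order for comparison: normalizing first and then setting the
# diagonal to 1 leaves columns that no longer sum to 1.
wrong = adjacency / adjacency.sum(axis=0, keepdims=True)
np.fill_diagonal(wrong, 1.0)

print(transition[:, 0])   # first column: every entry is 1/4
print(wrong.sum(axis=0))  # these column sums are not 1
```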