I have two lists, A and B, where size(A) = 4 and size(B) = 10. I would like to find the Jaccard similarity between these two lists.
Suppose A = {Tom, George, John, Jennifer} and B = {Tom, Jessica, Angel, Hanna, Tom, John, Michele, Edward, Alex, Tom}. Note that "Tom" appears three times in B.
As far as I know, Jaccard similarity is measured as |A ∩ B| / |A ∪ B|. I also read here that Jaccard similarity is the number of common attributes divided by the number of attributes that exist in at least one of the two objects, that is, p/(p+q+r), where p is the number of attributes present in both A and B, q is the number of attributes that are 1 for A and 0 for B, and r is the number of attributes that are 0 for A and 1 for B.
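To make the p/(p+q+r) form concrete, here is a minimal Python sketch. The function name `jaccard_counts` and the toy inputs are hypothetical, chosen only for illustration, and the sketch assumes both inputs are treated as sets (duplicates are discarded) and that at least one of them is non-empty.

```python
def jaccard_counts(a, b):
    """Return (p, q, r, similarity) for two collections treated as sets."""
    set_a, set_b = set(a), set(b)   # converting to sets drops any duplicates
    p = len(set_a & set_b)          # attributes present in both A and B
    q = len(set_a - set_b)          # attributes present in A only
    r = len(set_b - set_a)          # attributes present in B only
    return p, q, r, p / (p + q + r)  # p + q + r equals the size of the union


# Hypothetical toy data: "b" and "c" are shared, "a" is only in x,
# "d" and "e" are only in y, so p = 2, q = 1, r = 2.
x = ["a", "b", "c"]
y = ["b", "c", "d", "e"]
print(jaccard_counts(x, y))
```

Since p + q + r counts every element of the union exactly once, this form is the same quantity as |A ∩ B| / |A ∪ B|.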
My question: considering the above formula (if it is correct), how should I treat the repeated items in list B (that is, "Tom" in this example)? If I ignore the repeats, B has 8 distinct names, 6 of which are not in A, so r = 6:
Jaccard(A,B) = 2/(2+2+6) = 20%
Or should I count the repeats, treating the two extra "Tom" entries as items that are only in B, so r = 8:
Jaccard(A,B) = 2/(2+2+8) ≈ 16.7%
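The two candidate calculations can be reproduced with a short Python sketch. The helper names `set_jaccard` and `multiset_jaccard` are hypothetical; the second one assumes a multiset (bag) generalization in which the intersection takes the minimum count of each item and the union takes the maximum count, which is one common way to account for duplicates, not the only one.

```python
from collections import Counter

A = ["Tom", "George", "John", "Jennifer"]
B = ["Tom", "Jessica", "Angel", "Hanna", "Tom",
     "John", "Michele", "Edward", "Alex", "Tom"]


def set_jaccard(a, b):
    """Jaccard on sets: duplicates are ignored."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


def multiset_jaccard(a, b):
    """Jaccard on multisets: duplicates are counted.

    Counter & Counter keeps the minimum count per item,
    Counter | Counter keeps the maximum count per item.
    """
    count_a, count_b = Counter(a), Counter(b)
    intersection = sum((count_a & count_b).values())
    union = sum((count_a | count_b).values())
    return intersection / union


print(set_jaccard(A, B))       # 2 shared names out of 10 distinct names
print(multiset_jaccard(A, B))  # 2 shared occurrences out of 12
```

Under these assumptions the set version gives 2/10 and the multiset version gives 2/12, which match the two figures above.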
I am really confused about which one is right; please help.