Questions tagged [jaccard-similarity]

Jaccard similarity (or Jaccard coefficient) is a similarity function for computing the similarity between two sets.

The Jaccard coefficient between two sets $A$ and $B$ is defined as

$$\text{jaccard}(A, B) = \cfrac{| \ A \cap B \ | }{| \ A \cup B \ |}$$

That is, it is the ratio of the number of elements that $A$ and $B$ share to the number of distinct elements in $A$ and $B$ combined (their union).
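
A minimal Python sketch of this definition, assuming the inputs can be converted to Python sets. The name `jaccard` is illustrative, and returning 1.0 when both sets are empty (where the ratio is 0/0) is a convention this sketch adopts, not part of the definition. For example, $\{a, b, c\}$ and $\{a, d\}$ share one element out of four distinct ones, so their coefficient is $1/4$.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets empty: the ratio is 0/0. Returning 1.0 (identical sets)
        # is a common convention, not part of the definition.
        return 1.0
    return len(a & b) / len(union)


print(jaccard({"a", "b", "c"}, {"a", "d"}))  # 1 shared / 4 distinct -> 0.25
```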

76 questions
17 votes, 1 answer

What are the differences between the Dice, Jaccard, and overlap coefficients?

I came across three different statistical measures for comparing two sets, in particular for image segmentation (e.g., comparing the ground truth with the segmented result). What are the differences between these measurements…
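
A minimal Python sketch computing all three coefficients for two non-empty sets, using the standard definitions: Jaccard $|A \cap B| / |A \cup B|$, Dice $2|A \cap B| / (|A| + |B|)$, and the overlap (Szymkiewicz-Simpson) coefficient $|A \cap B| / \min(|A|, |B|)$. Representing segmentation masks as sets of `(row, col)` pixel coordinates is an assumption for illustration, and the function name and toy masks are hypothetical.

```python
def set_overlap_scores(a, b):
    """Return (jaccard, dice, overlap) for two non-empty sets."""
    a, b = set(a), set(b)
    inter = len(a & b)
    jaccard = inter / len(a | b)
    dice = 2 * inter / (len(a) + len(b))
    overlap = inter / min(len(a), len(b))  # Szymkiewicz-Simpson overlap coefficient
    return jaccard, dice, overlap


# Hypothetical masks given as sets of (row, col) pixel coordinates.
ground_truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
predicted = {(0, 1), (1, 1), (1, 2)}
j, d, o = set_overlap_scores(ground_truth, predicted)
print(j, d, o)  # 2 shared pixels, 5 in the union: J = 2/5, Dice = 4/7, overlap = 2/3
```

Dice and Jaccard are monotone transforms of each other, $\text{Dice} = 2J/(1+J)$, so they always rank a set of segmentations the same way. The overlap coefficient behaves differently: it equals 1 whenever one set contains the other, regardless of how much larger the other set is.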
8 votes, 5 answers

Jaccard Similarity - From Data Mining book - Homework problem

Exercise 3.1.3: Suppose we have a universal set $U$ of $n$ elements, and we choose two subsets $S$ and $T$ at random, each with $m$ of the $n$ elements. What is the expected value of the Jaccard similarity of $S$ and $T$? I am reading the book…
delete me
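
One way to check an answer to this exercise numerically, as a minimal sketch assuming $S$ and $T$ are drawn independently and uniformly among all $m$-subsets of $U$. For a fixed $S$, the intersection size $k$ follows a hypergeometric distribution, and the union then has $2m - k$ elements, so $E[J] = \sum_k P(K = k)\, \frac{k}{2m - k}$. The helper name `expected_jaccard` is hypothetical, and exact `Fraction` arithmetic is a design choice to avoid rounding.

```python
from fractions import Fraction
from math import comb


def expected_jaccard(n, m):
    """Exact E[J(S, T)] for two independent, uniformly random m-subsets of an n-set."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    total = comb(n, m)  # number of possible T for a fixed S
    expectation = Fraction(0)
    for k in range(m + 1):
        # Number of T sharing exactly k elements with S (hypergeometric count).
        # comb() returns 0 when m - k > n - m, i.e. when this k is impossible.
        ways = comb(m, k) * comb(n - m, m - k)
        expectation += Fraction(ways, total) * Fraction(k, 2 * m - k)
    return expectation


print(expected_jaccard(4, 2))  # 7/18
```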
7 votes, 3 answers

Similarity measures for more than 2 variables

If I have two binary variables, I can determine the similarity of these variables quite easily with different similarity measures, e.g. with the Jaccard similarity measure: $J = \frac{M_{11}}{M_{01} + M_{10} + M_{11}}$ Example in R: # Example data N…
Joachim Schork
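
A minimal Python sketch (Python rather than the R used in the question) of the binary-variable formula $J = M_{11}/(M_{01} + M_{10} + M_{11})$, applied to every pair of variables, plus one possible $k$-variable extension: rows where all variables are 1 divided by rows where at least one is 1. That extension is the Jaccard index of the $k$ sets of rows in which each variable equals 1, and for two variables it reduces to the pairwise formula. The function names and toy vectors are hypothetical, and the extension is only one option, not an answer taken from the thread.

```python
def binary_jaccard(x, y):
    """J = M11 / (M01 + M10 + M11) for two equal-length 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    m11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    m10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    m01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    denom = m11 + m10 + m01
    return m11 / denom if denom else float("nan")  # undefined if neither vector has a 1


def pairwise_jaccard(variables):
    """Symmetric matrix of binary_jaccard over every pair of variables."""
    k = len(variables)
    return [[binary_jaccard(variables[i], variables[j]) for j in range(k)] for i in range(k)]


def multiway_jaccard(variables):
    """One possible k-variable extension: rows where all are 1 / rows where any is 1."""
    rows = list(zip(*variables))
    any_one = sum(1 for r in rows if any(v == 1 for v in r))
    all_one = sum(1 for r in rows if all(v == 1 for v in r))
    return all_one / any_one if any_one else float("nan")


x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]
z = [1, 1, 0, 0, 1]
print(pairwise_jaccard([x, y, z]))  # every off-diagonal pair: M11 = 2, M10 = 1, M01 = 1 -> 0.5
print(multiway_jaccard([x, y, z]))  # 1 row with all 1s, 4 rows with any 1 -> 0.25
```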
7 votes, 2 answers

Accuracy vs Jaccard for multiclass problem

TL;DR: For a multiclass problem, is the Jaccard score the same as accuracy? Update (March 29, 2019): The wrong implementation in scikit-learn has now been fixed by pull request #13151. Hooray! P.S. The lesson here is that no matter how mature and widespread…
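
A from-scratch Python sketch showing that, for single-label multiclass predictions, the Jaccard score and accuracy are generally different numbers. Per class, Jaccard is $TP/(TP + FP + FN)$ with that class treated as positive; "macro" averages the per-class scores and "micro" pools the counts. It deliberately does not call scikit-learn, so it does not depend on which version of `jaccard_score` is installed; the helper names, the toy labels, and the choice of 0.0 for a class with no true or predicted instances are assumptions.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_jaccard(y_true, y_pred, label):
    """TP / (TP + FP + FN) for one class, treating that class as positive."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = tp + fp + fn
    return tp / denom if denom else 0.0  # convention for an absent class (assumption)


def macro_jaccard(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_jaccard(y_true, y_pred, c) for c in labels) / len(labels)


def micro_jaccard(y_true, y_pred):
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp += sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn += sum(t == c and p != c for t, p in zip(y_true, y_pred))
    return tp / (tp + fp + fn)


y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))       # 2 of 6 correct = 1/3
print(macro_jaccard(y_true, y_pred))  # (2/3 + 0 + 0) / 3 = 2/9
print(micro_jaccard(y_true, y_pred))  # 2 / (2 + 4 + 4) = 1/5
```

In the single-label case every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the micro-averaged Jaccard equals $\text{acc}/(2 - \text{acc})$: related to accuracy, but not equal to it.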
7 votes, 5 answers

Jaccard similarity in R

I want to compare two vectors of length 43; they have values of 0 (not present) and 1 (present). I will refer to $M_{1,1}$ as situations in which both are present (both 1), and to $M_{1,0}$ and $M_{0,1}$ as situations in which only one is present while the…
Torvon
5 votes, 2 answers

Jaccard similarity coefficient vs. Point-wise mutual information coefficient

Can you explain the difference between the Jaccard similarity coefficient and the pointwise mutual information (PMI) measure? It would be great if you could add a few examples.
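
One way to see a concrete difference, as a minimal Python sketch: treat two words as the sets of documents that contain them. Jaccard only looks at those two sets, while PMI, $\log_2 \frac{p(x, y)}{p(x)\,p(y)}$, also depends on the total number of documents $N$. Estimating the probabilities as plain document frequencies, using base-2 logarithms, and the toy document IDs are all assumptions for illustration.

```python
import math


def jaccard_and_pmi(docs_x, docs_y, n_docs):
    """Jaccard of the two document sets and PMI of the two words.

    Probabilities are plain document frequencies: p(x) = |D_x| / N,
    p(x, y) = |D_x ∩ D_y| / N (an assumed estimator).
    """
    dx, dy = set(docs_x), set(docs_y)
    both = len(dx & dy)
    jaccard = both / len(dx | dy)
    if both == 0:
        pmi = float("-inf")  # log of 0: the words never co-occur
    else:
        pmi = math.log2((both / n_docs) / ((len(dx) / n_docs) * (len(dy) / n_docs)))
    return jaccard, pmi


docs_x = {1, 2, 3, 4}  # hypothetical IDs of documents containing word x
docs_y = {3, 4, 5, 6}  # hypothetical IDs of documents containing word y
for n in (100, 1000):
    j, pmi = jaccard_and_pmi(docs_x, docs_y, n)
    print(n, round(j, 4), round(pmi, 2))
```

With the same co-occurrence pattern, Jaccard stays at $2/6 = 1/3$ for both corpus sizes, while PMI grows by $\log_2 10 \approx 3.32$ bits when $N$ goes from 100 to 1000: PMI measures co-occurrence relative to what independence would predict, Jaccard measures plain overlap.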
5 votes, 2 answers

Jaccard index between set and multiset

Can I use the Jaccard index to calculate the similarity between a set and a multiset? As far as I know, Jaccard is defined as the size of the intersection divided by the size of the union of the sample sets, that is $J(A, B) = |A \cap B| \, / \, |A \cup B|$. Now if I…
Arwa
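
A minimal Python sketch of one common generalization to multisets, sometimes called weighted Jaccard (or Ruzicka similarity): the sum of the minimum counts divided by the sum of the maximum counts. A plain set is treated as a multiset in which every count is 1, so for two sets this reduces to the ordinary Jaccard index. Whether this is the right choice depends on whether repeated elements should count; the function name and toy data are hypothetical.

```python
from collections import Counter


def multiset_jaccard(a, b):
    """Weighted (multiset) Jaccard: sum of min counts / sum of max counts.

    `Counter & Counter` keeps the minimum of each count and `Counter | Counter`
    keeps the maximum; a plain set becomes a multiset with every count equal to 1.
    """
    ca, cb = Counter(a), Counter(b)
    union_size = sum((ca | cb).values())
    if union_size == 0:
        return 1.0  # both empty; a convention, as for ordinary sets
    return sum((ca & cb).values()) / union_size


s = {"x", "y"}                 # a set
m = ["x", "x", "x", "y", "z"]  # a multiset: x appears three times
print(multiset_jaccard(s, m))  # min counts: x1 + y1 = 2; max counts: x3 + y1 + z1 = 5 -> 0.4
```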
4 votes, 1 answer

What is the significance of the Jaccard similarity score?

I understand how to calculate the Jaccard similarity, but I never quite understood the logic behind why we calculate it. How does it show the similarity between two sets? What relation exactly does it show? Can someone throw some light on…
4 votes, 0 answers

Statistical Interpretation of Average Pairwise Similarity

I have assembled binary vectors (0/1 for all elements, equally weighted, and arranged in time order) that have been separated into different cohorts where a unique event of interest occurs. I have removed the event-of-interest element itself and the…
Pylander
4 votes, 1 answer

Similarity between sets with different size

Is there a distance measure like Jaccard for sets of different sizes? For example, A=['a','b','c'] and B=['a','d']. I would like to include the total intersection as well as the order. The implementation of the Jaccard similarity score in Python's…
J-H
4 votes, 1 answer

A similarity measure with binary data: does this one have a name?

There are many binary similarity measures (e.g. Jaccard, Sorensen, etc.), each of them sensitive to different properties of the compared sets. I would like to use the metric $S=\frac{N_{A \cap B}}{\min(N_{A}, N_{B})}$, where $N_{A}$ is the count…
3 votes, 0 answers

A probability distribution model for Jaccard similarity

This is an obfuscated version of a real problem: Each day I speak with some number of (distinct) girls. I compute the Jaccard similarity index between two consecutive days: $$ …
o17t H1H' S'k
3 votes, 1 answer

Estimate Jaccard similarity based on a sample

The Jaccard similarity of two sets, $A$ and $B$, is defined as $\text{Jaccard}(A,B)=\frac{|A\cap B|}{|A\cup B|}$. Suppose I only have a sample of $P\%$ of each set: $A'$ and $B'$. What would be a good estimator for the Jaccard similarity of the…
etov
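
A minimal Python sketch of a plug-in estimator, under an assumption the question does not state: each element of $A$ and of $B$ is kept independently with probability $p = P/100$. Then an element of $A \cap B$ survives in both samples with probability $p^2$, so $|A \cap B| \approx |A' \cap B'|/p^2$, $|A| \approx |A'|/p$, and $|B| \approx |B'|/p$, whereas the naive $J(A', B')$ is biased low. The result is a ratio of estimates (consistent for large sets, not exactly unbiased, and unstable for small samples). If instead both samples keep the same elements (e.g. a shared hash threshold), $J(A', B')$ itself is already a direct estimate. All names and the toy sets are hypothetical.

```python
import random


def bernoulli_sample(items, p, rng):
    """Keep each element independently with probability p (assumed sampling scheme)."""
    return {x for x in items if rng.random() < p}


def naive_jaccard(a, b):
    return len(a & b) / len(a | b)


def corrected_jaccard(a_sample, b_sample, p):
    """Plug-in estimate of J(A, B) from independent Bernoulli(p) samples of A and B.

    An element of A ∩ B is in both samples with probability p**2, so
    |A ∩ B| ~ |A' ∩ B'| / p**2, |A| ~ |A'| / p and |B| ~ |B'| / p.
    """
    inter = len(a_sample & b_sample) / p**2
    size_a = len(a_sample) / p
    size_b = len(b_sample) / p
    return inter / (size_a + size_b - inter)


rng = random.Random(0)
A = set(range(0, 6000))
B = set(range(3000, 9000))  # true J = 3000 / 9000 = 1/3
p = 0.5
A_s, B_s = bernoulli_sample(A, p, rng), bernoulli_sample(B, p, rng)
print(naive_jaccard(A_s, B_s))         # noticeably below 1/3
print(corrected_jaccard(A_s, B_s, p))  # close to 1/3
```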
3 votes, 0 answers

Similarity measures and document length

I have an application where I need to measure the similarity between the (TF-IDF?) representation of two documents: $\mathbf{a}$ and $\mathbf{b}$ while still taking the document length into account. More specifically, if the document $a$ is…
3 votes, 2 answers

Significance Test for Jaccard Distance

I am looking for a significance test for the Jaccard Distance (JD). As an example, I have two datasets as follows: Baseline: $\left| A\bigcap B \right|=57;\ \left| A\bigcup B \right|=275\quad \therefore \ JD=0.7927$ Evaluation: $\left| A\bigcap B…
Mari153
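
For reference, the baseline value quoted in this question follows from $JD = 1 - J$ with the counts given there:

$$JD = 1 - \frac{|A \cap B|}{|A \cup B|} = 1 - \frac{57}{275} = \frac{218}{275} \approx 0.7927$$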