Let's say I have two strings:
string A: 'I went to the cafeteria and bought a sandwich.'
string B: 'I heard the cafeteria is serving roast-beef sandwiches today'.
Formulas (a rough word-level code sketch follows these definitions):
Levenshtein distance: the minimum number of insertions, deletions, or substitutions needed to convert string A into string B
N-gram distance: the sum of absolute differences between the n-gram count vectors of the two strings. As an example, the first three word bigrams of string A ('I went', 'went to', 'to the') occur (1, 1, 1) times in A and (0, 0, 0) times in B.
Cosine similarity: $\frac{\vec{a} \cdot \vec{b}}{\sqrt{\vec{a} \cdot \vec{a}}\,\sqrt{\vec{b} \cdot \vec{b}}}$
Jaccard similarity: $\frac{|\vec{a} \cap \vec{b}|}{|\vec{a} \cup \vec{b}|}$
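For concreteness, here is a rough Python sketch of the four measures at word granularity (my own illustration, not a reference implementation; it assumes the sentences are split on whitespace with punctuation already stripped, and the function names are arbitrary):

```python
from collections import Counter
from math import sqrt

def levenshtein(a, b):
    # Minimum number of word insertions, deletions or substitutions (dynamic programming).
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def ngram_distance(a, b, n=2):
    # Sum of absolute differences between the n-gram count vectors.
    ca = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    cb = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    return sum(abs(ca[g] - cb[g]) for g in set(ca) | set(cb))

def cosine(a, b):
    # Cosine similarity of the word-count vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    return dot / (sqrt(sum(v * v for v in ca.values())) *
                  sqrt(sum(v * v for v in cb.values())))

def jaccard(a, b):
    # Size of the intersection of the word sets over the size of their union.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)
```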
Metrics at word granularity for the example sentences (reproduced in the snippet below):
Levenshtein distance = 7 (if you consider sandwich and sandwiches to be different words)
Bigram distance = 14
Cosine similarity = 0.33
Jaccard similarity = 0.2
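Running the sketch above on the two sentences (again with the trailing punctuation removed before splitting on whitespace) reproduces these numbers:

```python
tokens_a = "I went to the cafeteria and bought a sandwich".split()
tokens_b = "I heard the cafeteria is serving roast-beef sandwiches today".split()

print(levenshtein(tokens_a, tokens_b))          # 7
print(ngram_distance(tokens_a, tokens_b, n=2))  # 14
print(round(cosine(tokens_a, tokens_b), 2))     # 0.33
print(round(jaccard(tokens_a, tokens_b), 2))    # 0.2
```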
I would like to understand the pros and cons of using each of these (dis)similarity measures. If possible, it would be nice to understand the pros/cons in terms of the example sentences, but if you have an example that better illustrates the differences, please let me know. Also, I realize that I can scale the Levenshtein distance by the number of words in the text, but that wouldn't work for the bigram distance, since the result can be greater than 1.
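To make that concrete (assuming I scale by the nine words per sentence, once punctuation is stripped): $7/9 \approx 0.78$ for the Levenshtein distance, but $14/9 \approx 1.56 > 1$ for the bigram distance.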
To start, it seems that cosine and Jaccard provide similar results. Jaccard is actually much less computationally intensive and is also (a little bit) easier to explain to a layman.