Let's say I have two strings:
string A: 'I went to the cafeteria and bought a sandwich.'
string B: 'I heard the cafeteria is serving roast-beef sandwiches today'.
Formulas (a rough word-level code sketch follows these definitions):
Levenshtein distance: the minimum number of insertions, deletions, or substitutions needed to convert string A into string B
N-gram distance: the sum of absolute differences between the n-gram count vectors of the two strings. As an example, the first three word bigrams of string A ('I went', 'went to', 'to the') occur (1, 1, 1) times in A and (0, 0, 0) times in B.
Cosine similarity: $\frac{\vec{a} \cdot \vec{b}}{\sqrt{\vec{a} \cdot \vec{a}}\,\sqrt{\vec{b} \cdot \vec{b}}}$
Jaccard similarity: $\frac{|\vec{a} \cap \vec{b}|}{|\vec{a} \cup \vec{b}|}$
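For concreteness, here is a rough Python sketch of the four measures at word granularity (my own illustration, not a reference implementation; it assumes the sentences are split on whitespace with punctuation already stripped, and the function names are arbitrary):

```python
from collections import Counter
from math import sqrt

def levenshtein(a, b):
    # Minimum number of word insertions, deletions or substitutions (dynamic programming).
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def ngram_distance(a, b, n=2):
    # Sum of absolute differences between the n-gram count vectors.
    ca = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    cb = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    return sum(abs(ca[g] - cb[g]) for g in set(ca) | set(cb))

def cosine(a, b):
    # Cosine similarity of the word-count vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    return dot / (sqrt(sum(v * v for v in ca.values())) *
                  sqrt(sum(v * v for v in cb.values())))

def jaccard(a, b):
    # Size of the intersection of the word sets over the size of their union.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)
```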
Metrics at word granularity for the example sentences (reproduced in the snippet below):
Levenshtein distance = 7 (if you consider sandwich and sandwiches to be different words)
Bigram distance = 14
Cosine similarity = 0.33
Jaccard similarity = 0.2
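Running the sketch above on the two sentences (again with the trailing punctuation removed before splitting on whitespace) reproduces these numbers:

```python
tokens_a = "I went to the cafeteria and bought a sandwich".split()
tokens_b = "I heard the cafeteria is serving roast-beef sandwiches today".split()

print(levenshtein(tokens_a, tokens_b))          # 7
print(ngram_distance(tokens_a, tokens_b, n=2))  # 14
print(round(cosine(tokens_a, tokens_b), 2))     # 0.33
print(round(jaccard(tokens_a, tokens_b), 2))    # 0.2
```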
I would like to understand the pros and cons of using each of these (dis)similarity measures. If possible, it would be nice to understand the pros/cons in terms of the example sentences, but if you have an example that better illustrates the differences, please let me know. Also, I realize that I can scale the Levenshtein distance by the number of words in the text, but that wouldn't work for the bigram distance, since the result can be greater than 1.
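To make that concrete (assuming I scale by the nine words per sentence, once punctuation is stripped): $7/9 \approx 0.78$ for the Levenshtein distance, but $14/9 \approx 1.56 > 1$ for the bigram distance.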
To start, it seems that cosine and Jaccard provide similar results. Jaccard is actually much less computationally intensive and is also (a little bit) easier to explain to a layman.