Questions tagged [cosine-distance]

A measure of the angular distance between two vectors. Usually defined as 1-(cosine similarity).

A distance measure based on the angle between two nonzero vectors $\mathbf u$ and $\mathbf v$,

$$1 - \cos(\mathbf u, \mathbf v),$$

where

$$\cos(\mathbf u, \mathbf v) = \cfrac{\langle \mathbf u, \mathbf v \rangle}{\| \mathbf u \| \| \mathbf v \|} = \left\langle \cfrac{\mathbf u}{\| \mathbf u\|}, \cfrac{\mathbf v}{\| \mathbf v\|}\right\rangle,$$

or "cosine similarity," is the cosine of the angle between $\mathbf u$ and $\mathbf v$.

Cosine similarity and cosine distance are often used in information retrieval and text mining for document ranking and clustering.
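The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `cosine_similarity` and `cosine_distance` are chosen here for readability, and raising an error on zero vectors is an assumption (the quantity is undefined when either norm is zero).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors (sequences of numbers)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Assumption: treat zero vectors as an error, since the angle is undefined.
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """1 - cosine similarity; ranges from 0 (same direction) to 2 (opposite)."""
    return 1.0 - cosine_similarity(u, v)

# Two vectors 45 degrees apart: distance is 1 - cos(45 degrees), about 0.2929.
print(cosine_distance([1.0, 0.0], [1.0, 1.0]))
```

Because both vectors are divided by their norms, rescaling either one leaves the result unchanged, so only direction matters.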

40 questions

35

votes

2 answers

Is cosine similarity identical to l2-normalized euclidean distance?

Identical meaning that it will produce identical results for a similarity ranking between a vector u and a set of vectors V. I have a vector space model which has distance measure (euclidean distance, cosine similarity) and normalization technique…

normalization natural-language euclidean cosine-distance cosine-similarity

asked Apr 13 '15 at 22:58

Arne

453
1
6
9

25

votes

5 answers

Compute a cosine dissimilarity matrix in R

I want to create heatmaps based upon cosine dissimilarity. I'm using R and have explored several packages, but cannot find a function to generate a standard cosine dissimilarity matrix. The built-in dist() function doesn't support cosine distances,…

r clustering similarities cosine-similarity cosine-distance

asked Jul 03 '12 at 12:30

Greg Slodkowicz

405
1
5
10

15

votes

1 answer

Cosine Distance as Similarity Measure in KMeans

I am currently solving a problem where I have to use Cosine distance as the similarity measure for k-means clustering. However, the standard k-means clustering package (from Sklearn package) uses Euclidean distance as standard, and does not allow…

k-means distance euclidean cosine-distance

asked Aug 21 '17 at 12:51

MSalty

255
1
2
5

14

votes

1 answer

Automatic keyword extraction: using cosine similarities as features

I've got a document-term matrix $M$, and now I would like to extract keywords for each document with a supervised learning method (SVM, Naive Bayes, ...). In this model, I already use tf-idf, POS tags, ... But now I'm wondering about next steps. I've got…

text-mining feature-engineering supervised-learning cosine-distance cosine-similarity

asked May 02 '15 at 06:48

Silke

285
2
13

11

votes

1 answer

Is feature normalisation needed prior to computing cosine distance?

I have a dataset of equal-length feature vectors, where each vector contains around 20 features extracted from an audio file (fundamental frequency, BPM, ratios of high to low frequencies, etc.). I am currently using cosine similarity to measure the…

normalization similarities cosine-similarity cosine-distance

asked Jul 20 '17 at 19:52

j b

213
2
7

10

votes

3 answers

K-means on cosine similarities vs. Euclidean distance (LSA)

I am using latent semantic analysis to represent a corpus of documents in lower dimensional space. I want to cluster these documents into two groups using k-means. Several years ago, I did this using Python's gensim and writing my own k-means…

k-means svd latent-semantic-analysis cosine-distance cosine-similarity

asked Oct 16 '14 at 19:27

Jeff

3,525
5
27
38

7

votes

1 answer

Proving that cosine distance function defined by cosine similarity between two unit vectors does not satisfy triangle inequality

How to prove that the cosine distance function defined by cosine similarity between two unit vectors does not satisfy the triangle inequality?

distance proof metric cosine-distance cosine-similarity

asked Feb 23 '16 at 11:21

Mary

71
1
3

7

votes

4 answers

k-means cluster, How to re-calculate centroid when using cosine similarity?

I need to use the k-means clustering method with cosine similarity instead of Euclidean distance. For example: data a: a1 a2 a3 a4 ... data b: b1 b2 b3 b4 ... cosine similarity: $\displaystyle \frac{\mathbf{a}\cdot\mathbf{b}}{…

clustering k-means cosine-distance cosine-similarity

asked Oct 15 '14 at 07:57

vvilp

171
1
1
4

6

votes

2 answers

TF-IDF versus Cosine Similarity in Document Search

I'm wondering if anyone can help me out or point out some resources to learn more about TF-IDF and document search. I'm trying to implement a basic document search and am trying to better understand the differences and trade offs for my approach. My…

machine-learning ranking similarities cosine-distance cosine-similarity

asked Mar 05 '15 at 22:05

Tim S

171
1
1
2

5

votes

0 answers

What is the benefit of picking a distance which is a metric?

A popular distance measure, cosine similarity/distance, is not a proper metric because it fails to satisfy one of the conditions (the triangle inequality). However, there is no disadvantage whatsoever in using it and it is used heavily in many…

metric cosine-distance

asked Feb 08 '17 at 10:23

mitbal

171
1
5

2

votes

1 answer

Cosine similarity for Categorical datasets?

Can I use the cosine similarity measure to estimate the similarity/relationship between D1 and D2 (two categorical datasets)?

correlation euclidean cosine-similarity cosine-distance

asked Aug 13 '20 at 03:51

Msilvy

41
5

2

votes

0 answers

Are there advanced variants of "cos similarity" that are less influenced by dimension size?

I found that cosine similarity is subject to the "curse of dimensionality" by trying the following simulation: create (select) two vectors from uniform random numbers U[-1, 1], for each dim = 2, 3, 4, 5, 6, 7, 8, 9, 10, 100. Calculate cosine…

cosine-similarity cosine-distance

asked May 18 '19 at 22:52

cartman

53
6

2

votes

3 answers

Cosine Similarity Intuition

I understand what cosine similarity is and how to calculate it, specifically in the context of text mining (i.e. comparing tf-idf document vectors to find similar documents). What I'm looking for is some better intuition for interpreting the…

text-mining natural-language cosine-similarity cosine-distance tf-idf

asked Jan 17 '17 at 20:49

ccb

21
2

2

votes

0 answers

Supervised cosine similarity

Suppose we have some samples, each consisting of two vectors and a corresponding label. That is, each sample looks like ($\mathbf{u}_i, \mathbf{v}_i, y_i$), where $y_i \in \{0, 1\}$. We can calculate the cosine similarity between the two vectors…

machine-learning data-mining supervised-learning cosine-similarity cosine-distance

asked Jan 16 '17 at 02:36

Burning

21
2

2

votes

5 answers

How to find the similarity between movie preferences (in the form of a probability vector) of two users?

I am working on recommender systems, and using some methodology I have got a probability of each user liking a movie. To elaborate, say user $u_1$ has the following distribution for movie preferences over $8$ movies: m1 m2 m3 m4 m5 m6…

recommender-system similarities correspondence-analysis cosine-similarity cosine-distance

asked Jul 19 '16 at 15:13

user3676846

81
2
7
