Questions tagged [cosine-distance]

A dissimilarity measure based on the angle between two vectors, usually defined as 1 - (cosine similarity).

A distance measure of the angle between two vectors $\mathbf u$ and $\mathbf v$,

$$1 - \text{cos}(\mathbf u, \mathbf v)$$

where

$$\text{cos}(\mathbf u, \mathbf v) = \cfrac{\langle \mathbf u, \mathbf v \rangle}{\| \mathbf u \| \| \mathbf v \|} = \left\langle \cfrac{\mathbf u}{\| \mathbf u\|}, \cfrac{\mathbf v}{\| \mathbf v\|}\right\rangle,$$

or "cosine similarity," is the cosine of the angle between $\mathbf u$ and $\mathbf v$.
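As a minimal sketch of the definition above, the following plain-Python code computes $\cos(\mathbf u, \mathbf v)$ in both forms (dividing the inner product by the norms, or taking the inner product of the unit vectors) and then the cosine distance $1 - \cos(\mathbf u, \mathbf v)$. The function names are illustrative, not from any particular library, and the zero-vector check reflects the fact that the ratio is undefined when either norm is 0.

```python
import math

def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm ||u||."""
    return math.sqrt(dot(u, u))

def cosine_similarity(u, v):
    """First form: <u, v> / (||u|| ||v||); undefined if either vector is zero."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(u, v) / (nu * nv)

def cosine_similarity_unit_form(u, v):
    """Second form: <u/||u||, v/||v||>, the inner product of the unit vectors."""
    nu, nv = norm(u), norm(v)
    u_hat = [a / nu for a in u]
    v_hat = [b / nv for b in v]
    return dot(u_hat, v_hat)

def cosine_distance(u, v):
    """1 - cos(u, v): 0 for the same direction, 2 for opposite directions."""
    return 1.0 - cosine_similarity(u, v)

u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]      # same direction as u, twice as long
w = [-1.0, -2.0, -3.0]   # opposite direction to u
print(cosine_distance(u, v))   # approximately 0 (up to rounding)
print(cosine_distance(u, w))   # approximately 2 (up to rounding)
```

Note that only direction matters: rescaling either vector by a positive factor leaves the distance unchanged.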

The cosine similarity and cosine distance functions are often used in Information Retrieval and Text Mining for document ranking and clustering.
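A minimal sketch of the document-ranking use: documents and a query are turned into tf-idf vectors, and documents are sorted by cosine similarity to the query. The toy corpus, the whitespace tokenizer and the plain idf weighting $\log(N/\mathrm{df})$ are illustrative assumptions; real retrieval systems usually use smoothed or sublinear variants.

```python
import math
from collections import Counter

# Hypothetical toy corpus and query, for illustration only.
docs = [
    "cosine similarity compares document vectors",
    "k-means clustering groups similar documents",
    "euclidean distance between points",
]
query = "cosine similarity of documents"

def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption, not a real NLP pipeline).
    return text.lower().split()

tokenized = [tokenize(d) for d in docs]
vocab = sorted({t for toks in tokenized for t in toks})
n_docs = len(docs)
# Document frequency: how many documents contain each term.
df = {t: sum(1 for toks in tokenized if t in toks) for t in vocab}
# Plain idf = log(N / df).
idf = {t: math.log(n_docs / df[t]) for t in vocab}

def tfidf_vector(tokens):
    counts = Counter(tokens)
    # Raw term count times idf; terms outside the vocabulary are ignored.
    return [counts[t] * idf[t] for t in vocab]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0  # convention used here: an all-zero vector matches nothing
    return dot / (nu * nv)

doc_vectors = [tfidf_vector(toks) for toks in tokenized]
q_vec = tfidf_vector(tokenize(query))
scores = [cosine_similarity(q_vec, d) for d in doc_vectors]
ranking = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
for i in ranking:
    print(f"{scores[i]:.3f}  {docs[i]}")
```

For clustering, the same vectors can be compared pairwise with the cosine distance instead of against a single query.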


40 questions
35 votes, 2 answers

Is cosine similarity identical to l2-normalized euclidean distance?

Identical meaning that it will produce identical results for a similarity ranking between a vector u and a set of vectors V. I have a vector space model which has a distance measure (Euclidean distance, cosine similarity) and normalization technique…
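For the ranking question above, a standard identity helps: for unit vectors, $\|\hat{\mathbf u} - \hat{\mathbf v}\|^2 = \|\hat{\mathbf u}\|^2 + \|\hat{\mathbf v}\|^2 - 2\langle \hat{\mathbf u}, \hat{\mathbf v}\rangle = 2 - 2\cos(\mathbf u, \mathbf v)$. Squared Euclidean distance between L2-normalized vectors is therefore a strictly decreasing function of cosine similarity, so (ties aside) both produce the same ranking. The sketch below checks this numerically on made-up vectors; the data are arbitrary illustrations.

```python
import math

def normalize(x):
    n = math.sqrt(sum(a * a for a in x))
    return [a / n for a in x]

def cosine_similarity(u, v):
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

u = [1.0, 2.0, 0.5]
V = [[2.0, 1.0, 0.0], [0.5, 2.5, 1.0], [-1.0, 0.3, 2.0], [3.0, 6.0, 1.5]]

u_hat = normalize(u)
for x in V:
    # For unit vectors: ||u_hat - x_hat||^2 == 2 - 2 cos(u, x), up to rounding.
    lhs = squared_euclidean(u_hat, normalize(x))
    rhs = 2 - 2 * cosine_similarity(u, x)
    assert math.isclose(lhs, rhs, abs_tol=1e-12)

# Most similar first (largest cosine) vs. nearest first (smallest L2 distance).
by_cosine = sorted(range(len(V)), key=lambda i: -cosine_similarity(u, V[i]))
by_l2 = sorted(range(len(V)), key=lambda i: squared_euclidean(u_hat, normalize(V[i])))
print(by_cosine, by_l2)  # [3, 1, 0, 2] [3, 1, 0, 2]
```

Without the normalization step the equivalence does not hold in general, because raw Euclidean distance also depends on vector length.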
25 votes, 5 answers

Compute a cosine dissimilarity matrix in R

I want to create heatmaps based upon cosine dissimilarity. I'm using R and have explored several packages, but cannot find a function to generate a standard cosine dissimilarity matrix. The built-in dist() function doesn't support cosine distances,…
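The matrix asked about here has entries $D_{ij} = 1 - \cos(\mathbf x_i, \mathbf x_j)$ for rows $\mathbf x_i$ of the data. The following is a language-neutral sketch in Python (not an R answer, and not any package's API): normalize each row once, then each entry is one minus a dot product. The function name is hypothetical.

```python
import math

def cosine_dissimilarity_matrix(rows):
    """Return D with D[i][j] = 1 - cos(rows[i], rows[j])."""
    # Pre-normalize every row so each entry reduces to a dot product.
    unit = []
    for r in rows:
        n = math.sqrt(sum(a * a for a in r))
        if n == 0:
            raise ValueError("zero row: cosine is undefined")
        unit.append([a / n for a in r])
    k = len(unit)
    D = [[0.0] * k for _ in range(k)]   # diagonal stays 0
    for i in range(k):
        for j in range(i + 1, k):
            d = 1.0 - sum(a * b for a, b in zip(unit[i], unit[j]))
            D[i][j] = D[j][i] = d       # the matrix is symmetric
    return D

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in cosine_dissimilarity_matrix(X):
    print(["%.3f" % d for d in row])
```

Only the upper triangle is computed and mirrored, which halves the work for a symmetric measure.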
15 votes, 1 answer

Cosine Distance as Similarity Measure in KMeans

I am currently solving a problem where I have to use Cosine distance as the similarity measure for k-means clustering. However, the standard k-means clustering package (from Sklearn package) uses Euclidean distance as standard, and does not allow…
MSalty
14 votes, 1 answer

Automatic keyword extraction: using cosine similarities as features

I've got a document-term matrix $M$, and now I would like to extract keywords for each document with a supervised learning method (SVM, Naive Bayes, ...). In this model, I already use tf-idf, POS tags, ... But now I'm wondering about nexts. I've got…
11 votes, 1 answer

Is feature normalisation needed prior to computing cosine distance?

I have a dataset of equal-length feature vectors, where each vector contains around 20 features extracted from an audio file (fundamental frequency, BPM, ratios of high to low frequencies, etc.). I am currently using cosine similarity to measure the…
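Why the normalization question matters can be shown in a few lines: cosine similarity is invariant to rescaling a whole vector, but not to rescaling one feature across all vectors, so a feature with a large numeric range (such as BPM next to a ratio) can dominate the angle. The two-feature vectors and the divide-by-100 rescaling below are hypothetical; the sketch illustrates the effect and does not settle which normalization is appropriate.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical feature vectors: [BPM, high/low frequency ratio]
a = [120.0, 0.2]
b = [124.0, 0.9]

# Scaling a whole vector leaves the cosine unchanged...
print(cosine_similarity(a, b), cosine_similarity([3 * x for x in a], b))

# ...but rescaling a single feature in every vector (here BPM / 100) changes it,
# because the angle depends on the relative sizes of the features.
a_s = [a[0] / 100, a[1]]
b_s = [b[0] / 100, b[1]]
print(cosine_similarity(a_s, b_s))
```

With raw values the two vectors look almost identical because BPM dominates; after rescaling, the difference in the ratio feature becomes visible.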
10 votes, 3 answers

K-means on cosine similarities vs. Euclidean distance (LSA)

I am using latent semantic analysis to represent a corpus of documents in lower dimensional space. I want to cluster these documents into two groups using k-means. Several years ago, I did this using Python's gensim and writing my own k-means…
7 votes, 1 answer

Proving that cosine distance function defined by cosine similarity between two unit vectors does not satisfy triangle inequality

How to prove that the cosine distance function defined by cosine similarity between two unit vectors does not satisfy the triangle inequality?
Mary
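A single counterexample is enough to show that cosine distance $1 - \cos(\mathbf u, \mathbf v)$ violates the triangle inequality. Take unit vectors $\mathbf x$, $\mathbf y$, $\mathbf z$ in the plane at angles $0$, $\pi/4$ and $\pi/2$: then $d(\mathbf x, \mathbf z) = 1 - \cos(\pi/2) = 1$, while $d(\mathbf x, \mathbf y) + d(\mathbf y, \mathbf z) = 2\left(1 - \tfrac{\sqrt 2}{2}\right) = 2 - \sqrt 2 \approx 0.586$. The sketch below checks this numerically; the variable names are illustrative.

```python
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))

# Three unit vectors in the plane at angles 0, 45 and 90 degrees.
x = (1.0, 0.0)
y = (math.cos(math.pi / 4), math.sin(math.pi / 4))
z = (0.0, 1.0)

d_xz = cosine_distance(x, z)                                   # 1 - cos 90° = 1
d_xy_plus_yz = cosine_distance(x, y) + cosine_distance(y, z)   # 2 - sqrt(2)
print(d_xz, d_xy_plus_yz)
print("triangle inequality holds:", d_xz <= d_xy_plus_yz)      # False
```

Intuitively, $1 - \cos\theta$ grows roughly like $\theta^2/2$ for small angles, so splitting an angle in half more than halves the distance on each leg.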
7 votes, 4 answers

k-means cluster, How to re-calculate centroid when using cosine similarity?

I need to use the k-means clustering method with cosine similarity instead of Euclidean distance. For example: data a: a1 a2 a3 a4 ... data b: b1 b2 b3 b4 ... cosine similarity: $\displaystyle \frac{\mathbf{a}\cdot\mathbf{b}}{…
vvilp
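One common answer to the centroid question in the k-means posts above is spherical k-means, sketched here as a swapped-in technique that may or may not match a particular asker's setup. Points are L2-normalized, each is assigned to the centroid with the largest cosine similarity, and each centroid is recomputed as the normalized sum of its members. By Cauchy–Schwarz, for unit vectors $\mathbf x_i$ the unit vector $\mathbf c$ maximizing $\sum_i \langle \mathbf x_i, \mathbf c\rangle$ is $\mathbf s / \|\mathbf s\|$ with $\mathbf s = \sum_i \mathbf x_i$. The function name, toy data and fixed iteration count are hypothetical choices for illustration.

```python
import math
import random

def normalize(x):
    n = math.sqrt(sum(a * a for a in x))
    return [a / n for a in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def spherical_kmeans(data, k, n_iter=20, seed=0):
    """Sketch of k-means with cosine similarity (spherical k-means)."""
    rng = random.Random(seed)
    X = [normalize(x) for x in data]                  # work with unit vectors
    centroids = [list(c) for c in rng.sample(X, k)]   # k distinct points as seeds
    labels = [0] * len(X)
    for _ in range(n_iter):
        # Assignment step: most similar centroid (largest cosine).
        labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in X]
        # Update step: centroid = normalized sum of the cluster's unit vectors.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if not members:
                continue  # keep the old centroid if a cluster empties
            s = [sum(col) for col in zip(*members)]
            if any(s):
                centroids[j] = normalize(s)
    return labels, centroids

# Hypothetical 2-D data: three points near the x-axis, two near the y-axis.
# The point [5.0, 0.4] is long but points in the same direction as the first group.
data = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8], [5.0, 0.4]]
labels, centroids = spherical_kmeans(data, k=2)
print(labels)
```

A related shortcut is to L2-normalize the rows and run ordinary Euclidean k-means: the assignment step then agrees with cosine-based assignment, but the plain mean used as a centroid is not renormalized to unit length.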
6 votes, 2 answers

TF-IDF versus Cosine Similarity in Document Search

I'm wondering if anyone can help me out or point out some resources to learn more about TF-IDF and document search. I'm trying to implement a basic document search and am trying to better understand the differences and trade-offs for my approach. My…
5 votes, 0 answers

What is the benefit of picking a distance which is a metric?

A popular distance measure, cosine similarity/distance, is not a proper metric because it fails to satisfy one of the conditions (the triangle inequality). However, there is no disadvantage whatsoever in using it and it is used heavily in many…
mitbal
2 votes, 1 answer

Cosine similarity for Categorical datasets?

Can I use a cosine similarity measure for estimating similarity/relationship between D1 and D2 (two categorical datasets)?
2 votes, 0 answers

Are there advanced "cos similarity" that influence of dim size is less?

I found that cosine similarity is affected by the "curse of dimensionality" by trying the following simulation. Create (select) two vectors from uniform random numbers U[-1, 1], each with dim = 2, 3, 4, 5, 6, 7, 8, 9, 10, 100. Calculate cosine…
cartman
2 votes, 3 answers

Cosine Similarity Intuition

I understand what cosine similarity is and how to calculate it, specifically in the context of text mining (i.e. comparing tf-idf document vectors to find similar documents). What I'm looking for is some better intuition for interpreting the…
2 votes, 0 answers

Supervised cosine similarity

Suppose we have some samples, each consisting of two vectors and a corresponding label. That is, each sample looks like $(\mathbf{u}_i, \mathbf{v}_i, y_i)$, where $y_i \in \{0, 1\}$. We can calculate the cosine similarity between the two vectors…
2 votes, 5 answers

How to find the similarity between movie preferences (in the form of a probability vector) of two users?

I am working on recommender systems, and using some methodology I have got a probability of each user liking a movie. To elaborate, say user $u_1$ has the following distribution for movie preferences over $8$ movies: m1 m2 m3 m4 m5 m6…