Questions tagged [cosine-similarity]

An angular-type similarity coefficient between two vectors. It is like correlation, only without centering the vectors.
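
To illustrate the description above, here is a minimal sketch in plain Python, assuming vectors are given as equal-length lists of floats. The function names `cosine_similarity` and `pearson_correlation` and the example vectors `u` and `v` are illustrative choices, not taken from any of the questions listed here. It shows that correlation can be obtained by centering each vector and then taking the cosine.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (no centering)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)

# Hypothetical example vectors: all entries positive, opposite trends.
u = [1.0, 2.0, 3.0]
v = [3.0, 2.0, 1.0]

cos_uv = cosine_similarity(u, v)     # positive, since all components are positive
corr_uv = pearson_correlation(u, v)  # close to -1, since the trends are opposite
```

For these two vectors the cosine is positive while the correlation is close to -1, so the two measures can disagree sharply when the vectors are not centered.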

121 questions
35
votes
2 answers

Is cosine similarity identical to L2-normalized Euclidean distance?

Identical, meaning that it will produce identical results for a similarity ranking between a vector u and a set of vectors V. I have a vector space model which has a distance measure (Euclidean distance, cosine similarity) and a normalization technique…
31
votes
4 answers

Interpreting negative cosine similarity

My question may be a silly one, so I apologize in advance. I was trying to use the GloVe model pre-trained by the Stanford NLP group. However, I noticed that my similarity results showed some negative numbers. That immediately prompted me…
Patrick the Cat
  • 546
  • 1
  • 5
  • 15
29
votes
1 answer

Is there any relationship among cosine similarity, Pearson correlation, and z-score?

I'm wondering if there is any relationship among these 3 measures. I can't seem to make a connection among them by referring to the definitions (possibly because I am new to these definitions and am having a bit of a rough time grasping them). I…
Jud
  • 443
  • 1
  • 5
  • 12
25
votes
5 answers

Compute a cosine dissimilarity matrix in R

I want to create heatmaps based upon cosine dissimilarity. I'm using R and have explored several packages, but cannot find a function to generate a standard cosine dissimilarity matrix. The built-in dist() function doesn't support cosine distances,…
17
votes
3 answers

Curse of dimensionality: does cosine similarity work better and if so, why?

When working with high dimensional data, it is almost useless to compare data points using Euclidean distance - this is the curse of dimensionality. However, I have read that using different distance metrics, such as cosine similarity, performs…
PyRsquared
  • 1,084
  • 2
  • 9
  • 20
14
votes
1 answer

Automatic keyword extraction: using cosine similarities as features

I've got a document-term matrix $M$, and now I would like to extract keywords for each document with a supervised learning method (SVM, Naive Bayes, ...). In this model, I already use tf-idf, POS tags, ... But now I'm wondering about next steps. I've got…
11
votes
1 answer

Is feature normalisation needed prior to computing cosine distance?

I have a dataset of equal-length feature vectors, where each vector contains around 20 features extracted from an audio file (fundamental frequency, BPM, ratios of high to low frequencies etc). I am currently using cosine similarity to measure the…
11
votes
1 answer

Word embedding algorithms in terms of performance

I'm trying to embed roughly 60 million phrases into a vector space, then calculate the cosine similarity between them. I've been using sklearn's CountVectorizer with a custom built tokenizer function that produces unigrams and bigrams. Turns out…
10
votes
3 answers

K-means on cosine similarities vs. Euclidean distance (LSA)

I am using latent semantic analysis to represent a corpus of documents in lower dimensional space. I want to cluster these documents into two groups using k-means. Several years ago, I did this using Python's gensim and writing my own k-means…
9
votes
2 answers

How does cosine similarity change after a linear transformation?

Is there a mathematical relationship between: the cosine similarity $\operatorname{sim}(A, B)$ of two vectors $A$ and $B$, and the cosine similarity $\operatorname{sim}(MA, MB)$ of $A$ and $B$, non-uniformly scaled via a given matrix $M$? Here $M$…
turdus-merula
  • 1,371
  • 14
  • 20
8
votes
4 answers

Is cosine similarity a classification or a clustering technique?

In document classification, is cosine similarity considered a classification or a clustering technique? But you need training data with cosine similarity to create the centroid, right?
8
votes
1 answer

Intuition behind Pearson correlation, covariance and cosine similarity

In this post, the best answer gives an excellent mathematical explanation of the relationship among Pearson correlation, covariance and cosine similarity, which I quote here ($\mathbf A$ is the data matrix). If you center columns (variables) of $\mathbf A$, then $\mathbf A'\mathbf A$…
7
votes
1 answer

Proving that cosine distance function defined by cosine similarity between two unit vectors does not satisfy triangle inequality

How to prove that the cosine distance function defined by cosine similarity between two unit vectors does not satisfy the triangle inequality?
Mary
  • 71
  • 1
  • 3
7
votes
4 answers

K-means clustering: how to recalculate the centroid when using cosine similarity?

I need to use the k-means clustering method with cosine similarity instead of Euclidean distance. For example: data a: a1 a2 a3 a4 ... data b: b1 b2 b3 b4 ... cosine similarity: $\displaystyle \frac{\mathbf{a}\cdot\mathbf{b}}{…
vvilp
  • 171
  • 1
  • 1
  • 4
6
votes
1 answer

Similarity metrics for more than two vectors?

I am aware of cosine similarity, which measures the angle between "two" vectors. A prototype for cosine similarity would look something like this: float cosine_similarity(vector a, vector b); Are there any similarity measures that measure…