
When I normalize a data set and compute the cosine similarity between the rows, the cosine similarity differs from the one without any normalization.

Say there are four 2D vectors: (1, 1), (2, 2), (1, 2) and (2, 1). Before normalization, the cosine similarity between (1, 1) and (2, 2) is 1.0.

After normalization, these two vectors become (-0.5, -0.5) and (0.5, 0.5), and the cosine similarity becomes -1.0.

The interpretation changed completely.
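The numbers above can be reproduced with a short sketch. It assumes that "normalization" means subtracting each column's mean, which is the operation that turns (1, 1) and (2, 2) into (-0.5, -0.5) and (0.5, 0.5) for these four vectors. The helper `cosine_similarity` is a name chosen for illustration, not a library function.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as sequences of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# The four vectors from the question.
data = [(1, 1), (2, 2), (1, 2), (2, 1)]

# Assumption: "normalization" here means subtracting each column's mean.
n = len(data)
col_means = [sum(row[j] for row in data) / n for j in range(2)]
centered = [tuple(x - m for x, m in zip(row, col_means)) for row in data]

before = cosine_similarity(data[0], data[1])          # (1, 1) vs (2, 2)
after = cosine_similarity(centered[0], centered[1])   # same rows after centering

print("column means:", col_means)
print("centered rows:", centered[0], centered[1])
print("cosine before:", round(before, 6), "after:", round(after, 6))
```

Under that assumption, the sign flip comes from the shift: the two vectors end up on opposite sides of the new origin.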

Does this mean that normalization should be avoided when using KNN, K-means, or any other distance-based algorithm with cosine similarity?

  • Your sense of "normalize" appears to include *recentering*. Because that shifts these vectors, then *of course* the angles change. Please see http://stats.stackexchange.com/questions/22329. – whuber Mar 29 '17 at 17:23
  • Are you explicitly talking about normalization or do you mean scaling? – NeuroMorphing Mar 29 '17 at 23:23
  • Doesn't normalization include a recentering by mean subtraction? Cosine similarity would be invariant to the scaling, I believe. – AbhinavChoudhury Mar 31 '17 at 03:03
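The distinction drawn in the comments between recentering and scaling can be checked numerically. The sketch below is illustrative only: it takes "scaling" to mean multiplying a whole vector by a positive constant (including dividing it by its own length), and `cosine_similarity` and `unit` are hypothetical helper names. Rescaling individual columns by different factors is a different operation and is not covered here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as sequences of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def unit(w):
    """Divide a vector by its own Euclidean length (L2 normalization)."""
    length = math.sqrt(sum(a * a for a in w))
    return [a / length for a in w]

u, v = (1, 2), (2, 1)
base = cosine_similarity(u, v)

# Scaling: each vector is multiplied by its own positive constant.
scaled = cosine_similarity([3 * a for a in u], [0.5 * b for b in v])

# Scaling to unit length is a special case of the same operation.
unit_sim = cosine_similarity(unit(u), unit(v))

# Recentering: the same offset is subtracted from both vectors,
# which moves them relative to the origin and changes the angle.
offset = (1.5, 1.5)
shifted = cosine_similarity(
    [a - o for a, o in zip(u, offset)],
    [b - o for b, o in zip(v, offset)],
)

print("base:", round(base, 6))
print("scaled:", round(scaled, 6))
print("unit length:", round(unit_sim, 6))
print("recentered:", round(shifted, 6))
```

In this sketch, the first three values agree and only the recentered one differs.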
