Intuition behind pearson correlation, co-variance and cosine similarity

Question

In this post, the best answer gives an excellent mathematical explanation of how Pearson correlation, covariance, and cosine similarity relate to one another, which I quote here ($\mathbf A$ is the data matrix).

If you center columns (variables) of $\bf A$, then $\bf A'A$ is the scatter (or co-scatter, if to be rigorous) matrix and $\mathbf {A'A}/(n-1)$ is the covariance matrix.

If you z-standardize columns of $\bf A$ (subtract the column mean and divide by the standard deviation), then $\mathbf {A'A}/(n-1)$ is the Pearson correlation matrix: correlation is covariance for standardized variables. The correlation is also called coefficient of linearity.

If you unit-scale columns of $\bf A$ (bring their SS, sum-of-squares, to 1), then $\bf A'A$ is the cosine similarity matrix. Cosine is also called coefficient of proportionality.

In addition to the mathematical explanation, is there an intuitive plot, like the Pearson correlation figure on Wikipedia (shown below), that shows the relationship between these three "similarity measures", i.e., what kind of shape each similarity metric is able to detect?

The only way these differ substantially is that $A^\prime A$ includes information about magnitudes of the variables. If you don't include that as part of your notion of "similarity," then all three give *identical* information. It therefore sounds like your question has already been answered (unless you have some specific sense of "similarity" in mind). — whuber, Jun 09 '16 at 16:42
@whuber Thank you for the reply. Are you saying they are identical and the only difference is that they incorporate the magnitude differently? Are there any similar visualizations, like the picture in Wikipedia, for covariance and cosine similarity? — Haitao Du, Jun 09 '16 at 16:55
Such pictures would look identical to the one you re-posted, except possibly stretched or compressed vertically. In fact, it is obvious that in the middle row the $x$ and $y$ variables have different spreads, indicating they are depicting the first case anyway. — whuber, Jun 09 '16 at 17:43
All three coefficients are about equally useless in measuring associations of the data shown in the last row. — ttnphns, Jun 09 '16 at 18:27
Thanks @ttnphns. So for detecting non-linear associations, which one is better: [mutual information](https://en.wikipedia.org/wiki/Mutual_information)? — Haitao Du, Jun 09 '16 at 18:34

score 12 · Accepted Answer · answered Jun 09 '16 at 19:27

We can ignore the matrix formulation, and just consider two vectors $x$ and $y$ (since the matrix formulation is just the vector operation repeated over different pairs of vectors). One intuitive/geometric distinction between covariance/correlation/cosine similarity is their invariance to different transformations of the input. That is, if we transform $x$ and $y$, under what types of transformations will the scores keep the same value?

Covariance subtracts the means before taking the dot product. Therefore, it's invariant to shifts.

Pearson correlation subtracts the means and divides by the standard deviations before taking the dot product. Therefore, it's invariant to shifts and scaling.

Cosine similarity divides by the norms before taking the dot product. Therefore it's invariant to scaling, but not shifts. Geometrically, it can be thought of as measuring the size of the angle between the two vectors (as its name suggests, it's the cosine of the angle).
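The three invariance claims above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`covariance`, `pearson`, `cosine`) and the toy vectors are made up for illustration, and the scale factors are assumed to be positive (a negative factor would flip the sign of the correlation and of the cosine).

```python
import math

def mean(v):
    return sum(v) / len(v)

def covariance(x, y):
    # Subtract the means, take the dot product, divide by n - 1.
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson(x, y):
    # Subtract the means and normalize by the centered norms.
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def cosine(x, y):
    # Normalize by the raw (uncentered) norms.
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

# Illustrative data (hypothetical values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

shifted = ([a + 10 for a in x], [b - 7 for b in y])   # add constants
scaled = ([3 * a for a in x], [0.5 * b for b in y])   # multiply by positive constants

for name, f in [("covariance", covariance), ("pearson", pearson), ("cosine", cosine)]:
    base = f(x, y)
    print(
        name,
        "| unchanged by shift:", math.isclose(f(*shifted), base),
        "| unchanged by scaling:", math.isclose(f(*scaled), base),
    )
```

With these inputs, covariance should survive only the shift, cosine similarity only the scaling, and Pearson correlation both.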

All of these quantities depend on the dot product, so they can only detect linear structure. To address a question from the comments: mutual information is fully general and can detect structure for any distribution. However, it's harder to estimate from finite data than the other quantities, and more care must be taken. It also measures dependence without indicating the direction of a relationship (e.g. variables that are correlated or anticorrelated can have the same mutual information). On the other hand, mutual information remains a valid measure of dependence even when no 'direction of relationship' exists (non-monotonic relationships). If the goal is to detect relationships that are nonlinear but monotonic, then Spearman rank correlation and Kendall's tau are good options.
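As a rough illustration of the last point, the sketch below compares Pearson correlation with a hand-rolled Spearman rank correlation on a monotonic but nonlinear relationship, and then evaluates Pearson on a symmetric, non-monotonic one. The `ranks` and `spearman` helpers are hypothetical and assume no tied values; in practice a library routine such as `scipy.stats.spearmanr` handles ties. The data are made up for illustration.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def ranks(v):
    # Rank 1 for the smallest value, n for the largest (assumes no ties).
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    # Spearman correlation is Pearson correlation applied to the ranks.
    return pearson(ranks(x), ranks(y))

# Monotonic but nonlinear: y = exp(x).
x = [0.5 * i for i in range(1, 11)]
y_mono = [math.exp(a) for a in x]
print("monotonic, Pearson: ", pearson(x, y_mono))
print("monotonic, Spearman:", spearman(x, y_mono))

# Non-monotonic: y = x^2 on a range symmetric about zero.
x_sym = [float(i) for i in range(-5, 6)]
y_sym = [a * a for a in x_sym]
print("symmetric parabola, Pearson:", pearson(x_sym, y_sym))
```

On the exponential data the rank-based score should reach 1 while Pearson stays below it; on the parabola Pearson comes out at zero even though $y$ is completely determined by $x$, which is the kind of dependence mutual information is meant to capture.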
