Covariance (or correlation, or cosine) can be easily and naturally converted into Euclidean distance by means of the law of cosines, because it is a scalar product (i.e. an angular-type similarity) in Euclidean space. Knowing the covariance between two variables i and j, as well as their variances, automatically implies knowing the distance d between the variables: $d_{ij}^2 = \sigma_i^2 + \sigma_j^2 - 2\operatorname{cov}_{ij}$. (That $d_{ij}^2$ is directly proportional to the usual squared Euclidean distance: you obtain the latter if you use the sums-of-squares and the sum-of-crossproducts in place of the variances and the covariance. Both variables should, of course, be centered initially: speaking of "covariances" is equivalent to thinking of the data with the means removed.)
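As an illustration only (the random data, variable names, and NumPy setup below are my own assumptions, not part of the answer), here is a small Python sketch that checks the formula numerically: the distance built from the variances and the covariance equals the squared Euclidean distance between the centered variable vectors, up to the 1/(n-1) factor hidden in the covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))      # 100 observations of two variables i and j (hypothetical data)
Xc = X - X.mean(axis=0)            # center each variable (remove the means)
n = X.shape[0]

cov = np.cov(Xc, rowvar=False)     # 2x2 covariance matrix (divides by n-1)
d2_from_cov = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]

# Squared Euclidean distance between the two centered variable vectors,
# i.e. sums-of-squares and sum-of-crossproducts instead of variances/covariance.
d2_euclid = np.sum((Xc[:, 0] - Xc[:, 1]) ** 2)

print(d2_from_cov * (n - 1), d2_euclid)   # equal up to floating-point error
```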
Note that this formula implies a negative covariance gives a greater distance than a positive covariance (and this is indeed the case from the geometrical point of view, i.e. when the variables are seen as vectors in the subject space). If you don't want the sign of the covariance to play a role, take its absolute value. Ignoring the negative sign is not a "patching by hand" operation and is warranted when needed: if the cov matrix is positive definite, the abs(cov) matrix will be positive definite too; and hence the distances obtained by the above formula will be true Euclidean distances (Euclidean distance being a particular sort of metric distance).
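A minimal sketch under the same hypothetical setup, showing how a whole distance matrix could be built from abs(cov) with the formula above so that the sign of the covariances plays no role:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))                          # hypothetical data, 4 variables
C = np.abs(np.cov(X - X.mean(axis=0), rowvar=False))  # abs(cov) matrix

v = np.diag(C)                                        # variances (unchanged by abs)
D2 = v[:, None] + v[None, :] - 2 * C                  # d_ij^2 for every pair of variables
D = np.sqrt(np.clip(D2, 0, None))                     # clip guards tiny negative rounding errors
```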
Euclidean distances are universal with respect to hierarchical clustering: any method of such clustering is valid with either Euclidean or squared Euclidean d. But some methods, e.g. average linkage or complete linkage, can be used with any dissimilarity or similarity (not just metric distances). So you could use such methods directly with the cov or abs(cov) matrix or, just for example, with a max(abs(cov))-abs(cov) distance matrix. Of course, the clustering results potentially depend on the exact nature of the (dis)similarity used.
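For instance, a sketch (the data, the number of clusters, and the SciPy setup are illustrative assumptions, not prescribed by the answer) of average linkage applied to the max(abs(cov))-abs(cov) dissimilarity between variables:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))                          # hypothetical data, 6 variables
C = np.abs(np.cov(X - X.mean(axis=0), rowvar=False))   # abs(cov) matrix

D = C.max() - C                    # dissimilarity; needs no metric properties for average linkage
np.fill_diagonal(D, 0.0)           # self-dissimilarity must be zero

Z = linkage(squareform(D, checks=False), method='average')
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)                      # cluster membership of the 6 variables
```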