How to normalize a similarity matrix?

Question

I have a symmetric similarity matrix $A \in \mathbb{R}^{N\times N}$ with $a_{ij}\ge 0$.

I want to normalize this matrix for use in graph-based clustering, so that each $0 \le \hat{a}_{ij} \le 1$. Ideally, I would also like to use the new matrix as a kernel. How should I normalize it?

What is the distribution of the similarity scores? If there aren't too many extreme values, you may be able to just [max scale](http://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range) it. If there is a long tail, taking the log may help as well. Not sure about the latter part of your question (using the transformed matrix as a kernel). — Keith Hughitt, Oct 20 '16 at 13:30
Also it is good to consider whether you really need every value to be in the range 0 to 1. Is it not enough for the variance in indices to be 1? Also, I would look into dividing each entry by the determinant of the matrix. — dimpol, Oct 20 '16 at 13:38

score 5 · Accepted Answer · answered Oct 20 '16 at 14:38

Assuming the matrix contains only positive values and its diagonal is not already all ones, apply:

$$A_{ij}:=\frac{A_{ij}}{\sqrt{A_{jj}\cdot A_{ii}}}$$

This is analogous to converting a covariance matrix into a correlation matrix: the diagonal entries become one and the off-diagonal entries are rescaled accordingly.
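A minimal NumPy sketch of the rescaling above, in case it helps. The function name `normalize_similarity` and the example matrix are made up for illustration, and the code assumes a strictly positive diagonal. Note that the result is only guaranteed to satisfy $\hat{a}_{ij} \le 1$ when $A$ is positive semidefinite (by the Cauchy-Schwarz inequality); in that case the rescaled matrix is also positive semidefinite, so it can still serve as a kernel.

```python
import numpy as np

def normalize_similarity(A):
    """Rescale a symmetric similarity matrix so its diagonal becomes 1.

    Hypothetical helper for illustration; assumes a strictly positive diagonal.
    """
    A = np.asarray(A, dtype=float)
    d = np.diag(A)
    if np.any(d <= 0):
        raise ValueError("diagonal entries must be strictly positive")
    scale = np.sqrt(d)
    # Entry (i, j) is divided by sqrt(A_ii * A_jj) via an outer product.
    return A / np.outer(scale, scale)

# Made-up example matrix (symmetric, non-negative, positive definite).
A = np.array([[4.0, 2.0, 0.0],
              [2.0, 9.0, 3.0],
              [0.0, 3.0, 16.0]])
A_hat = normalize_similarity(A)
print(A_hat)
```

If the input is not positive semidefinite, some off-diagonal entries may exceed 1 after rescaling, so it is worth checking `A_hat.max()` before relying on the $[0, 1]$ range.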
