I have a symmetric similarity matrix $A \in \mathbb{R}^{N\times N}$ with entries $a_{ij}\ge 0$.

I want to normalize this matrix for use in graph-based clustering, so that every entry satisfies $0 \le \hat{a}_{ij} \le 1$. Ideally I would like to use the normalized matrix as a kernel too. How should I normalize it?

Bob
  • What is the distribution of the similarity scores? If there aren't too many extreme values, you may be able to just [max scale](http://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range) it. If there is a long tail, taking the logarithm may help as well. Not sure about the latter part of your question (using the transformed matrix as a kernel). – Keith Hughitt Oct 20 '16 at 13:30
  • Also it is good to consider whether you really need every value to be in the range 0 to 1. Is it not enough for the variance in indices to be 1? Also, I would look into dividing each entry by the determinant of the matrix. – dimpol Oct 20 '16 at 13:38

1 Answer

Assuming the entries are non-negative and the diagonal entries are strictly positive, and if the diagonal isn't already all ones, do:

$$\hat{a}_{ij} = \frac{a_{ij}}{\sqrt{a_{ii}\cdot a_{jj}}}$$

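This is analogous to the transformation from a covariance matrix to a correlation matrix: the diagonal entries become one and the off-diagonal entries are rescaled. If $A$ is positive semidefinite, the result is positive semidefinite as well (so it can still serve as a kernel) and every entry lies in $[0, 1]$, because $a_{ij}^2 \le a_{ii}\,a_{jj}$. For a general non-negative symmetric matrix, rescaled entries above one are possible.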
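
A minimal NumPy sketch of this rescaling may help. The function name `normalize_similarity` and the example matrix are made up for illustration, and the code assumes a symmetric matrix with a strictly positive diagonal:

```python
import numpy as np

def normalize_similarity(A):
    """Divide each entry (i, j) by sqrt(A[i, i] * A[j, j]) so the diagonal becomes 1."""
    A = np.asarray(A, dtype=float)
    d = np.diag(A)
    if np.any(d <= 0):
        # The rescaling divides by sqrt(d_i * d_j), so zeros or negatives are not allowed.
        raise ValueError("diagonal entries must be strictly positive")
    scale = np.sqrt(d)
    return A / np.outer(scale, scale)

# Hypothetical example: a small symmetric, non-negative similarity matrix.
A = np.array([[4.0, 2.0, 0.0],
              [2.0, 9.0, 3.0],
              [0.0, 3.0, 16.0]])
A_hat = normalize_similarity(A)

# Smallest eigenvalue of the rescaled matrix; a non-negative value
# (up to round-off) means it is usable as a kernel matrix.
min_eig = np.linalg.eigvalsh(A_hat).min()
```

Here `A_hat[0, 1]` is `2 / sqrt(4 * 9) = 1/3` and `A_hat[1, 2]` is `3 / sqrt(9 * 16) = 0.25`. Checking `min_eig` is one way to see whether a given matrix remains a valid kernel after rescaling.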

Firebug