I have a dataset of $n$ $p$-dimensional vectors (objects) that I want to cluster.
1) One way to do this is to compute the $n \times n$ correlation matrix $C$ between objects, derive a dissimilarity matrix $D$ from $C$ such that each element $d_{ij}$ is a function of the single element $c_{ij}$ only (e.g., $D = 1 - C$), and then cluster on $D$.
2) Instead, I want to derive $D$ from $C$ such that each element $d_{ij}$ is a function of the entire rows $C_i$ and $C_j$ of $C$ (for example, $d_{ij}$ can be the Euclidean distance between $C_i$ and $C_j$).
Why approach 2)?
- Approach 1) computes the dissimilarity (distance) between input vectors $i$ and $j$ solely from the similarity between $i$ and $j$. In contrast, approach 2) computes it by comparing the similarities between $i$ and all other vectors with the similarities between $j$ and all other vectors, i.e., a "similarity of similarities".
- On the problems with which I am working, approach 2) seems to perform better in practice.
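The contrast in the first point can be made explicit for the example choices, $D = 1 - C$ in approach 1) and the Euclidean distance between rows in approach 2). Write $d^{(1)}_{ij}$ and $d^{(2)}_{ij}$ for the two dissimilarities and take $i \neq j$. Because $C$ is symmetric with $c_{ii} = c_{jj} = 1$, the $k = i$ and $k = j$ terms of the Euclidean distance each equal $(1 - c_{ij})^2$, so

$$
\left(d^{(2)}_{ij}\right)^2 = \sum_{k=1}^{n} \left(c_{ik} - c_{jk}\right)^2 = 2\left(1 - c_{ij}\right)^2 + \sum_{k \neq i,j} \left(c_{ik} - c_{jk}\right)^2 = 2\left(d^{(1)}_{ij}\right)^2 + \sum_{k \neq i,j} \left(c_{ik} - c_{jk}\right)^2 .
$$

Under these particular choices, approach 2) still contains the direct comparison used by approach 1), plus a term measuring how differently $i$ and $j$ correlate with the other $n - 2$ objects.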
Is there any reason why approach 2) would not be valid?