I have a cross-correlation matrix, $C_{nm}$, between two sets of variables, and I would like to establish the correspondence between the row and the column variables.
My current approach is to convert the correlation matrix into a connectivity matrix by setting
$$
M_{nm} = \begin{cases} 1, & \text{if } C_{nm} \geq f,\\ 0, & \text{if } C_{nm} < f. \end{cases}
$$
I then use a home-made algorithm to find all the connected clusters, which fall into the following categories:
- row or column variables not connected to anything
- pairs of row and column variables with one-to-one correspondence
- row/column variable corresponding to several column/row variables
- clusters relating several row and column variables.
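The decomposition described above amounts to finding the connected components of a bipartite graph. Its two node sets are the row and column variables, and an edge joins them wherever $M_{nm}=1$. The following is a minimal sketch in Python. It assumes NumPy and SciPy are available. The function name `decompose` and the category labels are illustrative, not part of any library. The sketch uses `scipy.sparse.csgraph.connected_components` in place of a home-made search:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def decompose(C, f):
    """Threshold C at f and split row/column variables into connected clusters.

    Graph nodes 0..n-1 are the row variables, nodes n..n+m-1 the column
    variables; an edge joins row i and column j whenever C[i, j] >= f.
    Returns a list of (row_indices, col_indices, category) tuples.
    """
    C = np.asarray(C, dtype=float)
    n, m = C.shape
    M = C >= f                        # connectivity matrix M_nm
    A = np.zeros((n + m, n + m))
    A[:n, n:] = M                     # row -> column edges
    A[n:, :n] = M.T                   # column -> row edges (symmetric)
    n_comp, labels = connected_components(csr_matrix(A), directed=False)

    clusters = []
    for k in range(n_comp):
        rows = tuple(int(i) for i in np.flatnonzero(labels[:n] == k))
        cols = tuple(int(j) for j in np.flatnonzero(labels[n:] == k))
        if len(rows) + len(cols) == 1:
            category = "unconnected"
        elif len(rows) == 1 and len(cols) == 1:
            category = "one-to-one"
        elif len(rows) == 1 or len(cols) == 1:
            # one row to several columns, or one column to several rows
            category = "one-to-many"
        else:
            category = "many-to-many"
        clusters.append((rows, cols, category))
    return clusters


C = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.8, 0.7],
              [0.0, 0.1, 0.1]])
for rows, cols, category in decompose(C, f=0.5):
    print(rows, cols, category)
```

In this small example, row 0 pairs one-to-one with column 0, and row 1 maps to columns 1 and 2. Row 2 is unconnected. Because every edge joins a row to a column, any cluster with more than one member contains at least one variable of each kind. That is why the four categories above exhaust all cases.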
I am looking for:
- A standard algorithm for performing such a decomposition. (I came across the Dulmage-Mendelsohn decomposition in one of the answers to this question, but I am not sure whether it does exactly what I need. Overall, this question seems similar to what I would like to do, except that it deals with auto-correlation.)
- A way to choose the cutoff $f$ so as to optimize the splitting into the categories listed above, i.e., to minimize misclassification due to statistical errors (the correlation matrix is obtained by averaging over a hundred samples).
- Perhaps a better approach to analyzing (and visualizing) the relations between the two sets of variables.
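One statistically motivated way to pick $f$ uses the Fisher transformation. This assumes each $C_{nm}$ is a Pearson correlation estimated from $N$ independent, roughly bivariate-normal samples. That is an assumption and may not match how $C$ is actually built. Under zero true correlation, $\operatorname{artanh}(C_{nm})\sqrt{N-3}$ is approximately standard normal.

Suppose the matrix has $R$ rows and $K$ columns. Then requiring any single null entry to reach $f$ with probability at most $\alpha/(RK)$ gives
$$
f = \tanh\!\left(\frac{z_{1-\alpha/(RK)}}{\sqrt{N-3}}\right).
$$
This is a Bonferroni correction over all entries. It is one-sided because the rule only keeps $C_{nm} \geq f$. The helper name `fisher_cutoff` and the example sizes below are hypothetical:

```python
import numpy as np
from scipy.stats import norm


def fisher_cutoff(n_samples, n_rows, n_cols, alpha=0.05):
    """Correlation cutoff f from a one-sided, Bonferroni-corrected Fisher z test.

    Assumes every C[i, j] is a Pearson correlation computed from n_samples
    independent, roughly bivariate-normal samples. If all true correlations
    were zero, the probability that any entry reaches f is then at most
    about alpha.
    """
    n_tests = n_rows * n_cols
    z_crit = norm.ppf(1.0 - alpha / n_tests)        # one-sided critical value
    # Under zero true correlation, artanh(r) ~ Normal(0, 1 / (n_samples - 3)) approximately.
    return float(np.tanh(z_crit / np.sqrt(n_samples - 3)))


# Illustrative sizes only: 100 samples, a 20 x 30 cross-correlation matrix.
print(fisher_cutoff(n_samples=100, n_rows=20, n_cols=30))
```

This only bounds spurious edges. It says nothing about true correlations that fall below $f$ by chance. If the typical strength of the real correlations is known, the same $1/\sqrt{N-3}$ standard error on the Fisher scale can be used to estimate how often they would be missed.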
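A bootstrap is one way to see how robust the clusters are to sampling noise. This assumes the raw samples behind $C$ are available as arrays `X` (samples × row variables) and `Y` (samples × column variables). The procedure resamples with replacement, recomputes $C$, redoes the thresholded decomposition, and records how often each row/column pair lands in the same cluster.

Frequencies near 0 or 1 indicate stable assignments. Intermediate values flag pairs whose classification depends on statistical noise. A heat map of this frequency matrix may also be more informative to visualize than a single thresholded matrix. The function names and the synthetic data are hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def cross_corr(X, Y):
    """Pearson cross-correlation between columns of X (N x n) and Y (N x m)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xz.T @ Yz / X.shape[0]


def co_cluster_frequency(X, Y, f, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which row i and column j share a cluster."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    m = Y.shape[1]
    freq = np.zeros((n, m))
    for _ in range(n_boot):
        idx = rng.integers(0, N, size=N)           # resample with replacement
        M = cross_corr(X[idx], Y[idx]) >= f        # thresholded connectivity
        A = np.zeros((n + m, n + m))
        A[:n, n:] = M                              # bipartite adjacency
        A[n:, :n] = M.T
        _, labels = connected_components(csr_matrix(A), directed=False)
        freq += labels[:n, None] == labels[None, n:]
    return freq / n_boot


# Synthetic example: Y[:, 0] tracks X[:, 0]; everything else is independent noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=100), rng.normal(size=100)])
print(co_cluster_frequency(X, Y, f=0.5).round(2))
```

Repeating this for several values of $f$ shows which cutoffs leave the cluster assignments stable.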