I have a symmetric data matrix $A$, giving co-occurrence of events. That is, $A_{ij}$ is the frequency of occurrence of $i,j$ together. The diagonal elements of $A$ are unknown/indeterminate.

I am interested in detecting pairs of events $i,j$ that preferentially occur together.

The problem is that some rows/columns of this matrix are significantly larger than other rows/columns. The naive approach of selecting pairs with large $A_{ij}$ then ends up selecting these rows/columns. Is there a way to normalize rows/columns without losing the symmetry of the matrix, so that the fine detail of the matrix becomes visible?

becko
  • 1
    How exactly are these "events" observed? This matters because it will influence the degree of dependence among the individual data entries. – whuber Dec 15 '14 at 19:24
  • Isn't this an exact duplicate of [your older question](http://stats.stackexchange.com/questions/128703)? – whuber Dec 15 '14 at 19:25
  • @whuber See the [older question](http://stats.stackexchange.com/q/128703/5536) for how the data is generated. It's the same matrix $A = f$ in both questions. Whether both questions are duplicates depends on the answer. Here I am just asking about a way to normalize the data so that the variability is better seen. The other question is about inferring correlations between events. Perhaps both questions can be answered using the same method... but we don't know that yet. – becko Dec 15 '14 at 19:28

1 Answer

If event $i$ is very frequent, then $A_{ij}$ will be relatively large compared to $A_{k\ne i,j}$.

The solution is not to use the naive approach. Instead, convert your frequency matrix into a correlation matrix, as suggested in this answer.
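To give the idea in formulas rather than MATLAB: the sketch below is not what the linked answer does, and it is only one possible reading of "remove the effect of frequent events". It assumes that, absent any preferential pairing, the off-diagonal entries would look like $A_{ij} \approx s_i s_j$ for some positive per-event factors $s_i$, fits those factors from the off-diagonal row sums $r_i = \sum_{j \ne i} A_{ij}$ by solving $s_i \sum_{k \ne i} s_k = r_i$, and reports the symmetric ratio $A_{ij} / (s_i s_j)$. The function name `symmetric_ratio`, the damped fixed-point iteration, and the example matrix are all illustrative assumptions.

```python
import numpy as np


def symmetric_ratio(A, n_iter=5000, tol=1e-12):
    """Ratio of observed co-occurrence to a fitted baseline s_i * s_j.

    Assumption (not from the linked answer): without preferential pairing,
    A[i, j] ~ s[i] * s[j] for i != j. The diagonal is treated as unknown
    and never used.
    """
    A = np.array(A, dtype=float)
    np.fill_diagonal(A, 0.0)            # the diagonal is unknown: keep it out of the sums
    r = A.sum(axis=1)                   # off-diagonal row sums (= column sums by symmetry)
    if A.shape[0] < 2 or np.any(r <= 0):
        raise ValueError("need at least two events, each with a positive row sum")

    s = np.sqrt(r)                      # starting guess for the per-event factors
    for _ in range(n_iter):
        others = s.sum() - s            # sum of s[k] over k != i
        # Damped update toward s[i] * others[i] == r[i]; the geometric mean
        # with the previous value avoids oscillation.
        s_new = np.sqrt(s * r / others)
        converged = np.max(np.abs(np.log(s_new) - np.log(s))) < tol
        s = s_new
        if converged:
            break

    ratio = A / np.outer(s, s)          # symmetric by construction
    np.fill_diagonal(ratio, np.nan)     # diagonal stays undefined
    return ratio


# Hypothetical data: event 0 is frequent, and the pair (1, 2) has been
# given extra co-occurrences on top of a product structure.
A_example = np.array([
    [0.0, 20.0, 10.0, 5.0],
    [20.0, 0.0, 10.0, 1.0],
    [10.0, 10.0, 0.0, 0.5],
    [5.0, 1.0, 0.5, 0.0],
])
ratio_example = symmetric_ratio(A_example)
```

Under this baseline, a ratio well above 1 suggests that a pair occurs together more often than the overall frequencies of the two events would explain. Whether the product baseline is appropriate depends on how the events are observed, so treat the ratio as an exploratory normalization, not as a test.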

Aksakal
  • That's exactly the point. A very frequent event $i$ hides the variability of $A_{k\ne i,j}$. I do not understand matlab very well. Can you tell me the idea, just in formulas (I'll program it in my language) of what you're doing in that answer you linked? Also, note that in my data, the diagonal elements of $A$ are not present. – becko Dec 15 '14 at 19:17
  • The answer you refer to assumes the $i$ and $j$ are numeric values, that the pairs are *ordered*, whereas these are *unordered* (that's what the symmetry of the matrix means), and that a Pearson correlation coefficient is needed, which does not seem to be relevant in this question. Thus it seems your solution is inapplicable. – whuber Dec 15 '14 at 19:22
  • 1
    @whuber, you are right. Let me think of the way to apply the same idea to this problem. – Aksakal Dec 15 '14 at 20:41
  • @Aksakal: Did you find a better solution? – kjetil b halvorsen Aug 05 '21 at 00:17
  • @kjetilbhalvorsen, I didn't have a chance to think about it – Aksakal Aug 05 '21 at 00:42