Suppose I have a data set of $n$ students, and each student $i$ has two distinct features:
- Which school they go to, $S_i$
- Which sport they play, $P_i$
No student can play more than one sport or attend more than one school. There are $N$ schools and $M$ sports. Given this, we can define a list of students $L$ as
$L=[(S_1, P_1), (S_2, P_2), \ldots, (S_n, P_n)]$.
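For concreteness, here is a minimal Python sketch of how such a list could be tallied into an $N \times M$ table of counts. The toy data and variable names are made up purely for illustration.

```python
from collections import Counter

# Hypothetical toy data: one (school, sport) pair per student.
L = [("A", "soccer"), ("A", "soccer"), ("A", "tennis"),
     ("B", "tennis"), ("B", "tennis"), ("B", "soccer")]

pair_counts = Counter(L)                    # observed count per (school, sport) cell
school_counts = Counter(s for s, _ in L)    # row totals (students per school)
sport_counts = Counter(p for _, p in L)     # column totals (students per sport)
n = len(L)                                  # total number of students
```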
Now, I want to know the answer to the question:
Which (School, Sport) pairings occur more often than would be expected by chance?
I know that if I just wanted to ask "Are (School, Sport) pairings random?" I could use something like an $N \times M$ Fisher exact test. But I want to know which specific pairings are over-represented.
The obvious solution is to run $N \times M$ one-vs-all tests, one per pairing, but the multiple-comparison correction this requires seems like it's going to kill any signal. I'm wondering if there's a better (rigorous) approach.
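To make the one-vs-all idea concrete, here is a minimal sketch of what I have in mind. It is only an illustration under my own assumptions: the function name `one_vs_all_pvalues` is hypothetical, each cell is collapsed to a 2x2 table (this school vs. the rest, this sport vs. the rest), the test is a one-sided Fisher exact test computed from the hypergeometric tail, and the correction is plain Bonferroni over all $N \times M$ cells.

```python
from collections import Counter
from math import comb

def one_vs_all_pvalues(pairs):
    """Return {(school, sport): (raw_p, bonferroni_p)} for over-representation.

    Each cell is tested as a 2x2 table with fixed margins, using the
    upper tail of the hypergeometric distribution (one-sided Fisher exact).
    """
    n = len(pairs)
    cell = Counter(pairs)
    row = Counter(s for s, _ in pairs)   # students per school
    col = Counter(p for _, p in pairs)   # students per sport
    n_tests = len(row) * len(col)        # N * M cells tested
    results = {}
    for s in row:
        for p in col:
            k = cell[(s, p)]             # observed count in this cell
            upper = min(row[s], col[p])  # largest count the margins allow
            # P(X >= k) where X is the cell count under random pairing
            tail = sum(comb(row[s], x) * comb(n - row[s], col[p] - x)
                       for x in range(k, upper + 1)) / comb(n, col[p])
            # Bonferroni: multiply by the number of tests, cap at 1
            results[(s, p)] = (tail, min(1.0, tail * n_tests))
    return results
```

With only a handful of students per cell, even a fairly lopsided table yields adjusted p-values far from significance, which is the loss of signal I am worried about.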