How would I determine a correlation between sets of integers such as the following:
set A: 1, 2, 3, 4
set B: 2, 3, 4
set C: 4, 5
set D: 2, 5
I want a procedure that lets me compute statements like "if a set contains 2, there's a 67% chance it also contains 4" (here, 2 of the 3 sets containing 2 also contain 4), and to do that for every ordered pair of numbers that appear in the sets, giving a correlation matrix.
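For concreteness, here is a minimal Python sketch of the pairwise computation I have in mind. The names `sets` and `conditional_matrix` are just illustrative, and it assumes everything fits in memory. Each entry `(x, y)` is the fraction of sets containing `x` that also contain `y`, so the matrix is not symmetric in general.

```python
from itertools import permutations

# The four example sets from above; the labels are arbitrary.
sets = {
    "A": {1, 2, 3, 4},
    "B": {2, 3, 4},
    "C": {4, 5},
    "D": {2, 5},
}

def conditional_matrix(collections):
    """Map each ordered pair (x, y) to the fraction of sets
    containing x that also contain y."""
    collections = list(collections)
    items = sorted(set().union(*collections))
    matrix = {}
    for x, y in permutations(items, 2):
        # Every x comes from the union, so with_x is never empty.
        with_x = [s for s in collections if x in s]
        both = [s for s in with_x if y in s]
        matrix[(x, y)] = len(both) / len(with_x)
    return matrix

matrix = conditional_matrix(sets.values())
print(round(matrix[(2, 4)], 2))  # 0.67
print(matrix[(5, 4)])            # 0.5
```

For the four sets above, the `(2, 4)` entry is 2/3, because A, B and D contain 2 and only A and B also contain 4.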
What I really want to find out is whether, given a large number of these sets, there are groups of sets that are highly similar to each other and somewhat dissimilar to the other groups. The labels of the sets themselves (A, B, C, D) are arbitrary and unimportant. Only the contents of the sets are significant.
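To illustrate what "similar" could mean here, the sketch below scores every pair of sets by the size of their intersection divided by the size of their union (the Jaccard index). That choice of measure is only an assumption on my part, not something I am committed to, and the code assumes no set is empty. The names `sets`, `jaccard` and `similarity` are illustrative.

```python
from itertools import combinations

# The four example sets from above; the labels are arbitrary.
sets = {
    "A": {1, 2, 3, 4},
    "B": {2, 3, 4},
    "C": {4, 5},
    "D": {2, 5},
}

def jaccard(a, b):
    """Shared elements divided by total distinct elements.
    Assumes at least one of the two sets is non-empty."""
    return len(a & b) / len(a | b)

# Score each unordered pair of sets once.
similarity = {
    (p, q): jaccard(sets[p], sets[q])
    for p, q in combinations(sorted(sets), 2)
}

# List the pairs from most to least similar.
for pair, score in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On the example data, A and B score 0.75 while every other pair scores 1/3 or less, which is the kind of grouping I would like to detect at scale.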
I could write some code to compute these correlations pretty easily, but I am wondering whether there are more sophisticated techniques for getting this kind of information out of the data. I don't know what this sort of correlation is called, so it's difficult to google. Any suggestions?