I have a dataset with about 35,000 individuals described by around 15 categorical variables.
I'm trying to study the independence / correlation between these 15 categorical variables. My first idea was to build a contingency table for each pair of variables, compute the $\chi^2$ statistic, and then compare the statistic across pairs. However, because the sample is so large, the $\chi^2$ test is always significant, and I'm having difficulty interpreting and comparing the results for each pair of variables.
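To make the setup concrete, here is a minimal sketch of the pairwise computation in plain Python. The toy data and the helper names `chi2_stat` and `pairwise_chi2` are made up for illustration; with real data, a library routine such as `scipy.stats.chi2_contingency` would also return the p-value.

```python
from collections import Counter
from itertools import combinations


def chi2_stat(x, y):
    """Pearson chi-square statistic and degrees of freedom for two
    equally long sequences of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    row = Counter(x)             # marginal counts of the first variable
    col = Counter(y)             # marginal counts of the second variable
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


def pairwise_chi2(data):
    """data: dict mapping a variable name to its list of labels.
    Returns {(var_a, var_b): (statistic, dof)} for every pair."""
    return {(u, v): chi2_stat(data[u], data[v])
            for u, v in combinations(data, 2)}


# Hypothetical toy data standing in for the real 35,000 x 15 table.
data = {
    "colour": ["red", "red", "blue", "blue", "red", "blue", "red", "blue"],
    "size":   ["S", "S", "L", "L", "S", "L", "L", "S"],
    "shape":  ["sq", "ci", "sq", "ci", "sq", "ci", "sq", "ci"],
}

for pair, (stat, dof) in pairwise_chi2(data).items():
    print(pair, round(stat, 3), dof)

# Replicating every row 10 times leaves the cell proportions unchanged
# but multiplies the statistic by 10, which is why large samples make
# almost any association "significant".
small, _ = chi2_stat(data["colour"], data["size"])
large, _ = chi2_stat(data["colour"] * 10, data["size"] * 10)
print(round(large / small, 6))
```

The last two lines illustrate the problem described above: the statistic grows with the sample size even when the strength of association stays the same.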
So, I can summarize my question as follows:
- For large datasets, when I know $\chi^2$ will almost always be significant, is there an alternative test that will give more reasonable results?
I also have two ideas:
- I was thinking of drawing many bootstrap samples of, say, 1,000 individuals each, calculating the correlation on each sample, and then averaging over all the bootstrap samples. The average should be a good representation of the overall sample, but I feel like I'm somehow cheating.
- Can I simply compare the magnitudes of the $\chi^2$ statistic between the different pairs of variables? The degrees of freedom differ (the variables have different numbers of categories), which leads me to think this won't make sense.
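For reference, the first idea above could be sketched roughly as follows. This is only an illustration under assumptions: the "correlation" is taken to be the $\chi^2$ statistic, samples are drawn with replacement, and the function name `subsample_mean_chi2` and its parameters `m`, `n_rep` and `seed` are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def chi2_stat(x, y):
    """Pearson chi-square statistic for two sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row = Counter(x)
    col = Counter(y)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (joint.get((a, b), 0) - expected) ** 2 / expected
    return stat


def subsample_mean_chi2(x, y, m=1000, n_rep=200, seed=0):
    """Average the chi-square statistic over n_rep samples of size m,
    each drawn with replacement from the paired observations."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_rep):
        idx = [rng.randrange(n) for _ in range(m)]
        # Resample rows jointly so each (x, y) pair stays together.
        stats.append(chi2_stat([x[i] for i in idx], [y[i] for i in idx]))
    return mean(stats)


# Hypothetical example: two perfectly associated binary variables.
x = ["a", "b"] * 500
y = ["c", "d"] * 500
print(subsample_mean_chi2(x, y, m=50, n_rep=20))
```

Because the statistic scales with the number of observations, the averaged value mostly reflects the chosen sample size `m` rather than the full dataset, which may be part of why this feels like cheating.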