
I am teaching a basic statistics course, and today I will cover the chi-squared test of independence for two categorical variables and the chi-squared test of homogeneity. The two scenarios are conceptually different, but they can use the same test statistic and reference distribution. In a test of homogeneity, the marginal totals for one of the variables are part of the design itself -- they are the numbers of subjects selected for each experimental group. But since the chi-squared test revolves around conditioning on all marginal totals, there are no mathematical consequences to distinguishing between tests of homogeneity and tests of independence for categorical data -- at least none when this test is used.

My question is the following: is there any school of statistical thought, or any statistical approach, that would yield different analyses depending on whether we are testing for independence (where all marginals are random variables) or for homogeneity (where one set of marginals is fixed by the design)?

In the continuous case, say where we observe $(X,Y)$ on the same subject and test for independence, or observe $(X_1, X_2)$ in different populations and test whether they come from the same distribution, the methods are different (correlation analysis vs. a t-test). What if the categorical data came from discretized continuous variables? Should the tests of independence and homogeneity be indistinguishable?
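For concreteness, here is a minimal sketch with a made-up $2 \times 3$ table (the counts and the use of `scipy.stats.chi2_contingency` are my own illustration, not part of the question): the computation uses only the observed table, so it cannot distinguish which margin, if any, was fixed by design.

```python
# Hypothetical 2 x 3 table: the rows could reflect a fixed-by-design margin (homogeneity)
# or simply the cross-classification of a single sample (independence); the statistic
# below is computed the same way in either case.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 42, 55],
                     [70, 58, 45]])
stat, pvalue, dof, expected = chi2_contingency(observed, correction=False)
print(stat, pvalue, dof)
```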

Glen_b
Placidia
  • Can you provide a source which distinguishes the "test of homogeneity" from the "test of independence"? I used to think they are the same (and so does [Wikipedia](http://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Test_of_independence)). It is also called the chi-square _test of association_ for a 2-way contingency table, or the _K-independent-samples_ chi-square _comparison_ test. It should not be confused with the _one-sample_ chi-square test, also known as the chi-square _test of agreement_, in which we test the observed frequencies against theoretical expected frequencies that we supply. – ttnphns Nov 25 '13 at 16:20
  • @ttnphns It seems to be endemic. I'm using "Expect the Unexpected" by Raluca Balan and Gilles Lamothe. Last year I taught from Business Statistics by Sharpe, De Veaux, et al. Both texts make quite a meal of the distinction. In both cases, we have a 2-way contingency table. Needless to say, neither textbook thinks it worthwhile teaching an effect size for the contingency table: another case where subtlety triumphs over usefulness in basic stats courses. – Placidia Nov 25 '13 at 16:25
  • The difference should show up if you tried to get a confidence interval for the effect size. – Ray Koopman Nov 25 '13 at 16:39
  • That sounds intriguing. Do you mind adding some specifics and making it an answer? – Placidia Nov 25 '13 at 16:41
  • That was just a top-of-the-head reaction. I don't have specifics at the moment. – Ray Koopman Nov 25 '13 at 16:44
  • It depends if you want to torture the students with the distinction between conditional and unconditional margins. If not, you might just focus on explaining that "independence of two categorical variables" is equivalent to "homogeneity of conditional distributions" and then present the single $\chi^2$-test. (I usually present it along with lower confidence limits for the true Cramér's $V$, which measures the strength of association; a sketch of the point estimate follows these comments.) – Michael M Nov 25 '13 at 18:02
  • Good comments from @Glen_B over [here](http://stats.stackexchange.com/questions/101181/comparing-proportions-in-two-samples?noredirect=1#comment197026_101181)! Maybe we should bug him to answer :) – Nick Stauner Jun 05 '14 at 23:06
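Following up on the effect-size remarks above, here is a minimal sketch of the point estimate of Cramér's $V$ (the counts are made up, and the confidence-limit construction Michael M mentions is not shown):

```python
# Cramer's V for an r x c table: V = sqrt(X^2 / (n * (min(r, c) - 1))).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 42, 55],
                     [70, 58, 45]])
chi2, _, _, _ = chi2_contingency(observed, correction=False)
n = observed.sum()
v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))
print(v)
```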

2 Answers


You simply have to ask yourself, "How do I write the null hypothesis?". Consider a $2 \times k$ contingency table of frequencies of some behavior (y/n) among $k$ groups. Treating the first group as the referent, you have $k-1$ odds ratios ($\theta_i, i = 1, 2, \ldots, k-1$) that describe the association between frequency and group.

Under independence, just as under homogeneity, the null hypothesis is that all odds ratios equal 1. That is, the probability of responding "yes" is the same irrespective of group assignment. If the null fails, at least one group differs from the others.

$\mathcal{H}_0(\mbox{homogeneity}): \theta_1 = \theta_2 = \cdots = \theta_{k-1} = 1$

$\mathcal{H}_0(\mbox{independence}): \theta_1 = \theta_2 = \cdots = \theta_{k-1} = 1$

Either null can be tested with the Pearson chi-square test based on observed and expected frequencies, which is the score test for the logistic regression model with $k-1$ indicator variables for group membership. So structurally we may say that these tests are the same.
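A minimal numerical check of that equivalence (the counts are hypothetical, and the group-level score-test algebra below is my own reduction, not code from the answer):

```python
# Pearson chi-square for a 2 x k table versus the score (Lagrange multiplier) test of
# beta_1 = ... = beta_{k-1} = 0 in a logistic regression on k-1 group indicators.
import numpy as np
from scipy.stats import chi2_contingency

yes = np.array([30, 42, 55])          # "yes" counts per group (hypothetical)
no = np.array([70, 58, 45])           # "no" counts per group
X2, _, _, _ = chi2_contingency(np.vstack([yes, no]), correction=False)

n_j = yes + no                        # group sizes
n = n_j.sum()
p0 = yes.sum() / n                    # fitted probability under the intercept-only null

u = yes[1:] - n_j[1:] * p0            # score vector for the k-1 indicator coefficients
V = p0 * (1 - p0) * (np.diag(n_j[1:]) - np.outer(n_j[1:], n_j[1:]) / n)
score_stat = u @ np.linalg.solve(V, u)

print(X2, score_stat)                 # the two statistics agree up to rounding
```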

However, differences arise when we consider the nature of the grouping factor. In this sense, the contextual application of the test, or rather its name, is important. A group may be directly causal of an outcome, such as the presence or absence of a gene or the allele pattern of a trait; in that case, when we reject the null we conclude that the outcome depends on the grouping factor in question.

On the other hand, when we test homogeneity, we absolve ourselves of making any causal claims. Thus, when the "group" is a sophisticated construct like race (which causes and is caused by genetic, behavioral, and socioeconomic determinants) we can draw conclusions like "racial-ethnic minorities experience housing disparities, as evidenced by heterogeneity in the neighborhood deprivation index". If someone countered such an argument by saying, "well, that's because minorities achieve lower education, earn lower income, and gain less employment", you could reply, "I didn't claim that their race caused these things, simply that if you look at someone's race, you can make predictions about their living conditions."

In that way, tests of dependence are a special case of tests of homogeneity in which the possible effect of lurking factors is of interest and should be handled in a stratified analysis. Multivariate adjustment in the analogous logistic regression model achieves this, and we may still say we are conducting a test of dependence, but not necessarily of homogeneity.
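A minimal sketch of what such adjustment could look like, with a simulated data frame and `statsmodels`; the variable names (`group`, `ses`) and the data-generating story are my own, purely illustrative:

```python
# Simulated example: the outcome is driven by a lurking covariate (ses) that is
# correlated with group, so the apparent group effect shrinks after adjustment.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
group = rng.integers(0, 3, size=n)
ses = rng.normal(size=n) + 0.5 * group
p = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * ses)))
y = rng.binomial(1, p)
df = pd.DataFrame({"y": y, "group": group, "ses": ses})

unadjusted = smf.logit("y ~ C(group)", data=df).fit(disp=0)
adjusted = smf.logit("y ~ C(group) + ses", data=df).fit(disp=0)
print(unadjusted.llr_pvalue)          # marginal association with group
print(adjusted.params)                # group coefficients after adjusting for ses
```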

AdamO

There is a clear difference between the two problems if you model them in the Bayesian way. In some papers the first case (homogeneity) is described as sampling with "one margin fixed" and the second case (independence) as sampling with the "total table fixed". Have a look, for example, at Casella et al. (JASA 2009).
I am working on this topic, but my paper, which also describes this distinction, is not out yet :)
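A minimal sketch of the two sampling models being contrasted (my own illustration, not taken from Casella et al.); in a Bayesian analysis the priors then live on different parameter spaces:

```python
# Rows are groups, columns are yes/no counts (hypothetical).
import numpy as np
from scipy.stats import multinomial

counts = np.array([[30, 70],
                   [42, 58],
                   [55, 45]])
n_total = counts.sum()

# "One margin fixed" (homogeneity design): group sizes are set by design; the
# likelihood is a product of independent multinomials in the conditional probabilities.
cond_p = counts / counts.sum(axis=1, keepdims=True)
loglik_margin_fixed = sum(
    multinomial.logpmf(row, n=row.sum(), p=p) for row, p in zip(counts, cond_p)
)

# "Total table fixed" (independence design): only the grand total is set; the
# likelihood is a single multinomial in the joint cell probabilities.
joint_p = counts / n_total
loglik_total_fixed = multinomial.logpmf(counts.ravel(), n=n_total, p=joint_p.ravel())

print(loglik_margin_fixed, loglik_total_fixed)
```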

Andre Silva
Emanuele
  • There's a clear difference from a frequentist perspective too - it's just that asymptotically it doesn't matter, & arguments are often made for conditioning on one or both margins in any case. – Scortchi - Reinstate Monica Feb 26 '14 at 17:00