Degrees of freedom in Chi-squared test of homogeneity

Question

Assume an $I\times J$ contingency table whose columns hold the counts of multinomial random variables with $I$ categories, one for each of $J$ populations, with fixed marginal totals $n_{\cdot j}=\sum^I_{i=1}n_{ij}$ for each population.

We can use Pearson's chi-square statistic to test for homogeneity:

$\chi^2=\sum_{i=1}^I\sum_{j=1}^J\dfrac{(O_{ij}-E_{ij})^2}{E_{ij}}=\sum_{i=1}^I\sum_{j=1}^J\dfrac{(n_{ij}-n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot})^2}{n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot}}$
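
To make the formula concrete, here is a minimal Python sketch that evaluates the statistic for a small table. The counts are invented purely for illustration and the function name `pearson_chi2` is hypothetical; the rows are categories and the columns are populations, as in the setup above.

```python
def pearson_chi2(table):
    """Pearson statistic for table[i][j] = count of category i in population j."""
    I, J = len(table), len(table[0])
    row_tot = [sum(row) for row in table]                            # n_{i.}
    col_tot = [sum(table[i][j] for i in range(I)) for j in range(J)] # n_{.j}
    n = sum(row_tot)                                                 # n_{..}
    stat = 0.0
    for i in range(I):
        for j in range(J):
            expected = row_tot[i] * col_tot[j] / n                   # E_{ij}
            stat += (table[i][j] - expected) ** 2 / expected
    dof = (I - 1) * (J - 1)  # the textbook degrees of freedom
    return stat, dof

# Hypothetical 3 x 2 table: I = 3 categories, J = 2 populations of 60 each.
table = [[10, 20],
         [20, 10],
         [30, 30]]
stat, dof = pearson_chi2(table)
print(stat, dof)
```

Here every expected count is $n_{i\cdot}n_{\cdot j}/n_{\cdot\cdot}$, so the third row (30 and 30, both expected to be 30) contributes nothing to the sum.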

The degrees of freedom should represent the number of independent variates in the model/statistic used; in this case that would be Pearson's chi-square statistic.

In J. A. Rice's book, the following explanation is given for determining the dof:

The degrees of freedom are the number of independent counts minus the number of independent parameters estimated from the data. Each multinomial has $I-1$ independent counts, since the totals are fixed, and $I-1$ independent parameters have been estimated. The degrees of freedom are therefore $(I-1)(J-1)$.
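
The quote leaves the arithmetic implicit. Spelled out, and assuming that "independent counts" means counts not already determined by the fixed column totals, the $J$ multinomials contribute $J(I-1)$ such counts, from which the $I-1$ estimated parameters are subtracted:

$$J(I-1)-(I-1)=(I-1)(J-1)$$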

However, I don't think that the counts are independent. Since the marginal counts $n_{\cdot j}$ are fixed, and the cell counts are nonnegative and must sum to $n_{\cdot j}$, they are not independent. Clearly, setting an arbitrary $n_{ij}$ (for some $j$) to some value (e.g. $n_{ij}=x$) affects the remaining counts $n_{kj}$ for $k\neq i$, as their maximum permissible value is now $n_{\cdot j}-x$.
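
The dependence described here can be checked numerically. The sketch below is only an illustration under assumed settings (the category probabilities, $J=4$ populations of 200 each, 2000 replications and the seed are all made up, and the helper names are hypothetical). It simulates homogeneous populations, measures the correlation between two counts in the same column, and averages the Pearson statistic, which can be compared with $(I-1)(J-1)$ because a chi-squared variable has mean equal to its degrees of freedom.

```python
import random

def draw_column(n, probs, rng):
    """One multinomial column: n units spread over len(probs) categories."""
    counts = [0] * len(probs)
    for cat in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[cat] += 1
    return counts

def pearson_stat(cols):
    """Pearson statistic for a table stored as a list of columns."""
    I = len(cols[0])
    n = sum(sum(c) for c in cols)
    row_tot = [sum(c[i] for c in cols) for i in range(I)]
    stat = 0.0
    for c in cols:
        col_tot = sum(c)
        for i in range(I):
            expected = row_tot[i] * col_tot / n
            stat += (c[i] - expected) ** 2 / expected
    return stat

def correlation(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)                # fixed seed, chosen arbitrarily
probs = [0.2, 0.3, 0.5]               # assumed common probabilities, I = 3
J, n_col, reps = 4, 200, 2000         # assumed design

stats, first, second = [], [], []
for _ in range(reps):
    cols = [draw_column(n_col, probs, rng) for _ in range(J)]
    stats.append(pearson_stat(cols))
    first.append(cols[0][0])          # n_{11}
    second.append(cols[0][1])         # n_{21}

mean_stat = sum(stats) / reps
count_corr = correlation(first, second)
print("correlation of n_11 and n_21:", count_corr)
print("mean statistic:", mean_stat, "vs (I-1)(J-1) =", (len(probs) - 1) * (J - 1))
```

In a run like this the two counts should come out negatively correlated, which matches the observation that they are not statistically independent, while the average statistic should land roughly at $(I-1)(J-1)=6$. This does not settle what "independent" means in the quoted passage; it only shows that the two facts can coexist.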

I tried to search for the answer, but unfortunately, the majority of sites simply repeat the formula without any comment.

Where is the error in my reasoning? What am I misunderstanding about dof?

Welcome to CV, Rudimentary Joe. You are asking a great question. Indeed, the chi-squared example was used by the best talent here at CV to illustrate problems with the conventional definition of degrees of freedom. I highly recommend the answer by @whuber: https://stats.stackexchange.com/questions/16921/how-to-understand-degrees-of-freedom . — Peter Leopold, May 13 '19 at 15:39
@PeterLeopold Thank you. However, I remained unconvinced by that answer. His counter-example statistics didn't follow the chi-squared distribution. How can I then expect it to follow "rules" about DF? — Rudimentary Joe, May 18 '19 at 08:58
I owe you a full answer, but the short answer is: 1) you are right that something is wrong; 2) the reason why is rather deep; 3) "degrees of freedom" are nothing more than a heuristic (a convenient simplification we tell students and each other) for counting the size of outcome space in units similar to numbers of input data points; 4) Bayesians don't use DoF explicitly but often interpret a result as DoF-like; and 5) the DoF notion can "bend" to meet necessity, e.g., Welch's t-test for unbalanced sample sizes and heteroskedastic data, which employs *fractional* degrees of freedom. — Peter Leopold, May 19 '19 at 21:15

