
An often-stated rule of thumb for the number of degrees of freedom in a chi-square goodness-of-fit test (based on the Pearson chi-square test statistic) is the number of categories, minus one, minus the number of parameters estimated from the data.

I understand that every time we estimate something from the sample we add uncertainty and lose a degree of freedom. What I do not understand is why we would adjust the degrees of freedom downward when the end result is a smaller critical value (and thus a less conservative test). I would think that when we add uncertainty, the test should become more conservative, not less. This seems to be the opposite of how inference works with other distributions, such as the t-distribution, where the tails get fatter when there are fewer degrees of freedom (so a larger t-score is required to reject the null).
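To make the observation about critical values concrete, here is a minimal sketch (using `scipy.stats.chi2`, and a conventional 0.05 significance level) showing that the upper-tail chi-square critical value does indeed shrink as the degrees of freedom decrease:

```python
from scipy.stats import chi2

# 95th-percentile critical values of the chi-square distribution
# for a few degrees of freedom
for df in (5, 3, 1):
    print(df, round(chi2.ppf(0.95, df), 3))
# df = 5 -> 11.070, df = 3 -> 7.815, df = 1 -> 3.841
```

So, unlike the t-distribution, losing chi-square degrees of freedom shifts the whole reference distribution toward zero, which is what prompts the question above.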

  • (1) The determination of the df isn't quite that simple: among other things, the categories have to be established independently of the data and the parameters need to be estimated using maximum likelihood based on the bin counts. Ignoring the subtleties has led to some awfully bad analyses. See https://stats.stackexchange.com/a/17148/919 for the details. (2) Since there's much more going on in testing than identifying the correct distribution of the statistic, you can't infer anything about it being more or less "conservative" solely from the distribution. – whuber Feb 07 '18 at 18:00
  • Thank you for your insight whuber. Let me see if I am understanding this correctly: adjusting the degrees of freedom (in this sense), IF done correctly, would provide a more appropriate approximation of the sampling distribution of the statistic, rather than merely an "adjustment" for uncertainty associated with estimating additional parameters. – coreydevinanderson Feb 08 '18 at 15:05
  • When you say that the categories should be established independently of the data, would that mean that using something like observed genotypes as classes would be incorrect (e.g., to test fit of counts of observed genotypes to expected genotypes using something like Hardy-Weinberg), but that testing fit to a proportional model, using something like days of the week, would be OK since the latter is not dependent on the observed data...? – coreydevinanderson Feb 08 '18 at 15:09
  • Yes, all of that sounds correct. – whuber Feb 08 '18 at 15:14
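As an illustration of the df bookkeeping discussed in the comments, here is a sketch of the textbook Hardy-Weinberg goodness-of-fit calculation, using made-up genotype counts. The allele frequency is estimated by maximum likelihood from the bin counts, so the rule of thumb gives df = 3 categories − 1 − 1 estimated parameter = 1. (Per whuber's caveats above, this naive bookkeeping is only valid under the conditions he describes.)

```python
from scipy.stats import chi2

# Hypothetical genotype counts (made-up data for illustration)
observed = {"AA": 30, "Aa": 50, "aa": 20}
n = sum(observed.values())  # 100 individuals

# MLE of the allele frequency p, computed from the bin counts
p = (2 * observed["AA"] + observed["Aa"]) / (2 * n)  # allele count / total alleles
q = 1 - p

# Expected counts under Hardy-Weinberg proportions (p^2, 2pq, q^2)
expected = {"AA": n * p**2, "Aa": n * 2 * p * q, "aa": n * q**2}

# Pearson chi-square statistic
stat = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

k, m = 3, 1          # 3 genotype classes, 1 estimated parameter (p)
df = k - 1 - m       # = 1
p_value = chi2.sf(stat, df)
```

With these counts the statistic is tiny (about 0.01) and the fit to Hardy-Weinberg expectations is not rejected.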

0 Answers