I want to test whether a logical variable is independent of group membership. For example, with three groups, I want to know whether the proportion of females (versus males) differs between the groups. I'm using the chi-squared test for this. What is the difference between using a two-row contingency table and using a one-row contingency table with the population probabilities supplied?

For example, using the Agresti example in the chisq.test documentation for R:

## From Agresti(2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484,239,477)))
dimnames(M) <- list(gender=c("M","F"),
                    party=c("Democrat","Independent", "Republican"))
(Xsq <- chisq.test(M))  # Prints test summary
Xsq$expected   # expected counts under the null

I get:

X-squared = 30.0701, df = 2, p-value = 2.954e-07

      party
gender Democrat Independent Republican
     M 703.6714    319.6453   533.6834
     F 542.3286    246.3547   411.3166

If I use the same data to create logical vectors:

testChiSqPVal <- list(DemocratIsMale=c(rep(TRUE,762),rep(FALSE,484)),
                  IndependentIsMale=c(rep(TRUE,327),rep(FALSE,239)),
                  RepublicanIsMale=c(rep(TRUE,468),rep(FALSE,477)))

and then build a one-row table of the male counts and run the chi-squared test with the probability vector set to the group sizes (rescaled to proportions):

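library(plyr)
tbl <- laply(testChiSqPVal, sum)   # number of males (TRUE values) in each group
# p is passed as raw group sizes; rescale.p=TRUE rescales them to sum to 1
Xsq <- chisq.test(tbl, p=laply(testChiSqPVal,length), rescale.p=TRUE)
Xsq
Xsq$expected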

I get the same expected values as in the first (M) row above, but a different chi-squared statistic:

X-squared = 13.0882, df = 2, p-value = 0.001439

[1] 703.6714 319.6453 533.6834

Clearly I'm testing two different hypotheses, but I'm not sure exactly what they are.
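
To compare the two statistics directly, here is a rough sketch in base R that splits the two-row statistic into per-row contributions, (observed - expected)^2 / expected, and sets the one-row statistic next to them. The names `two_row`, `one_row`, `contrib`, `row_contrib` and `party_sizes` are my own, chosen only for this illustration.

```r
# Observed counts from the Agresti (2007) example used above
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party = c("Democrat", "Independent", "Republican"))

# Two-row test: independence of gender and party
two_row <- chisq.test(M)

# Per-cell contributions (O - E)^2 / E, then summed within each row
contrib <- (two_row$observed - two_row$expected)^2 / two_row$expected
row_contrib <- rowSums(contrib)

# One-row test: male counts against the party-size proportions
party_sizes <- colSums(M)
one_row <- chisq.test(as.vector(M["M", ]), p = party_sizes, rescale.p = TRUE)

row_contrib                  # contribution of the M row and of the F row
unname(one_row$statistic)    # compare with row_contrib["M"]
unname(two_row$statistic)    # compare with sum(row_contrib)
```

If I read this correctly, the one-row statistic is just the M row's share of the two-row statistic. That would suggest the one-row version treats the party proportions as fixed and only measures how the male counts deviate from them, while the two-row version also counts the deviations in the F row.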

kjetil b halvorsen
patrickmdnet
