I want to test for independence between groups of logical variables. For example, with three groups, I want to know whether the proportion of females (versus males) differs between the groups. I'm using the chi-square test for this. What is the difference between using a two-row contingency table and using a one-row contingency table with population probabilities?
For example, take the Agresti example from the R documentation for `chisq.test`:
```r
## From Agresti(2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party = c("Democrat", "Independent", "Republican"))
(Xsq <- chisq.test(M))  # Prints test summary
Xsq$expected            # expected counts under the null
```
I get:
```
X-squared = 30.0701, df = 2, p-value = 2.954e-07

      party
gender Democrat Independent Republican
     M 703.6714    319.6453   533.6834
     F 542.3286    246.3547   411.3166
```
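Under the null of independence, each expected count is (row total × column total) / grand total, and Pearson's statistic sums (O − E)²/E over every cell. A minimal base-R sketch that recomputes both for `M` by hand, assuming the counts above (`E`, `X2` and `df` are illustrative names):

```r
# Agresti (2007) counts, rebuilt so the block runs on its own
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party = c("Democrat", "Independent", "Republican"))

# Expected counts under independence: row total * column total / N
E <- outer(rowSums(M), colSums(M)) / sum(M)

# Pearson's statistic summed over all six cells; df = (2 - 1) * (3 - 1)
X2 <- sum((M - E)^2 / E)
df <- (nrow(M) - 1) * (ncol(M) - 1)
print(round(E, 4))
print(c(X2 = X2, df = df))
```

These should agree with `Xsq$expected` and the `X-squared` value that `chisq.test(M)` reports.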
If I instead encode the same data as logical vectors, one per party (TRUE = male):
```r
testChiSqPVal <- list(DemocratIsMale    = c(rep(TRUE, 762), rep(FALSE, 484)),
                      IndependentIsMale = c(rep(TRUE, 327), rep(FALSE, 239)),
                      RepublicanIsMale  = c(rep(TRUE, 468), rep(FALSE, 477)))
```
and then build a one-row contingency table of male counts and compute chi-square with the probability vector set to each party's share of all respondents:
```r
library(plyr)
tbl <- laply(testChiSqPVal, sum)             # number of TRUE (male) per party
Xsq <- chisq.test(tbl,
                  p = laply(testChiSqPVal, length),  # party sizes
                  rescale.p = TRUE)                  # rescaled to proportions
Xsq
Xsq$expected
```
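The one-row call is a goodness-of-fit test: all the males are distributed across parties in proportion to party size. A minimal sketch that reproduces it without plyr (using base `sapply` in place of `laply`) and recomputes the expected counts by hand; `male`, `total`, `p`, `E1`, `X2_1` and `gof` are illustrative names:

```r
testChiSqPVal <- list(DemocratIsMale    = c(rep(TRUE, 762), rep(FALSE, 484)),
                      IndependentIsMale = c(rep(TRUE, 327), rep(FALSE, 239)),
                      RepublicanIsMale  = c(rep(TRUE, 468), rep(FALSE, 477)))

male  <- sapply(testChiSqPVal, sum)     # count of TRUE = males per party
total <- sapply(testChiSqPVal, length)  # party sizes

# rescale.p = TRUE turns the party sizes into proportions summing to 1
p <- total / sum(total)

# Goodness-of-fit expected counts: all males spread according to p
E1 <- sum(male) * p

# Pearson's statistic over the three male cells only; df = 3 - 1
X2_1 <- sum((male - E1)^2 / E1)
gof <- chisq.test(male, p = total, rescale.p = TRUE)
print(round(E1, 4))
print(c(X2 = X2_1, df = length(male) - 1))
```

Because `sum(male) * total / sum(total)` is exactly row total × column total / N for the male row, these expected counts coincide with the first row of the two-row table.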
I get the same expected values as in the first (male) row above, but a different chi-square statistic:
```
X-squared = 13.0882, df = 2, p-value = 0.001439
[1] 703.6714 319.6453 533.6834
```
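One way to see where the two numbers diverge is to split the two-row Pearson sum by row. A sketch assuming the Agresti counts above (`contrib` and `by_row` are illustrative names):

```r
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party = c("Democrat", "Independent", "Republican"))

# Cell-by-cell contributions (O - E)^2 / E under independence
E <- outer(rowSums(M), colSums(M)) / sum(M)
contrib <- (M - E)^2 / E

# Sum the contributions separately for each gender row
by_row <- rowSums(contrib)
print(round(by_row, 4))       # male row vs female row
print(round(sum(by_row), 4))  # total over both rows
```

Since the observed and expected male counts are identical in both setups, the male-row contribution equals the one-row statistic (13.0882); the remaining 16.9819 of the 30.0701 comes from the female cells, which the one-row test never sees.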
Clearly I'm testing two different hypotheses, but I'm not sure exactly what they are.