I need to use the R "survey" package and want to be sure I understand it.

I simulated data where the sample comes from simple random sampling. When I compare the result of Pearson's chi-square test (chisq.test) with the result of svychisq, the p-values differ slightly (though they are very close), whereas I expected them to be identical. Am I missing something?

Besides, I want to compare some characteristics between two groups, using data sampled with different probabilities across groups but the same probability within each group (a control group where every control is sampled with probability 1/1000, and an "exhaustive" case group where every case is sampled with probability 1). In that case, does the design have a real impact on the test that would justify using the Rao-Scott chi-square test?

thogs
  • Interesting. I just posted this: https://stats.stackexchange.com/questions/464720/stratified-survey-calculations-by-hand-and-with-survey-package-dont-agree-simu – abalter May 05 '20 at 21:22

1 Answer

You would expect slight differences. The Rao-Scott tests were developed (according to Alastair Scott) to give approximately correct inference for estimated population tables, so the code estimates the population table, calls chisq.test() on it, and rescales the statistic. Even for simple random sampling, this will introduce slight variations.
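A minimal sketch of this comparison, assuming the survey package is installed. The simulated variables `a` and `b`, the constant weight `w`, the seed, and the sample size are all made up for illustration. The classical test is run without the continuity correction so that it is comparable with the Pearson statistic that svychisq rescales.

```r
library(survey)

set.seed(1)
n <- 500

# Hypothetical simple random sample: two independent binary factors
df <- data.frame(
  a = factor(sample(c("no", "yes"), n, replace = TRUE)),
  b = factor(sample(c("low", "high"), n, replace = TRUE)),
  w = rep(1, n)  # equal weights, i.e. simple random sampling
)

# Classical Pearson test on the sample table, no continuity correction
p_classic <- chisq.test(table(df$a, df$b), correct = FALSE)$p.value

# Same data declared as a survey design with equal weights
des <- svydesign(ids = ~1, weights = ~w, data = df)

# Rao-Scott tests: F reference distribution (the default) and chi-square
p_rs_f     <- svychisq(~a + b, des, statistic = "F")$p.value
p_rs_chisq <- svychisq(~a + b, des, statistic = "Chisq")$p.value

# The three p-values should be close but need not be identical
print(c(classic = as.numeric(p_classic),
        rao_scott_F = as.numeric(p_rs_f),
        rao_scott_chisq = as.numeric(p_rs_chisq)))
```

The small gaps come from the rescaling step and the choice of reference distribution, not from any difference in the data.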

When comparing proportions in the two strata under case-control sampling there is no need to use the Rao-Scott test. The test is for the 2-way association parameter in a loglinear model, and this association parameter is preserved by case-control sampling (for the same reason that logistic regression coefficients are).
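To see why the association parameter survives case-control sampling, here is a small arithmetic sketch in base R. The population counts are invented, and the "sample" table is the expected table under the design in the question (controls at 1/1000, all cases kept), not a random draw. Sampling by case status multiplies one column of the table by a constant, which cancels in the odds ratio.

```r
# Hypothetical population table: rows = exposure, columns = case status
pop <- matrix(
  c(9000000, 1000000,   # controls: unexposed, exposed
    600,     400),      # cases:    unexposed, exposed
  nrow = 2,
  dimnames = list(exposure = c("no", "yes"),
                  status   = c("control", "case"))
)

# Odds ratio of a 2x2 table (the 2-way association on the odds scale)
odds_ratio <- function(tab) {
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# Expected sample table: controls kept with probability 1/1000, all cases kept
samp <- pop
samp[, "control"] <- pop[, "control"] / 1000

or_pop  <- odds_ratio(pop)
or_samp <- odds_ratio(samp)

# Both odds ratios agree, even though the case proportion differs greatly
print(c(population = or_pop, sample = or_samp))
```

The proportion of cases in the sample is very different from the population, but the exposure-by-status association is the same in both tables, which is the quantity the test addresses.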

[I don't think this has anything to do with the linked question, "Stratified survey calculations by hand and with survey package don't agree. Simulation results".]

Thomas Lumley