
I am going to use a chi-squared test to check whether two categorical variables are independent, and I just realized that the number of observations can affect the result of the test:

Data set 1:

         Group 1   Group 2
    A    10        20
    B    30        40

Data set 2:

         Group 1   Group 2
    A    100       200
    B    300       400

The test statistic for the second data set is 10 times larger than for the first, and this changes whether the null hypothesis is rejected. Since the ratios of the groups are the same in both data sets, it does not make sense to me to get two different results. Could you please help me understand what sort of data I need to use for a chi-squared test?

Thanks, Amir



> As the ratio of groups in both data sets is the same, it does not make sense to me to get two different results

Consider what a $p$-value is supposed to do: quantify evidence against the null hypothesis, which in this case states that the categories are independent. Under the null hypothesis, you're much less likely to find a joint empirical distribution this far from independence when the sample size is 1,000 than when it's 100. Sure enough, you get a lower $p$-value with the larger sample. This is an example of the rule of thumb that, as the sample size increases while the sample effect size stays fixed, the $p$-value decreases. Any good significance test is more powerful with larger samples.
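A worked version of this point for the two tables in the question: for a 2 × 2 table with cells $a, b$ in the first row, $c, d$ in the second row, and total $N$, Pearson's statistic without continuity correction reduces to the closed form below. Multiplying every count by $k$ multiplies the numerator by $k^5$ and the denominator by $k^4$, so the statistic grows by exactly $k$ while the proportions stay fixed. (R's `chisq.test` applies Yates' continuity correction to 2 × 2 tables by default, so the values it reports will be somewhat smaller than these.)

$$
\chi^2 = \sum_{i,j}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \quad E_{ij} = \frac{R_i\,C_j}{N}
\qquad\Longrightarrow\qquad
\chi^2 = \frac{N\,(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}
$$

$$
\begin{aligned}
\text{Data set 1:}\quad \chi^2 &= \frac{100\,(10\cdot 40 - 20\cdot 30)^2}{30\cdot 70\cdot 40\cdot 60} = \frac{4\,000\,000}{5\,040\,000} \approx 0.794\\[4pt]
\text{Data set 2:}\quad \chi^2 &= \frac{1000\,(100\cdot 400 - 200\cdot 300)^2}{300\cdot 700\cdot 400\cdot 600} = \frac{4\times 10^{11}}{5.04\times 10^{10}} \approx 7.94
\end{aligned}
$$

With 1 degree of freedom the 5% critical value is about 3.84, so identical proportions land on opposite sides of it at $N = 100$ and $N = 1000$.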

Kodiologist

This is also a good example to see the difference between p-value and effect size. As you noted, the p-values for the two tables you present are quite different, but if you calculate a measure of association, such as phi, it will be the same for the two tables.
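For a 2 × 2 table with cells $a, b$ in the first row and $c, d$ in the second, phi can be computed directly from the counts; its magnitude equals $\sqrt{\chi^2/N}$ using the uncorrected chi-squared statistic. Scaling every count by $k$ multiplies both the numerator and the square root in the denominator by $k^2$, so phi does not change:

$$
\phi = \frac{ad-bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}
$$

$$
\phi_1 = \frac{10\cdot 40-20\cdot 30}{\sqrt{30\cdot 70\cdot 40\cdot 60}} = \frac{-200}{\sqrt{5\,040\,000}} \approx -0.089,
\qquad
\phi_2 = \frac{100\cdot 400-200\cdot 300}{\sqrt{300\cdot 700\cdot 400\cdot 600}} = \frac{-20\,000}{\sqrt{5.04\times 10^{10}}} \approx -0.089
$$

The sign only reflects which diagonal is heavier, so it depends on how the rows and columns are ordered; the size of the association is the same for both tables.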

The examples in R:

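```r
# Install psych if it is missing; require() also loads it when it is present
if(!require(psych)){install.packages("psych")}
library(psych)   # provides phi()

# Data set 1
Input = ("
Letter  Group.1  Group.2
A       10       20
B       30       40
")

Matrix1 = as.matrix(read.table(textConnection(Input),
                    header=TRUE,
                    row.names=1))

Matrix1

# For 2 x 2 tables, chisq.test() applies Yates' continuity correction by default
chisq.test(Matrix1)

# Phi is a measure of association (effect size), not a test
phi(Matrix1, digits=4)

# Data set 2: every count multiplied by 10
Input = ("
Letter  Group.1  Group.2
A       100      200
B       300      400
")

Matrix2 = as.matrix(read.table(textConnection(Input),
                    header=TRUE,
                    row.names=1))

Matrix2

chisq.test(Matrix2)   # larger statistic, smaller p-value

phi(Matrix2, digits=4)   # same phi as for Matrix1
```
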
Sal Mangiafico
  • Thank you. Phi seems like a good test for this case, but it seems to work only for binary variables. Is there a test for categorical variables with more than two values? – Amir Aug 21 '17 at 02:10
  • Cramer's *v* is an analogous statistic for 2-dimensional tables larger than 2 x 2. You might use the `assocstats` function in the `vcd` package. But understand that *phi* and Cramer's *v* are not tests. They are measures of association. – Sal Mangiafico Aug 21 '17 at 02:29
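Cramér's $V$ generalizes phi to an $r \times c$ table by dividing the uncorrected chi-squared statistic by $N$ and by one less than the smaller table dimension. For a 2 × 2 table $\min(r, c) - 1 = 1$, so $V = |\phi|$. Using the uncorrected statistics for the question's tables ($\chi^2 \approx 0.794$ with $N = 100$, and $\chi^2 \approx 7.94$ with $N = 1000$), both give the same value:

$$
V = \sqrt{\frac{\chi^2}{N\,(\min(r,c)-1)}}, \qquad
V_1 = \sqrt{\frac{0.794}{100\cdot 1}} \approx 0.089, \qquad
V_2 = \sqrt{\frac{7.94}{1000\cdot 1}} \approx 0.089
$$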