You have, roughly speaking, the following data.
x1 = 1680000; n1 = 12000000
x2 = 3; n2 = 30
You might use `prop.test` in R to compare the two
binomial proportions, but this test essentially uses a normal approximation. With only 3 members of your 'certain group' in the second sample, a normal approximation may not be accurate; hence the warning message.
prop.test(c(x1,x2), c(n1,n2), cor=F)
2-sample test
for equality of proportions
without continuity correction
data: c(x1, x2) out of c(n1, n2)
X-squared = 0.39867, df = 1, p-value = 0.5278
alternative hypothesis: two.sided
95 percent confidence interval:
-0.06735183 0.14735183
sample estimates:
prop 1 prop 2
0.14 0.10
Warning message:
In prop.test(c(x1, x2), c(n1, n2), cor = F):
Chi-squared approximation may be incorrect
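The warning can be traced to the expected counts under the null hypothesis of equal proportions. The following is a minimal sketch, assuming the usual rule of thumb that the chi-squared approximation is suspect when any expected count is below 5; the names `p.pooled` and `expected` are introduced here for illustration only.

```r
x1 <- 1680000; n1 <- 12000000
x2 <- 3;       n2 <- 30

# Pooled success proportion under H0: p1 = p2
p.pooled <- (x1 + x2) / (n1 + n2)

# Expected successes and failures in each group under H0
expected <- rbind(success = c(n1, n2) * p.pooled,
                  failure = c(n1, n2) * (1 - p.pooled))
expected

# The smallest expected count belongs to successes in the second group
min(expected)
any(expected < 5)
```

The expected number of successes in the second group is about $30 \times 0.14 = 4.2,$ which falls below 5.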
A chi-squared test on the appropriate $2 \times 2$ table is essentially the same test, and we get the same warning message again because of the one small count.
TBL = rbind(c(x1,x2), c(n1-x1, n2-x2))
TBL
[,1] [,2]
[1,] 1680000 3
[2,] 10320000 27
chisq.test(TBL, cor=F)
Pearson's Chi-squared test
data: TBL
X-squared = 0.39867, df = 1, p-value = 0.5278
Warning message:
In chisq.test(TBL, cor = F):
Chi-squared approximation may be incorrect
However, the implementation of `chisq.test`
in R allows for simulation of a more accurate P-value, which shows no significant difference between the two groups.
chisq.test(TBL, sim=T)
Pearson's Chi-squared test
with simulated p-value
(based on 2000 replicates)
data: TBL
X-squared = 0.14486, df = NA, p-value = 0.7861
By simulation, we have 'cured' the technical difficulty, but simulation does not create new information. The reason we find no significant
difference is that we don't have enough observations in the second group to make a valid
comparison.
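For readers curious about what the simulated P-value involves, here is a rough sketch of the idea: generate random $2 \times 2$ tables with the same row and column totals as `TBL` (using `r2dtable`), compute the Pearson statistic for each, and report the proportion at least as large as the observed one. The helper `pearson`, the seed, and the small tie tolerance are assumptions made for illustration; this is not claimed to reproduce the internals of `chisq.test` exactly, and the result varies from run to run.

```r
x1 <- 1680000; n1 <- 12000000
x2 <- 3;       n2 <- 30
TBL <- rbind(c(x1, x2), c(n1 - x1, n2 - x2))

r <- rowSums(TBL)            # row totals (held fixed)
k <- colSums(TBL)            # column totals (held fixed)
E <- outer(r, k) / sum(TBL)  # expected counts under independence

# Pearson chi-squared statistic for a table with these margins
pearson <- function(tab) sum((tab - E)^2 / E)

set.seed(2024)               # arbitrary seed, for reproducibility only
B <- 2000                    # number of simulated tables
sims <- r2dtable(B, r, k)    # random tables with the given margins

stat.obs <- pearson(TBL)
stat.sim <- sapply(sims, pearson)

# Monte Carlo P-value; the factor guards against floating-point ties
p.sim <- (1 + sum(stat.sim >= stat.obs * (1 - 1e-7))) / (B + 1)
stat.obs
p.sim
```

Because the margins are fixed, the only thing that varies between simulated tables is how many of the 30 members of the second group fall in the first row, which is why the small group drives the result.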
Another approach would be to regard the proportion $0.14$ from the large group as very
nearly the true population proportion.
A Jeffreys 95% confidence interval for the true population proportion based on $12\,000\,000$ observations is $(0.1398, 0.1402).$
qbeta(c(.025,.975), x1+.5, n1-x1+.5)
[1] 0.1398038 0.1401964
So we will not be far wrong to compare the proportion $x_2/n_2 = 3/30 = 0.10$ from the second
group with $p_1 = 0.14$ from the first group, using an exact binomial test (with no normal approximation). This test shows no significant
difference. Its matching 95% confidence interval
$(0.021, 0.265)$ also contains the Group 1 proportion $0.14,$ so it is clear that we don't have enough data to say the two groups differ
in this respect.
binom.test(3, 30, .14)
Exact binomial test
data: 3 and 30
number of successes = 3, number of trials = 30, p-value = 0.7914
alternative hypothesis: true probability of success is not equal to 0.14
95 percent confidence interval:
0.02111714 0.26528845
sample estimates:
probability of success
0.1
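To illustrate how little a sample of 30 can reveal, one can sketch the power of this exact test. The code below finds which counts out of 30 would lead to rejection at the 5% level when testing against $0.14,$ and then asks how likely those counts are if the second group's true proportion were $0.10.$ The value `p.alt = 0.10` is a purely hypothetical assumption (it treats the observed proportion as if it were the truth), and the variable names are illustrative.

```r
n2    <- 30
p0    <- 0.14     # null value, taken from the large group
alpha <- 0.05

# Two-sided exact P-value for every possible count 0, 1, ..., 30
x     <- 0:n2
pvals <- sapply(x, function(cnt) binom.test(cnt, n2, p0)$p.value)

# Counts that would lead to rejection at level alpha
reject <- x[pvals <= alpha]
reject

# Hypothetical true proportion in the second group (an assumption)
p.alt <- 0.10
power <- sum(dbinom(reject, n2, p.alt))
power
```

Under this assumption the power is low, so failing to reject says very little about whether the two groups really share the same proportion.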