We have $N$ buckets in which we will put some balls. First, the buckets are split into two groups, group $A$ and group $B$ (each holding $N/2$ buckets in the simulation below). The number of balls put into each bucket is then drawn independently from a Binomial distribution. In group $A$ the parameters of this Binomial distribution are $n$ and $p_A$, while in group $B$ they are $n$ and $p_B$.
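Given $p_A$, $p_B$ and $n$, what strength of association between the number of balls and the group (e.g. the $R^2$ of the regression of the number of balls on the group) should one expect to find? Given $N$, what is the 95% interval of that statistic, i.e. the range covering the central 95% of its sampling distribution?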
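For the expected strength, a first analytic answer can be sketched under the same assumptions as the simulation code (exactly $N/2$ buckets per group, independent counts). Let $Y$ be the count in a bucket and $G$ its group. By the law of total variance, the population squared correlation between $Y$ and the group indicator, i.e. the share of the variance of $Y$ explained by the group, is

$$
\rho^2 = \frac{\operatorname{Var}\left(E[Y \mid G]\right)}{\operatorname{Var}(Y)}
= \frac{\tfrac{1}{4}\, n^2 (p_B - p_A)^2}{\tfrac{1}{4}\, n^2 (p_B - p_A)^2 + \tfrac{1}{2}\left[\, n p_A (1 - p_A) + n p_B (1 - p_B) \,\right]}
$$

This is the value the sample $R^2$ converges to as $N$ grows; for finite $N$ the sample $R^2$ is biased upward by a term of order $1/N$. With the settings used in the R code ($n p_A = 0.1$, $n p_B = 1$), it gives $\rho^2 \approx 0.269$.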
If this cannot be solved analytically, I would welcome code that estimates it numerically (I started below with a small and very slow piece of R code). A numerical approach has the advantage of giving the whole distribution, which would probably be very complicated to derive analytically.
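As a middle ground between a full analytic solution and simulation, here is a sketch of an approximate interval. It relies on the fact that, with a single binary regressor, the $F$ statistic of `lm(buckets ~ groups)` equals the squared pooled two-sample $t$ statistic, and $R^2 = F / (F + N - 2)$. The key assumption, which the Binomial counts here do not satisfy, is that errors are normal with equal variance in both groups (the pooled variance is taken as the average of the two Binomial variances). Under that assumption $F$ follows a noncentral $F(1, N-2)$ distribution. The helper names `population_r2` and `approx_r2_interval` are hypothetical. With counts this small (mean 0.1 in group $A$, mostly zeros), the approximation should be checked against simulation before being trusted.

```r
# Hypothetical helper: population squared correlation between the bucket
# count and the group indicator, for two groups of equal size.
population_r2 <- function(n, pA, pB) {
  between <- (n * (pB - pA))^2 / 4                        # Var(E[Y | group])
  within  <- (n * pA * (1 - pA) + n * pB * (1 - pB)) / 2  # E[Var(Y | group)]
  between / (between + within)
}

# Hypothetical helper: approximate central interval of the sample R^2.
# ASSUMPTION: normal, equal-variance errors, so that
# F = R^2 / ((1 - R^2) / (N - 2)) is noncentral F(1, N - 2).
approx_r2_interval <- function(N, n, pA, pB, level = 0.95) {
  delta  <- n * (pB - pA)                                 # difference in means
  sigma2 <- (n * pA * (1 - pA) + n * pB * (1 - pB)) / 2   # pooled variance
  ncp    <- N * delta^2 / (4 * sigma2)                    # groups of size N/2
  alpha  <- 1 - level
  f <- qf(c(alpha / 2, 1 - alpha / 2), df1 = 1, df2 = N - 2, ncp = ncp)
  f / (f + N - 2)  # R^2 is increasing in F, so quantiles map directly
}

population_r2(n = 1e5, pA = 1e-6, pB = 1e-5)
approx_r2_interval(N = 200, n = 1e5, pA = 1e-6, pB = 1e-5)
```

With the settings of the question, `population_r2` returns about 0.269. The interval from `approx_r2_interval` is only as good as the normality assumption, so comparing it with the empirical quantiles of the simulated $R^2$ is a useful sanity check.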
Numerical estimations with R
Here is a quick R script that plots the distribution of the coefficient of determination $R^2$ (the squared correlation coefficient) for chosen values of $n$, $p_A$, $p_B$ and $N$:
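```r
# Settings
N <- 200             # number of buckets (must be even: N/2 per group)
pA <- 1e-6           # Binomial probability in group A
pB <- 1e-5           # Binomial probability in group B
n <- 1e5             # Binomial number of trials per bucket
nbreplicates <- 1000 # number of simulated datasets

# Simulations
# Buckets alternate A, B, A, B, ... (odd positions are group A)
groups <- factor(rep(c("A", "B"), N / 2))
probs <- ifelse(groups == "A", pA, pB)

r.squares <- replicate(nbreplicates, {
  # rbinom() is vectorised over 'prob', so all N buckets are drawn at once
  # instead of being appended one by one
  buckets <- rbinom(N, n, probs)
  summary(lm(buckets ~ groups))$r.squared
})

hist(r.squares)
```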
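To answer the interval part of the question from the simulation itself, a minimal sketch is to take the 2.5% and 97.5% empirical quantiles of the simulated $R^2$ values. The version below also swaps `lm()` for `cor()`: with a single binary regressor, the regression $R^2$ equals the squared Pearson correlation between the counts and a 0/1 group indicator, which avoids fitting a model in every replicate. The seed value and the indicator name `is_B` are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Settings (same as in the question)
N <- 200; n <- 1e5; pA <- 1e-6; pB <- 1e-5
nbreplicates <- 1000

is_B  <- rep(c(0, 1), N / 2)       # 0 = group A, 1 = group B, alternating
probs <- ifelse(is_B == 1, pB, pA) # Binomial probability for each bucket

r.squares <- replicate(nbreplicates, {
  buckets <- rbinom(N, n, probs)
  # Squared correlation with the 0/1 indicator = R^2 of lm(buckets ~ is_B)
  cor(buckets, is_B)^2
})

mean(r.squares)                      # Monte Carlo estimate of E[R^2]
quantile(r.squares, c(0.025, 0.975)) # empirical central 95% interval
```

Increasing `nbreplicates` narrows the Monte Carlo error on both the mean and the quantiles; the distribution itself only gets narrower as `N` grows.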