For the hypothesis $H_0: p_1=p_2$ about two Bernoulli parameters, let $X_1$ and $X_2$ count the successes in independent samples of sizes $n_1$ and $n_2$. We take a measurement, find $X_1=i$, $X_2=j$, and denote $k=i+j$.
Now, to test $H_0$, the Fisher-Irwin test tells us to compute $$P_{H_0}(X_1=i|X_1+X_2=k)=\frac{\tbinom{n_1}{i}\tbinom{n_2}{k-i}}{\tbinom{n_1+n_2}{k}}$$
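For concreteness, here is a small numerical check of this formula (a sketch in Python; the counts $n_1=10$, $n_2=15$, $i=3$, $k=7$ are arbitrary, chosen purely for illustration):

```python
from math import comb
from scipy.stats import hypergeom

n1, n2 = 10, 15      # sample sizes (arbitrary, for illustration)
i, k = 3, 7          # observed X1 = i, observed total X1 + X2 = k

# Direct evaluation of C(n1, i) * C(n2, k - i) / C(n1 + n2, k)
p_direct = comb(n1, i) * comb(n2, k - i) / comb(n1 + n2, k)

# The same quantity is the pmf of a hypergeometric distribution:
# population of size n1 + n2 with n1 "successes", k draws, i successes drawn
p_hyper = hypergeom.pmf(i, n1 + n2, n1, k)

print(p_direct, p_hyper)   # the two values agree
```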
As I understand it, this probability is supposed to be "the probability that we would get the measured data $i,k$ in a measurement assuming that $H_0$ is true". We then compare this probability with the significance level, as in any other hypothesis-testing exercise -- we always calculate the probability of obtaining the measured data under the null hypothesis and then compare it with the significance level.
However, the given expression is not "the probability that we would get the data $i,k$ in a measurement assuming that $H_0$ is true". That probability is $$P_{H_0}(X_1=i \:\cap\: X_1+X_2=k )=P_{H_0}(X_1=i\mid X_1+X_2=k)\times P_{H_0}(X_1+X_2=k)$$
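This decomposition can also be checked numerically. Note that the right-hand side requires fixing the common value $p_1=p_2=p$, so the sketch below (same hypothetical counts as before, with an arbitrary $p=0.4$ used only for illustration) evaluates both sides for a specific $p$:

```python
from math import comb
from scipy.stats import binom

n1, n2 = 10, 15
i, k = 3, 7
p = 0.4              # arbitrary common value of p1 = p2 under H0 (illustration only)

# Conditional probability P(X1 = i | X1 + X2 = k); note it does not involve p
p_cond = comb(n1, i) * comb(n2, k - i) / comb(n1 + n2, k)

# P(X1 + X2 = k) under H0: the sum of two independent binomials with the same p
# is Binomial(n1 + n2, p)
p_sum = binom.pmf(k, n1 + n2, p)

# Joint probability P(X1 = i and X1 + X2 = k) = P(X1 = i) * P(X2 = k - i)
p_joint = binom.pmf(i, n1, p) * binom.pmf(k - i, n2, p)

print(p_joint, p_cond * p_sum)   # both sides of the decomposition agree
```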
The probability used by the Fisher-Irwin test seems to overlook the fact that the data involve two quantities, not one ($X_1$ and $X=X_1+X_2$). The expression gives me the impression that it presupposes $X=k$ and then deals only with the probability for $X_1$.
Why do we use $P_{H_0}(X_1=i\mid X_1+X_2=k)$ in the Fisher-Irwin test and not $P_{H_0}(X_1=i \:\cap\: X_1+X_2=k)$? The explanation given in Sheldon Ross' book isn't quite convincing.