How do I test my hypothesis that the populations are the same in two trials with binomial/Bernoulli results?
For testing the hypothesis that the populations are different, I would use a standard hypothesis test such as a chi-squared test or Fisher's exact test (e.g. chisq.test(), prop.test() or fisher.test() in R, or the equivalent calculation in Excel), as suggested in these answers:
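For reference, here is a minimal sketch of the standard test for a difference: the Pearson chi-squared test on a 2x2 table of successes and failures, written in plain Python. The function name and counts are hypothetical. It uses no continuity correction, so it should agree with R's chisq.test(..., correct = FALSE) rather than with R's default Yates-corrected output.

```python
import math

def chisq_2x2(x1, n1, x2, n2):
    """Pearson chi-squared test (no continuity correction) on the 2x2 table
    [[x1, n1 - x1], [x2, n2 - x2]]. Returns (statistic, p_value).
    Assumes neither column total is zero (i.e. not all successes or all failures)."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail p-value is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 520/1000 successes vs 480/1000 successes.
stat, p = chisq_2x2(520, 1000, 480, 1000)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Here the statistic is 3.2 and p is about 0.074, so the null hypothesis of equal proportions is not rejected at the 5% level.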
However, if I have understood correctly, these tests take "the populations are the same" as the null hypothesis, so they can only show that the populations are different. Failing to reject that null is not the same as showing that the populations are the same, which is what I am trying to do. My hypothesis (based on a priori information) is that they are the same, and to support that with my data I would need to reject a null hypothesis that they are different. So I need a test of the hypothesis that the populations are the same.
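A small numeric illustration of why "not significant" is not evidence of sameness, using hypothetical counts and a hypothetical helper: a Wald (normal-approximation) 95% confidence interval for the difference in proportions. With small samples the interval includes zero, so no difference is detected, yet it also includes very large differences.

```python
import math

def wald_ci_diff(x1, n1, x2, n2, z=1.959963984540054):
    """Wald (normal-approximation) confidence interval for p1 - p2,
    using an unpooled standard error. The default z gives a two-sided 95% interval."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical small samples: 6/10 vs 4/10 successes.
lower, upper = wald_ci_diff(6, 10, 4, 10)
# The interval is roughly (-0.23, 0.63): it contains 0, but also differences
# of more than 60 percentage points.
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

Because this interval uses an unpooled standard error, its verdict need not match the chi-squared test exactly, though here both are far from significant.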
I have found a question that addresses this issue; it says that this is called testing for equivalence, which is indeed different from standard hypothesis testing:
However, among the suggestions linked there, I could only find answers that address normally distributed data (i.e. numerical data with a mean and a standard deviation), whereas my data is categorical (yes/no, success/failure: Bernoulli trials with a binomial distribution).
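For binomial data, one common way to test equivalence is the two one-sided tests (TOST) procedure applied to the difference in proportions. The sketch below assumes the large-sample normal approximation with an unpooled standard error, which seems reasonable when each group has thousands of individuals and the proportions are not too close to 0% or 100%. The function names, the counts and the margin of 3 percentage points are all hypothetical, chosen only for illustration.

```python
import math

def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def tost_two_proportions(x1, n1, x2, n2, margin):
    """Two one-sided tests (TOST) for equivalence of two proportions,
    using the unpooled normal approximation.
    H0: |p1 - p2| >= margin   vs   H1: |p1 - p2| < margin.
    Returns the TOST p-value (the larger of the two one-sided p-values).
    Assumes the standard error is non-zero (not all successes/failures in both groups)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # One-sided test of H0: diff <= -margin (rejected if diff is well above -margin)
    p_lower = normal_sf((diff + margin) / se)
    # One-sided test of H0: diff >= +margin (rejected if diff is well below +margin)
    p_upper = normal_sf((margin - diff) / se)
    return max(p_lower, p_upper)

# Hypothetical counts: 2520/5000 vs 2480/5000, hypothetical margin of 3 points.
p = tost_two_proportions(2520, 5000, 2480, 5000, margin=0.03)
print(f"TOST p = {p:.4g}")  # a small p supports equivalence within +/- 3 points
```

At the 5% level this is the same as checking whether the 90% confidence interval for p1 - p2 lies entirely inside (-margin, margin). With these counts the TOST p-value is about 0.014, so equivalence within 3 points would be declared, whereas a 1-point margin would not be supported.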
Further, the top answer there says that
Essentially you need to decide how large a difference is acceptable for you to still conclude that the two groups are effectively equivalent
This seems much more subjective than standard hypothesis testing, and seems to beg the question I'm trying to answer. I want to use the data to show that the groups are effectively equivalent, not to pull a figure out of thin air for how large a difference is acceptable and then check whether my data fall within it. (I can see how this approach might make sense in a clinical or medical regulatory context, though.)
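One way to let the data speak without fixing a margin first is to report the 90% confidence interval for the difference together with the smallest margin it would support: any margin larger than the bigger absolute endpoint would pass a TOST at α = 0.05. This is a sketch with a hypothetical helper and hypothetical counts, using the same Wald approximation. Choosing the margin after seeing this number does not give a valid pre-specified equivalence test, so the result is descriptive rather than a replacement for deciding what size of difference matters.

```python
import math
from statistics import NormalDist

def smallest_supported_margin(x1, n1, x2, n2, alpha=0.05):
    """Return the (1 - 2*alpha) Wald confidence interval for p1 - p2 and the
    larger absolute endpoint of that interval: any equivalence margin above
    this value would be accepted by a TOST at level alpha."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    lower, upper = diff - z * se, diff + z * se
    return (lower, upper), max(abs(lower), abs(upper))

# Hypothetical counts: 2520/5000 vs 2480/5000.
(lower, upper), bound = smallest_supported_margin(2520, 5000, 2480, 5000)
print(f"90% CI: ({lower:.4f}, {upper:.4f}); supported margin > {bound:.4f}")
```

For these counts the interval is roughly (-0.8, 2.4) percentage points, so the data are consistent with the groups differing by no more than about 2.4 points.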
Is there no simple procedure, analogous to the chi-squared test, for testing the hypothesis that two groups of Bernoulli trials are essentially the same?
(In case it is relevant: N is quite large in my data, of the order of thousands of individuals in each of the two trials. I'm looking at several outcomes within each trial. For some outcomes about half of the individuals are successes, but for others the proportions are close to 100% or 0%, so some cells of the contingency table may contain very few individuals or even none. Inspecting the percentages by eye, they sure look the same: most differ by less than a percentage point or three. But that's not a proper statistical test.)
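When a count is 0, the Wald standard error for that group is zero and normal-approximation intervals behave poorly. A commonly recommended alternative is Newcombe's hybrid score interval for a difference of proportions (his "method 10"), built from the Wilson score intervals of the two groups. The sketch below is a plain-Python illustration with hypothetical function names and counts; it uses a 90% level so that the result can be read against an equivalence margin.

```python
import math
from statistics import NormalDist

def wilson_interval(x, n, z):
    """Wilson score interval for a single binomial proportion x/n."""
    p = x / n
    denom = 1 + z * z / n
    mid = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp to [0, 1] to guard against floating-point round-off at the edges.
    return max(0.0, mid - half), min(1.0, mid + half)

def newcombe_diff_interval(x1, n1, x2, n2, confidence=0.90):
    """Newcombe's hybrid score interval for p1 - p2, combining the two
    Wilson intervals. Remains usable when a cell count is 0."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, z)
    l2, u2 = wilson_interval(x2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper

# Hypothetical rare outcome: 0/2000 successes vs 3/2000 successes.
print(newcombe_diff_interval(0, 2000, 3, 2000))
```

For these counts the 90% interval runs from roughly -0.37 to +0.01 percentage points. Under the usual interval-inclusion rule, a pre-specified margin m would be met if the whole interval lies inside (-m, m).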