Please note that I am interested specifically in Bernoulli samples and hope to find criteria specific to the Bernoulli distribution, rather than Student's t-statistic, the Mann-Whitney test, etc., since those rely on asymptotic assumptions that I would prefer to avoid. (For large sample sizes it is fine to use asymptotic normality, but not for small sample sizes, which may occur in my situation.)
Question: Consider three Bernoulli samples (0,1,...,1), (1,1,...,0), (0,0,...,1), possibly of different lengths. I want to determine whether some of them come from the same distribution, or none of them do.
I would expect some kind of "p-value" calculation as an answer to the question. My problem is that I do not see how to obtain it for the Bernoulli distribution without relying on some asymptotic normality.
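For concreteness, here is a minimal sketch of the kind of exact calculation I am hoping for, written for a pairwise comparison using Fisher's exact test on the 2x2 table of counts (the sample values and the use of scipy.stats.fisher_exact are my own illustration, not something I am claiming is the right answer). I am not sure whether this is the principled tool here, nor how to extend it to three samples of different lengths.

```python
# Sketch only: pairwise exact comparison of two Bernoulli samples.
# The data below are hypothetical; fisher_exact conditions on the table
# margins and does not rely on asymptotic normality.
from scipy.stats import fisher_exact

sample_a = [0, 1, 1, 1, 0, 1]   # hypothetical sample
sample_b = [1, 1, 0, 0, 0]      # hypothetical sample, different length

# 2x2 contingency table: rows = samples, columns = (#ones, #zeros)
table = [
    [sum(sample_a), len(sample_a) - sum(sample_a)],
    [sum(sample_b), len(sample_b) - sum(sample_b)],
]

_, p_value = fisher_exact(table, alternative="two-sided")
print(f"Fisher exact p-value (sample a vs sample b): {p_value:.4f}")
```

If I understand correctly, the extension of this idea to a 2x3 table is the Freeman-Halton test, but I do not know whether that, or pairwise comparisons with some correction, is the appropriate route; that is part of what I am asking.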
Motivation
The question is typical of machine learning binary classification tasks at the feature-preprocessing stage, where one way to preprocess categorical features is to merge groups with a small number of observations into larger groups, thus obtaining more stable estimators (a rough sketch of such merging is given below).
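As an illustration of that merging step (the column names and the count threshold below are hypothetical, purely for illustration):

```python
# Sketch of the rare-level merging step described above.
# Column names and the threshold are hypothetical choices.
import pandas as pd

df = pd.DataFrame({
    "category": ["a", "a", "b", "c", "c", "c", "d"],
    "target":   [0,   1,   1,   0,   0,   1,   1],   # Bernoulli target
})

min_count = 3  # hypothetical minimum group size
counts = df["category"].value_counts()
rare_levels = counts[counts < min_count].index

# Collapse all rare levels into a single "other" group
df["category_merged"] = df["category"].where(
    ~df["category"].isin(rare_levels), "other"
)
print(df["category_merged"].value_counts())
```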
PS
Difference from an already asked question:
There is a nice discussion in a related question: Principled way of collapsing categorical variables with many levels?
However, the difference is that I am interested in the specific situation of Bernoulli samples (in machine learning language, a binary classification problem), while the latter question and its answers deal with the general situation of a generic/unknown distribution of the target variable. One cannot hope for an analytical answer in general, whereas for the specific case of the Bernoulli distribution it sounds like a classical question which, most probably, has been addressed in the literature.