
Basically I have two (or more) different success-probability-generating distributions. In other words, there are two (or more) different (non-normal) distributions, each of which generates the success probabilities for one group of binomial distributions.

Empirically I have the number of successes and attempts for each binomial distribution, and I also know which binomial distribution belongs to which group. Of course, if I knew the true success probabilities for all binomial distributions, I could simply run a Kruskal–Wallis test or something similar and test for a significant difference between the two distributions. However, since the data I have are realizations of the binomial distributions, I can only estimate the success probability of each.

Is there an elegant way to run a significance test on these two non-normal distributions?

kjetil b halvorsen
user217357
    This question isn't totally clear to me, so to clarify, I'll try to restate: You have $n_i$ and $x_i$ (# of trials, successes) for each of several binomial distributions, which differ in their probability parameters $p_i$. The distributions are in several known groups $j=1,\dots, J$; each group has its $p_i$ drawn from a group-specific, unknown, probability distribution $f_j$. You are attempting to test whether the $f_j$ are different, given knowledge of the $(n_i, x_i)$ pairs and the group memberships. Does this seem right? – jbowman Sep 04 '18 at 19:40
    Yes, you are totally correct. I was thinking about introducing some general, formal notations here and you already did it just right. Thanks! – user217357 Sep 04 '18 at 20:32

1 Answer


Reformulation based on the comment by @jbowman: There are $J$ groups indexed by $j=1, \dotsc, J$. For each group we have some independent binomial observations, $x_{ji} \sim \mathcal{Binom}(n_{ji}, p_{ji})$ conditional on the probability parameter $p_{ji}$, which are in turn sampled (independently) from some distribution $f(p; \theta_j)$ on the interval $(0,1)$. The binomial probabilities $p_{ji}$ are not observed, so they can be seen as latent variables. The groups then differ in the distribution of binomial probabilities within the group. It is not clear to me whether the $n_{ji}$ are given constants or also somehow sampled. Neither is it clear whether $f$ is to be modeled parametrically or nonparametrically.

But I would start out with some parametric model for $f$, just to have a somewhat tractable problem. So, for a start, let us assume that $f(p; \theta_j)$ is a beta distribution; that will at least give a point of comparison for a more general solution. Let us also assume, for a start, that the $n_{ji}=n_j$ are constant within groups. First, observe that if $n_j=1$ we are lost, since then the resulting beta-binomial distribution collapses to a Bernoulli distribution, and the likelihood function is constant on the lines in the $(\alpha,\beta)$ parameter space where $\frac\alpha{\alpha+\beta}$ is constant. So assume that $n_j\ge 2$.
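The identifiability problem at $n_j=1$ is easy to check numerically. Below is a minimal sketch (not from the answer; the helper name `beta_binom_logpmf` is mine) of the beta-binomial log-pmf, i.e. the binomial with $p$ integrated out against a $\mathrm{Beta}(\alpha,\beta)$ density. For $n=1$ the pmf reduces to $P(x=1)=\frac\alpha{\alpha+\beta}$, so parameter pairs on the same ray through the origin give identical likelihoods:

```python
import numpy as np
from scipy.special import betaln, gammaln

def beta_binom_logpmf(x, n, a, b):
    """Log-pmf of Beta-Binomial(n, a, b): a binomial with the
    success probability integrated out against Beta(a, b)."""
    return (gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
            + betaln(x + a, n - x + b) - betaln(a, b))

# For n = 1 the pmf depends on (a, b) only through a / (a + b):
# (a, b) = (2, 6) and (1, 3) both give P(x = 1) = 0.25.
p1 = np.exp(beta_binom_logpmf(1, 1, 2.0, 6.0))
p2 = np.exp(beta_binom_logpmf(1, 1, 1.0, 3.0))
```

Both evaluations return $0.25$, confirming that with $n_j=1$ the data cannot distinguish $(\alpha,\beta)$ beyond their ratio.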

Since all the $J$ groups can be treated equally, just focus on one of them, and drop the index $_j$. With $k$ observations in the group, the situation now reduces to $k$ iid observations from the distribution $\mathcal{beta{-}Binom}(n,\alpha,\beta)$, and can be solved with maximum likelihood. Even if the binomial $n$ varies within the group, the same solution applies.
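As a sketch of that maximum-likelihood step (my own illustration, not code from the answer: the simulated group, the true parameter values, and the choice of Nelder-Mead are all assumptions), one can simulate a single group with varying $n_i$ and maximize the beta-binomial likelihood numerically, parameterizing on the log scale to keep $\alpha,\beta>0$:

```python
import numpy as np
from scipy.special import betaln, gammaln
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulate one group: k binomial observations with p_i ~ Beta(a, b).
a_true, b_true, k = 2.0, 5.0, 500
n = rng.integers(5, 20, size=k)        # n_i may vary within the group
p = rng.beta(a_true, b_true, size=k)   # latent success probabilities
x = rng.binomial(n, p)                 # observed successes

def neg_loglik(log_params):
    a, b = np.exp(log_params)          # optimize on the log scale
    ll = (gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
          + betaln(x + a, n - x + b) - betaln(a, b))
    return -ll.sum()

res = minimize(neg_loglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)
```

With a few hundred observations the fitted mean $\hat\alpha/(\hat\alpha+\hat\beta)$ should land close to the true value $2/7$. Comparing groups could then proceed by, e.g., a likelihood-ratio test of common versus group-specific $(\alpha,\beta)$.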


So, what can we do in a more general case, where the distribution generating the binomial probabilities is some general $f(p; \theta)$? What we did above was integrate out the $p$ parameter with respect to its distribution $f$. That could be done explicitly because the beta distribution is the conjugate prior of the binomial, but the same principle applies even when the integration cannot be done in closed form. A general discussion is here: MLE: Marginal vs Full Likelihood

In this case the integral is one-dimensional over $(0,1)$, so for a parametric family it can be computed numerically. But I suspect that the intended case is nonparametric, so what to do then?
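To illustrate the numerical route for a non-conjugate parametric family (my own example; the logit-normal choice of $f$ and the function name `marginal_pmf` are assumptions, not part of the answer), the marginal pmf can be obtained by one-dimensional quadrature over $(0,1)$:

```python
import numpy as np
from scipy import integrate, stats

def marginal_pmf(x, n, mu, sigma):
    """Marginal pmf of x ~ Binomial(n, p) with p ~ logit-Normal(mu, sigma),
    a non-conjugate f, computed by 1-D numerical integration."""
    def integrand(p):
        # logit-normal density at p, times the binomial pmf
        logit = np.log(p / (1.0 - p))
        f_p = stats.norm.pdf(logit, mu, sigma) / (p * (1.0 - p))
        return stats.binom.pmf(x, n, p) * f_p
    val, _ = integrate.quad(integrand, 0.0, 1.0)
    return val

# Sanity check: the marginal pmf should sum to 1 over x = 0, ..., n.
n = 10
total = sum(marginal_pmf(x, n, mu=0.0, sigma=1.0) for x in range(n + 1))
```

Maximizing the sum of `log(marginal_pmf(x_i, n_i, mu, sigma))` over $(\mu,\sigma)$ would then give the marginal MLE for this family, in direct analogy with the closed-form beta-binomial case above.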

(I post this for now, and will try to come back for that last Q)

kjetil b halvorsen