Statistics of a function of a binomial variable

Question

I'm struggling with a statistics problem. Sadly, I'm not a statistician. Maybe someone here who knows more than me (a low bar) can offer some pointers.

I have a population that gets divided, randomly, into two pieces, $X$ and $Y$. If I just wanted to check whether the division is reasonably due to chance, this would be easy: I assume a binomial distribution, since I'm just counting who ends up in $X$ or $Y$. I compute $n=|X|+|Y|$ and $\sigma=\sqrt{np(1-p)}$ (assuming $p=0.5$), and then I compare to the normal distribution. So, for example, if I observed $|X|=45$ and $|Y|=55$, I'd say $\sigma=5$, so this is a one-sigma deviation from the mean $\mu=50$. I expect a deviation no larger than this by chance 68.27% of the time. Equivalently, I expect a greater deviation from the mean 31.73% of the time.
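For reference, the one-sigma arithmetic in this example can be checked with a short Python sketch. It uses only the numbers given above ($n=100$, $p=0.5$, $|X|=45$). The exact binomial sum is an addition for comparison and is not part of the original calculation.

```python
import math

n, p = 100, 0.5   # 45 + 55 people, fair split assumed
observed = 45     # |X| from the example

mu = n * p
sigma = math.sqrt(n * p * (1 - p))
z = abs(observed - mu) / sigma

# Normal approximation: chance of a deviation at least this large, on either side.
p_normal = math.erfc(z / math.sqrt(2))

# Exact binomial version of the same two-sided probability (added for comparison).
dev = abs(observed - mu)
p_exact = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n + 1)
    if abs(k - mu) >= dev
)

print(f"sigma={sigma}, z={z}, normal p={p_normal:.4f}, exact p={p_exact:.4f}")
```

The exact figure comes out somewhat larger than the normal one because the normal approximation here is used without a continuity correction.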

But it's not so simple:

I actually want to measure some property of members of $X$ and $Y$. Let's say 25% in $X$ measure positive and 66% in $Y$ measure positive. ($X$ and $Y$ aren't the same cardinality -- the selection process isn't necessarily uniform.) I would like to know if I expect this difference by chance.

To make it slightly more concrete without going into too many business specifics, think of a restaurant that is testing its menu design. When people walk in the door, they are invited to look at one of two menus (assigned randomly, $p=0.5$). They can choose to stay or they can go away. Now I measure how many people order the boeuf bourguignon (BB). In other words, I'm testing menu design to see how it influences wanting BB. My question, when 45 people who saw one menu order BB and 55 people who saw the other order it, is how often this happens by chance.

I don't think this is the same as a simple binomial distribution, but I'm not sure.

This problem is important to me, but I actually have one more that is more subtle. I still have a process that divides people randomly into two populations (the two menus). But now, instead of just measuring consumption of boeuf bourguignon, I also measure how many people order the dauphinois potatoes and the chocolate mousse. Let's call those numbers $A$, $B$, and $C$ in each group. I compute the statistic $t = (A-C)/(A+B+C)$ for each group. And I have the same question: what are the chances that the difference in this $t$ statistic between the two groups is due to chance?

I think I'm confused by a few things in this explanation, but this sounds a lot like AB testing. Have you tried searching that phrase? — Taylor, Dec 11 '18 at 18:30
Hi, @Taylor . Yes, this is A/B testing. But I'm afraid that knowing that doesn't quite lead to answering the question. — jma, Dec 11 '18 at 20:10
If $X$ and $Y$ are binomial, why are you taking the absolute value of both of them? You might get better search results if you search for "two sample hypothesis test" or "hypothesis test for difference in means" — Taylor, Dec 11 '18 at 20:24
@Taylor When I say $|X|$ when referring to a set, I mean the cardinality of the set. — jma, Dec 12 '18 at 09:07

score 0 · Answer 1 · answered Mar 20 '21 at 07:41

Because randomization decided which menu each customer saw, the numbers of customers shown each menu, $n_1$ and $n_2$, are not in themselves informative about your hypothesis, so you can just condition on them (they are ancillary statistics). In practice that is the same as treating them as if fixed in advance. Then you just have a two-sample hypothesis testing problem.

In your first example, then, you have a comparison of two binomial counts; see Exact two sample proportions binomial test in R (and some strange p-values).
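As a rough illustration of comparing two binomial counts with $n_1, n_2$ treated as fixed, here is a minimal Python sketch using only the standard library. It shows two common choices, a pooled two-proportion z-test and Fisher's exact test; these are not necessarily the procedure discussed in the linked question. The question gives the order counts (45 and 55) but not how many customers saw each menu, so `n1 = n2 = 200` below is a made-up assumption.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for H0: both groups share one order rate.

    Assumes at least one order and one non-order overall (otherwise se is 0).
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher exact p-value, conditioning on n1, n2 and x1 + x2."""
    k = x1 + x2
    denom = math.comb(n1 + n2, k)
    lo, hi = max(0, k - n2), min(n1, k)
    # Hypergeometric probability of each possible count in group 1.
    pmf = {i: math.comb(n1, i) * math.comb(n2, k - i) / denom
           for i in range(lo, hi + 1)}
    # Sum the tables that are no more probable than the observed one.
    cutoff = pmf[x1] * (1 + 1e-9)
    return sum(q for q in pmf.values() if q <= cutoff)

# Hypothetical group sizes: the question does not state them.
n1, n2 = 200, 200
x1, x2 = 45, 55   # BB orders per menu, from the question

z, p_z = two_proportion_z_test(x1, n1, x2, n2)
p_fisher = fisher_exact_two_sided(x1, n1, x2, n2)
print(f"z={z:.3f}, z-test p={p_z:.3f}, Fisher p={p_fisher:.3f}")
```

The p-value depends strongly on the group sizes, so the real $n_1$ and $n_2$ have to be substituted before reading anything into the result.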

In your second example, if the sample sizes are large enough you could use a normal approximation; if not, a permutation test may be a better option.
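A permutation test for the difference in $t = (A-C)/(A+B+C)$ could look roughly like the sketch below. It assumes per-customer records are available, each a tuple of 0/1 flags `(bb, potatoes, mousse)`. That data layout, the function names, and the toy data are all hypothetical, since the question only gives the definition of $t$.

```python
import random

def t_stat(orders):
    """t = (A - C) / (A + B + C) for one group of (bb, potatoes, mousse) flags."""
    A = sum(o[0] for o in orders)
    B = sum(o[1] for o in orders)
    C = sum(o[2] for o in orders)
    total = A + B + C
    return (A - C) / total if total else 0.0  # convention: 0 when nothing was ordered

def permutation_test(group1, group2, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in t between groups."""
    rng = random.Random(seed)
    observed = t_stat(group1) - t_stat(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)  # group sizes are held fixed, as in the conditioning argument
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign menu labels at random
        diff = t_stat(pooled[:n1]) - t_stat(pooled[n1:])
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Made-up toy data, for illustration only.
menu1 = [(1, 0, 0)] * 30 + [(0, 1, 0)] * 40 + [(0, 0, 1)] * 20
menu2 = [(1, 0, 0)] * 20 + [(0, 1, 0)] * 40 + [(0, 0, 1)] * 30
obs, p_perm = permutation_test(menu1, menu2, n_perm=2000)
print(f"observed difference in t: {obs:.3f}, permutation p: {p_perm:.4f}")
```

Shuffling whole customer records rather than individual dishes keeps any dependence between a customer's orders intact, which is the main reason to prefer this over treating $A$, $B$, and $C$ as independent counts.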
