Difference between two (dependent) multinomial random variables

Question

This multi-part question is motivated by the often-used customer satisfaction measure "Net Promoter Score". I understand there is already a much-liked Q&A on the margin of error for NPS; however, it uses a normal approximation (please correct me if I'm wrong), so I'm looking for a more general treatment of NPS and, hopefully, an "exact" confidence interval. I'm not afraid of a bit of theory (except maybe pivots), but please explain as if I were a slow statistics undergrad. Enough preamble!

Suppose we have the multinomial distribution: $$(A,B,C)\sim Multinomial(n,\pi_A,\pi_B,\pi_C)$$

What is the distribution of the difference, $\Delta=A-C$ ? (Is it a standard univariate distribution?)
How can I find a $(1-\alpha)$ confidence interval for $\pi_A-\pi_C$ ?

There must be a binomial distribution involved. Counts for each of A, B, and C are approximately normally distributed. This means the difference between means is also normally distributed. The same applies to proportions. — Alexey Burnakov, Oct 15 '17 at 09:53
https://en.wikipedia.org/wiki/Binomial_distribution lists the distribution's moments in the right-hand pane, including the variance, which gives the standard deviation and hence the standard error used to set a confidence interval. I believe you have everything you need. — Alexey Burnakov, Oct 15 '17 at 10:05
I think question 3 is not well posed. $\Delta_1$ and $\Delta_2=\Delta_1/n$ are random variables. What do you mean by a confidence interval of a random variable? I suppose what you really want is a confidence interval for the true parameter $\pi_A-\pi_C$? — StijnDeVuyst, Oct 15 '17 at 10:25
@StijnDeVuyst thanks! Yes, that is what I was going for; I'll edit the question. — Liam Smith, Oct 15 '17 at 10:27
@AlexeyBurnakov sorry Alexey, I don't understand. If I just treat each random variable separately using their marginal variances, doesn't that fail to account for the dependence between the two variables? Are you just suggesting a conventional difference in proportions from two independent samples approach? — Liam Smith, Oct 15 '17 at 10:40
I have understood your remark. I think you don't really have this obstacle here. The sample independence assumption holds. Imagine the inverse situation: if you draw balls from an urn, each draw changes the distribution of the colors of the remaining balls. Or, alternatively, if you work on a sample for a sociological study and decide to sample without replacement, you limit the number of people left in the population and again change their distribution. Here, given that you constructed a representative sample of people to answer your NP score question, you did not do anything with true pop — Alexey Burnakov, Oct 15 '17 at 10:55
The fact that the count in your A part of the sample determines how many counts are left for the B and C parts does not make the variables dependent. So I think doing marginal proportion testing is fine. — Alexey Burnakov, Oct 15 '17 at 11:08
To make more sense of it: if you had a limited number of A, B, and C choices, then the occurrence of A really would depend on the occurrence of either B or C. You don't, do you? — Alexey Burnakov, Oct 15 '17 at 11:13

score 1 · Accepted Answer · answered Oct 15 '17 at 12:52

There is an easy solution to this in principle, but maybe one that is not very handy in practice. First, observe that $A-C$ can vary between $-n$ and $n$ so let us look at $n+\Delta_1 = n+A-C$ instead to have a nonnegative discrete random variable. Let's say its mass function is $$ p(i) = \text{Prob}[n+A-C = i] $$

Now, we can see $A$, $B$, $C$ as the result of $n$ independent throws of a three-sided die with probabilities $\pi_A$, $\pi_B$ and $\pi_C$. The result of one throw ($n=1$) has probability generating function (pgf) $$ \text{E}[x^Ay^Bz^C|n=1] = \pi_A x + \pi_B y + \pi_C z\,, $$ so that because of the independence between the throws the joint pgf of $A$, $B$, $C$ is $$ F(x,y,z) = \text{E}[x^Ay^Bz^C] = (\pi_A x + \pi_B y + \pi_C z)^n\,. $$ Since $A+B+C=n$ we know that $n+A-C=2A+B$ with pgf $$ z^n\text{E}[z^{A-C}] = \text{E}[z^{2A+B}] = \text{E}[(z^2)^A z^B 1^C] = F(z^2,z,1) = (\pi_A z^2 + \pi_B z + \pi_C)^n\,. $$ In principle, this gives you the mass function $p(i)$ you need since $$ \text{E}[z^{n+A-C}] = \sum_{i=0}^{2n} p(i) z^i\,, $$ and $p(i)$ is identified as the coefficient of $z^i$ in the series expansion in $z$ of the above pgf. I am sure @glen_b could tell you exactly what the numerical complexity is of doing something like that.
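
As a rough illustration of reading $p(i)$ off the pgf: raising the polynomial $\pi_A z^2 + \pi_B z + \pi_C$ to the $n$-th power is the same as convolving its coefficient vector with itself $n$ times. The sketch below assumes NumPy is available; the function name `diff_pmf` and the example probabilities are made up for illustration and are not part of the answer above.

```python
import numpy as np

def diff_pmf(n, pi_a, pi_b, pi_c):
    """Return p with p[i] = Prob[n + A - C = i] for i = 0, ..., 2n."""
    # Coefficients of z^0, z^1, z^2 in the single-throw pgf
    # pi_A z^2 + pi_B z + pi_C.
    single = np.array([pi_c, pi_b, pi_a])
    p = np.array([1.0])  # pgf of zero throws is the constant 1
    for _ in range(n):
        # Multiplying two polynomials convolves their coefficients.
        p = np.convolve(p, single)
    return p

# Hypothetical example values: n = 10, (pi_A, pi_B, pi_C) = (0.5, 0.3, 0.2)
n = 10
p = diff_pmf(n, 0.5, 0.3, 0.2)
deltas = np.arange(-n, n + 1)      # possible values of A - C
mean = float((deltas * p).sum())   # should be close to n * (pi_A - pi_C)
print(mean)
```

Each convolution costs time proportional to the current length of `p`, so this naive loop is roughly quadratic in $n$; that is only an observation about this sketch, not a claim about the best achievable complexity.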

In general, $p(i)$ is not one of the 'standard' univariate distributions, but you have some things going for you if $n$ is either very small or very large. If $n$ is small, you could expand the pgf 'by hand', so to speak. If $n$ is large, the joint distribution of $A$ and $C$ becomes approximately normal by the CLT, so you can treat $A-C$ as approximately normal as well. Note that the correlation between $A$ and $C$ does not vanish as $n$ grows: $\text{Cov}[A,C]=-n\pi_A\pi_C$, so $\text{Var}[A-C]=\text{Var}[A]+\text{Var}[C]-2\,\text{Cov}[A,C]=n\left[\pi_A+\pi_C-(\pi_A-\pi_C)^2\right]$, which is larger than the sum of the marginal variances suggested in the comments above. The simple estimator $(A-C)/n$ for $\pi_A-\pi_C$ will be very adequate then.
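
For the large-$n$ case, here is a minimal sketch of a normal-approximation (Wald-type) interval for $\pi_A-\pi_C$. It assumes the multinomial variance $\text{Var}[A-C]=n\left[\pi_A+\pi_C-(\pi_A-\pi_C)^2\right]$, which includes the covariance $\text{Cov}[A,C]=-n\pi_A\pi_C$, and plugs in the sample proportions. The function name `wald_ci_diff` and the example counts are hypothetical; this is an approximate interval, not the "exact" one asked for in the question.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci_diff(a, b, c, alpha=0.05):
    """Approximate (1 - alpha) interval for pi_A - pi_C from counts a, b, c."""
    n = a + b + c
    pa, pc = a / n, c / n
    est = pa - pc  # point estimate (A - C) / n
    # Plug-in standard error from Var[(A - C)/n] = [pi_A + pi_C - (pi_A - pi_C)^2] / n
    se = sqrt((pa + pc - est ** 2) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    return est - z * se, est + z * se

# Hypothetical counts: 50 in A, 30 in B, 20 in C
lo, hi = wald_ci_diff(50, 30, 20)
print(lo, hi)
```

Like any Wald interval, this can behave poorly for small $n$ or proportions near 0 or 1, and its endpoints are not constrained to $[-1, 1]$.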

thanks for the answer, the details are a bit beyond me unfortunately - I think I should have asked for an 'explain like I am 5' version :) The take-home message that $(A-C)/n$ is a reasonable large-sample approximation for $\pi_A - \pi_C$ is fine; it's just more fuel for the normal approximation - the CLT is just very handy it seems ¯\\_(ツ)_/¯ — Liam Smith, Oct 16 '17 at 06:26
@Liam Smith I am late at this ) However I have also wanted to say that your question would really be a generalization of a dice roll, or even simpler, a coin toss. In both cases realization of a random variable is not in any way affected by any other tosses/rolls. In fact a three-level multinomial problem is not that different from a binomial (two-level multinomial), and as you can clearly see you don't assume that an even coin will produce a dependent sample for you. — Alexey Burnakov, Oct 16 '17 at 10:40
@AlexeyBurnakov thanks, I think I appreciate your point a bit more now — Liam Smith, Oct 16 '17 at 23:54
You initially asked whether the problem involves dependency in a multinomial setting. @StijnDeVuyst gave an exhaustive answer, I think, while I tried to work out an intuition for this. ) You are welcome. — Alexey Burnakov, Oct 17 '17 at 12:23
