What is the distribution of the difference of two independent multinomial random variables?

Question

Say I have two independent random vectors $X_c$ and $X_f$. The random vector $X_c$ is composed of three random variables, $X_{1c}$, $X_{2c}$ and $X_{3c}$, and the second random vector $X_f$ is composed of $X_{1f}$, $X_{2f}$ and $X_{3f}$:

\begin{equation*} \mathbf{X_c} = \left( \begin{array}{c} X_{1c}\\ X_{2c}\\ X_{3c} \end{array} \right) \qquad \mathbf{X_f} = \left( \begin{array}{c} X_{1f}\\ X_{2f}\\ X_{3f} \end{array} \right) \end{equation*}

Let's also assume that $X_c \sim \text{Multinomial}(n, p_{1c}, p_{2c}, p_{3c})$ and $X_f \sim \text{Multinomial}(n, p_{1f}, p_{2f}, p_{3f})$.

My objective is to obtain the distribution of $Y$, defined as $Y = X_c - X_f$.

Can someone suggest how I should proceed to obtain the distribution of $Y$? Thank you in advance.

score 3 · Answer 1 · answered Sep 02 '21 at 21:55

Unfortunately, this is one of those cases where you can get an expression for the probability mass function, but it is quite a complicated expression that does not simplify. To see this, let's consider the more general case where you have independent random vectors $\mathbf{X}_1 \sim \text{Mu}(n, \mathbf{p}_1)$ and $\mathbf{X}_2 \sim \text{Mu}(n, \mathbf{p}_2)$, each with $k$ categories, and you set $\mathbf{Y} \equiv \mathbf{X}_1-\mathbf{X}_2$. Then for any possible outcome $\mathbf{y}$ we get the probability mass function:

$$\begin{align} p_\mathbf{Y}(\mathbf{y}) &\equiv \mathbb{P}(\mathbf{Y}=\mathbf{y}) \\[12pt] &= \sum_{\mathbf{x}} \text{Mu}(\mathbf{x}+\mathbf{y}|n, \mathbf{p}_1) \cdot \text{Mu}(\mathbf{x}|n, \mathbf{p}_2) \\[6pt] &= \sum_{\mathbf{x} \in \mathscr{X}(\mathbf{y})} {n \choose \mathbf{x}+\mathbf{y}} {n \choose \mathbf{x}} \prod_{i=1}^k p_{1,i}^{x_i+y_i} p_{2,i}^{x_i}. \\[6pt] \end{align}$$

where $\mathscr{X}(\mathbf{y}) \equiv \{ \mathbf{x} \in \{ 0,...,n \}^k \mid \min_i (x_i+y_i) \geqslant 0, \sum_i x_i = n \}$ is the set of all possible values of $\mathbf{x}$ consistent with the outcome $\mathbf{y}$ (a possible outcome necessarily has $\sum_i y_i = 0$, since both vectors sum to $n$). Now, this is technically a closed-form expression for the mass function, since it is a finite sum of closed-form expressions. Unfortunately, it does not simplify any further. With a bit of effort you can program this function in computational software to automate the computation, and that is about as good as you can do here.
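As an illustration of what such a program might look like, here is a minimal Python sketch that evaluates the sum above by brute-force enumeration. The function names (`multinomial_pmf`, `compositions`, `diff_pmf`) are made up for this example, and the enumeration visits every $\mathbf{x}$ with $\sum_i x_i = n$, so it is only practical for small $n$ and $k$.

```python
from math import factorial, prod


def multinomial_pmf(x, n, p):
    """Multinomial pmf Mu(x | n, p); returns 0 outside the support."""
    if any(xi < 0 for xi in x) or sum(x) != n:
        return 0.0
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)  # stays an integer at every step
    return coef * prod(pi ** xi for pi, xi in zip(p, x))


def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def diff_pmf(y, n, p1, p2):
    """P(X1 - X2 = y) for independent X1 ~ Mu(n, p1) and X2 ~ Mu(n, p2)."""
    if sum(y) != 0:  # both vectors sum to n, so the difference sums to 0
        return 0.0
    total = 0.0
    for x in compositions(n, len(y)):
        shifted = tuple(xi + yi for xi, yi in zip(x, y))
        # Terms with any x_i + y_i < 0 contribute 0 via multinomial_pmf.
        total += multinomial_pmf(shifted, n, p1) * multinomial_pmf(x, n, p2)
    return total
```

For the three-category case in the question, a call would look like `diff_pmf((2, -1, -1), 10, (0.5, 0.3, 0.2), (0.2, 0.3, 0.5))`, where the probability vectors are arbitrary example values.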

Of course, if you don't mind an approximation, another approach that is useful for large $n$ is to approximate each multinomial distribution by a multivariate normal distribution, so that the difference vector is also approximately multivariate normal. You could get a reasonable approximating expression by using the multivariate normal, rounding the values to integers, and then imposing the support constraints. That would be another way to deal with it.
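The normal approximation needs only the mean vector and covariance matrix of $\mathbf{Y}$. Since the two vectors are independent, the mean is $n(\mathbf{p}_1 - \mathbf{p}_2)$ and the covariance is the sum of the two multinomial covariance matrices, $n[\text{diag}(\mathbf{p}_j) - \mathbf{p}_j \mathbf{p}_j^\text{T}]$ for $j = 1, 2$. A small sketch of that computation follows; the function name `diff_moments` is hypothetical, and the rounding and support-constraint steps are not implemented here.

```python
def diff_moments(n, p1, p2):
    """Mean vector and covariance matrix of Y = X1 - X2,
    for independent X1 ~ Mu(n, p1) and X2 ~ Mu(n, p2)."""
    k = len(p1)
    mean = [n * (a - b) for a, b in zip(p1, p2)]

    def multinomial_cov(p):
        # n * (diag(p) - p p^T)
        return [
            [n * (p[i] * (1 - p[i]) if i == j else -p[i] * p[j]) for j in range(k)]
            for i in range(k)
        ]

    c1, c2 = multinomial_cov(p1), multinomial_cov(p2)
    # Independence: Cov(X1 - X2) = Cov(X1) + Cov(X2)
    cov = [[c1[i][j] + c2[i][j] for j in range(k)] for i in range(k)]
    return mean, cov
```

Note that this covariance matrix is singular, because the components of $\mathbf{Y}$ always sum to zero. One way to handle that, offered as a suggestion rather than something stated above, is to drop one component and work with the remaining $k-1$ coordinates when evaluating the normal density.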

Maybe this is a case where one can get a good approximation via the saddlepoint approximation? — kjetil b halvorsen, Sep 03 '21 at 01:38
